Create a 2D Histogram in ggplot2 to Visualize Population Distribution in Happiness and Wealth in the U.S.

In this article, we’ll visualize the population composition at different levels of happiness and financial status in the United States. Major techniques used in this article include:

  • Create a 2D histogram for categorical variables. (A separate article covers continuous variables.)
  • Use the stat argument and the after_stat() function to compute statistics on the fly, which makes the visualization more flexible and efficient.

Packages and data

Here we use the happy dataset from the ggmosaic package. To simplify the visualization, we remove rows with NA values in the variables happy and finrela (financial status), and keep only the data from 2018.

library(ggplot2)
library(dplyr)
library(stringr) # for string manipulation

# devtools::install_github("haleyjeppson/ggmosaic")
library(ggmosaic)

h <- happy %>%
  filter(!(is.na(happy) | is.na(finrela)), year == 2018) %>%
  as_tibble()

head(h, 3)

Output:

# A tibble: 3 × 12
year age degree finrela happy health marital sex polviews partyid wtssall
<dbl> <dbl> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <dbl>
1 2018 43 junio… below … pret… good never … male conserv… not st… 198
2 2018 74 high … below … very… excel… separa… fema… <NA> ind,ne… 61
3 2018 42 bache… above … very… <NA> married male slghtly… ind,ne… 61
# ℹ 1 more variable: nhappy <dbl>
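
As a quick check of the filtering logic, the sketch below applies the same condition to a small made-up data frame. The name toy_survey and its values are hypothetical and are not taken from the happy dataset; only the column names mirror it.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the happy dataset
toy_survey <- tibble(
  happy   = c("very happy", NA, "pretty happy", "very happy"),
  finrela = c("average", "average", NA, "above average"),
  year    = c(2018, 2018, 2018, 2016)
)

# Same condition as in the article: drop rows with NA in either column, keep 2018
kept <- toy_survey %>%
  filter(!(is.na(happy) | is.na(finrela)), year == 2018)

# Equivalent form: one condition per argument; filter() combines them with AND
kept_alt <- toy_survey %>%
  filter(!is.na(happy), !is.na(finrela), year == 2018)
```

Only the first row survives: it has no missing values and belongs to 2018. Both forms return the same rows, so the choice between them is a matter of readability.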

Visualization

Create a 2D histogram using geom_bin2d. The fill aesthetic (not written out here) of each rectangle is mapped by default to the headcount of each combination of happy and finrela levels. The color and linewidth arguments specify the outline of the rectangles.

p1 <- h %>%
  ggplot(aes(x = happy, y = finrela)) +
  geom_bin2d(color = "white", linewidth = 1)
p1
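
To make the default fill mapping visible, here is a minimal sketch that writes it out explicitly. It uses a small hypothetical data frame (toy) in place of h so that it runs without ggmosaic; the plot it builds should look the same as one that omits the fill mapping.

```r
library(ggplot2)

# Hypothetical toy data standing in for the happy and finrela columns
toy <- data.frame(
  happy   = c("not too happy", "pretty happy", "pretty happy", "very happy"),
  finrela = c("below average", "average", "average", "above average")
)

# fill = after_stat(count) is what geom_bin2d uses when fill is not specified
p_explicit <- ggplot(toy, aes(x = happy, y = finrela)) +
  geom_bin2d(aes(fill = after_stat(count)), color = "white", linewidth = 1)
```

Writing the mapping out is optional, but it is a convenient starting point when a different computed variable should drive the fill.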

Label each rectangle with the associated proportion of the headcount relative to the total population. These proportion values are not directly available in the input dataset h. Instead, we’ll calculate these values on the fly using the stat and after_stat technique.

  • stat: the statistical transformation applied to the data before it is drawn. In geom_text, the default is stat = "identity", meaning that the aesthetics are mapped to variables directly available in the input dataset. Alternatively, ggplot2 can derive new values on the fly through a statistical transformation. In this example, stat = "bin2d" bins the data by each combination of x and y (here, each unique pair of categorical values) and computes the count in each bin, so the labels are placed at the same positions as the rectangles.

  • after_stat: delays the evaluation of an aesthetic until after the statistical transformation has been computed. Inside after_stat(), count refers to the headcount computed by stat = "bin2d", so count / sum(count) gives the proportion of each bin, which is then mapped to the label aesthetic.
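
To make concrete what the computed label contains, the sketch below precomputes the same proportions by hand with base R and then plots them with the default stat = "identity". This is an illustration under assumptions, not the article's method: the toy data frame and the names cells, pct, and p_manual are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data standing in for the happy and finrela columns
toy <- data.frame(
  happy   = c("not too happy", "pretty happy", "pretty happy", "very happy"),
  finrela = c("below average", "average", "average", "above average")
)

# Count every happy x finrela combination, then drop the empty ones
cells <- as.data.frame(table(happy = toy$happy, finrela = toy$finrela))
cells <- cells[cells$Freq > 0, ]

# Same arithmetic as after_stat(count / sum(count) * 100), done up front
cells$pct   <- cells$Freq / sum(cells$Freq) * 100
cells$label <- paste(round(cells$pct, 1), "%")

# With precomputed values, geom_text can keep its default stat = "identity"
p_manual <- ggplot(cells, aes(x = happy, y = finrela)) +
  geom_tile(aes(fill = Freq), color = "white", linewidth = 1) +
  geom_text(aes(label = label), color = "white", fontface = "bold", size = 4)
```

The after_stat approach avoids building the intermediate cells table, which is the main convenience it offers over precomputing.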

More details on ggplot2 essentials can be found in this highly recommended e-book written by a Harvard scientist.

p2 <- p1 +
  geom_text(
    stat = "bin2d",
    aes(label = after_stat(count / sum(count) * 100) %>%
          round(1) %>% paste("%")),
    color = "white", fontface = "bold", size = 4)
p2

Wrap y-axis labels into two rows. Here we leverage the str_wrap function from the stringr package (popular for string manipulation).

p3 <- p2 +
  scale_y_discrete(labels = function(x) str_wrap(x, width = 10))
p3
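
For a sense of what the labels function does to a single label, here is a small sketch. The input string is an assumed example of a finrela level, chosen only for illustration.

```r
library(stringr)

# str_wrap() inserts a line break so that no line exceeds the target width
wrapped <- str_wrap("below average", width = 10)

# cat() renders the embedded newline, showing the label on two lines
cat(wrapped)
```

Labels already shorter than the width are returned unchanged, so the same function can safely be applied to every axis label.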

Update the color (fill) scale using the viridis color palettes.

p4 <- p3 +
  scale_fill_viridis_c(option = "F", direction = -1)
p4

Load the “Amita” font from Google Fonts so that it can be used as the default font of the plot.

# Add Google font
library(showtext)
font_add_google(name = "Amita", family = "Amita")
showtext_auto()

Polish the plot: add axis and plot titles, and customize the theme.

p5 <- p4 +
  # add axis and plot titles
  labs(
    x = "Happiness", y = "Financial status",
    title = "Population Distribution in Happiness and Wealth",
    subtitle = "General Social Survey of Americans' happiness in 2018") +
  # theme
  theme_classic(base_size = 13, base_family = "Amita") + # use Amita as the default font
  theme(
    plot.title = element_text(hjust = .5, face = "bold"),
    plot.subtitle = element_text(hjust = .5, color = "snow4"),
    axis.line = element_blank(),
    axis.title = element_text(face = "bold"),
    axis.title.x = element_text(margin = margin(t = 10)),
    legend.position = "none")
p5

Save the plot to the folder “graphics” (which is in the same folder as the source code).

ggsave(filename = "Happiness and Wealth 2D Histogram.pdf",
       path = "graphics", # a relative path
       width = 6, height = 6)
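
Saving to a relative path assumes the target folder already exists. The sketch below shows one way to create it first. It writes to a temporary directory and uses a placeholder plot built from the built-in mtcars data, so out_dir, p_demo, and the file name are all hypothetical.

```r
library(ggplot2)

# Hypothetical output folder; a temporary location keeps the example side-effect free
out_dir <- file.path(tempdir(), "graphics")

# Create the folder only if it is missing
if (!dir.exists(out_dir)) dir.create(out_dir, recursive = TRUE)

# Placeholder plot standing in for p5
p_demo <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Passing plot = explicitly avoids relying on the last displayed plot
ggsave(filename = "example.pdf", plot = p_demo,
       path = out_dir, width = 6, height = 6)

saved <- file.exists(file.path(out_dir, "example.pdf"))
```

In the article's setup, the same check would be applied to the "graphics" folder next to the source code before calling ggsave.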

Continue Exploring — 🚀 one level up!


geom_bin2d was used above to visualize counts for categorical variables. It is more frequently applied to continuous variables: check this 2D histogram with a map overlay, which visualizes hurricane activity in the North Atlantic Ocean.



The visualization above uses colors and text to depict the population distribution by happiness and wealth. As an alternative, a mosaic plot uses the sizes of rectangles to illustrate the population composition in a more visually intuitive way.