Use Stacked and Faceted Bar Plots in ggplot2 to Visualize U.S. Mortality

In this article, we’ll visualize U.S. population mortality rates from different diseases. We’ll use the USMortality dataset from the lattice package.

Major techniques covered in this tutorial include:

  • Create stacked bar plots with text annotation.
  • Divide a plot into faceted panels (subplots).
  • Annotate selected faceted panels.

Packages and data cleanup

Wrap long strings of disease names into multiple lines. This helps to better utilize the available plot space. This is achieved using str_wrap from the stringr package.

library(ggplot2)
library(dplyr)
library(lattice)
library(stringr)
library(forcats)

u <- USMortality %>%
  # divide long names into single words in multiple lines to ease visualization
  mutate(Cause = str_wrap(Cause, width = 7)) # max of 7 characters per line
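As a quick check of what str_wrap does to the labels, the sketch below wraps the unique cause names on their own, outside the pipeline. The variable names causes and wrapped are introduced here for illustration only.

```r
library(lattice)
library(stringr)

# Unique cause names from the dataset, as plain character strings
causes <- unique(as.character(USMortality$Cause))

# Insert line breaks between words, aiming at about 7 characters per line
wrapped <- str_wrap(causes, width = 7)

# writeLines() renders each "\n" as a real line break
writeLines(wrapped[1:3])
```

Note that str_wrap only breaks between words, so a single word longer than the width stays on one line.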

Create a bar plot ordered by the disease occurrence rate. To do this, we convert the Cause variable to a factor and order its levels by the summed occurrence rate, in descending order. This is done using fct_reorder from the forcats package. (Check this complete guide on ordering graphic elements in ggplot2.)

u <- u %>%
  # convert "Cause" into a factor, reordered by the sum of occurrence "Rate"
  mutate(Cause = fct_reorder(Cause, Rate, .fun = sum, .desc = TRUE)) %>%
  as_tibble()

head(u, 3)

Output:

# A tibble: 3 × 5
  Status Sex    Cause             Rate    SE
  <fct>  <fct>  <fct>            <dbl> <dbl>
1 Urban  Male   "Heart\ndisease"  210.   0.2
2 Rural  Male   "Heart\ndisease"  243.   0.6
3 Urban  Female "Heart\ndisease"  132.   0.2
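To see how fct_reorder decides the level order, here is a minimal sketch on toy data. The vectors cause and rate hold made-up values, not figures from USMortality.

```r
library(forcats)

# Toy data (hypothetical values): two rows per cause
cause <- c("A", "A", "B", "B", "C", "C")
rate  <- c(1, 2, 10, 20, 5, 5)

# Per-cause sums are A = 3, B = 30, C = 10;
# .desc = TRUE puts the largest sum first
f <- fct_reorder(cause, rate, .fun = sum, .desc = TRUE)

levels(f)  # "B" "C" "A"
```

Because ggplot2 lays out facets in level order, the cause with the largest summed rate ends up in the first panel.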

Visualization

Create a bar plot that displays the disease occurrence rate for each combination of Status and Sex. The plot is divided into faceted panels (subplots) by the Cause variable. Here we use facet_grid(Cause ~ .) rather than facet_wrap(~Cause, ncol = 1); its switch = "y" argument moves the facet titles to the left side of the bars. We’ll fix the graphic crowdedness in the following steps.

theme_set(theme_classic(base_size = 15))

p1 <- u %>%
  ggplot(aes(x = Sex, y = Rate, fill = Sex, color = Sex, alpha = Status)) +
  geom_col(position = "stack", linewidth = .5) +
  # create a facet panel for each cause of disease
  facet_grid(Cause ~ ., scales = "free_y",
             # relocate the facet title to the left
             switch = "y")
p1

Flip the plot. Note that after the flip, the aesthetic mapping remains unchanged (e.g., Sex remains the x aesthetic, and Rate remains the y aesthetic). In contrast, theme() always treats the horizontal axis as the x-axis and the vertical axis as the y-axis, regardless of the flip.

p2 <- p1 +
  coord_flip(
    expand = FALSE, # let the bars fill the panel, with no margins around the plot
    clip = FALSE)   # ensure the title added later is displayed completely
p2
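The difference between aesthetics and theme elements after a flip can be checked on a small example. This is only a sketch on toy data; df, group, and value are made-up names and numbers.

```r
library(ggplot2)

# Toy data (hypothetical values)
df <- data.frame(group = c("a", "b"), value = c(3, 5))

p_flip <- ggplot(df, aes(x = group, y = value)) +
  geom_col() +
  coord_flip() +
  theme(
    # `group` is still the x aesthetic, but it is now drawn on the
    # vertical axis, so its labels are styled through axis.text.y
    axis.text.y = element_text(face = "bold"),
    # `value` is drawn on the horizontal axis after the flip
    axis.title.x = element_text(color = "grey40"))

# The built layer data still stores `group` in x and `value` in y
layer_data(p_flip)[, c("x", "y")]
```

In other words, the flip changes only where things are drawn; aes() keeps referring to the data roles, while theme() refers to the physical axes.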

Annotate the bars with the disease occurrence rate.

  • Use position_stack() to align the text with the stacked bars, and use vjust = .5 to center each label within its bar segment.

  • The four text colors are listed in the order of the bar segments within one panel: first the male bar (white, steelblue4), then the female bar (white, red4). This pattern is then repeated 10 times, once for each of the 10 faceted panels (rep(10)).

p3 <- p2 +
  geom_text(
    aes(label = round(Rate)),
    position = position_stack(vjust = .5), alpha = .6,
    color = c("white", "steelblue4", "white", "red4") %>% rep(10),
    fontface = "bold")
p3
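To confirm that position_stack(vjust = .5) places each label in the middle of its bar segment, the following sketch compares the computed label positions with the segment midpoints. It uses toy data with made-up values; df, bars, and txt are names introduced here.

```r
library(ggplot2)

# Toy data (hypothetical values): one bar made of two stacked segments
df <- data.frame(x = "bar", y = c(2, 4), g = c("u", "v"))

p <- ggplot(df, aes(x, y, group = g)) +
  geom_col(position = "stack", fill = "grey80", color = "black") +
  geom_text(aes(label = y), position = position_stack(vjust = .5))

bars <- layer_data(p, 1)  # stacked segments, with ymin and ymax
txt  <- layer_data(p, 2)  # label positions, in y

# Label positions and segment midpoints should agree
sort(txt$y)
sort((bars$ymin + bars$ymax) / 2)
```

With vjust = 0 or vjust = 1 the labels would instead sit at the bottom or top edge of each segment.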

Add a plot title at the bottom right corner of the plot.

Method 1: Use ggtitle() to add the plot title. Use theme() to specify the title’s position, with vjust adjusting the text’s relative vertical position and hjust the horizontal position. This approach, however, takes trial and error to find a suitable vjust value. In addition, a small change in the text size or in the dimensions of the display device can drastically change the title position. So the vjust approach is not a very robust way to position a plot title.

p3 +
  ggtitle("U.S. mortality rates, 2011-2013\ndeaths per 100,000 population") +
  theme(plot.title = element_text(
    vjust = -100, # this parameter value can vary a lot!
    hjust = 1, size = 12,
    face = "bold"))

Method 2: Add the title using geom_text(). This is a more robust approach. As it involves adding text only to selected subplots, the syntax is a bit more complicated.

  • For faceted panels, a regular text mapping adds text to all panels. To add text only to selected panels, we need to create a dataset (e.g., myTitle) that specifies the text content and position, and that includes the faceting variable (i.e., Cause) to explicitly name the panel (e.g., “Nephritis”) where the text should be added.

  • Also importantly, we keep the faceting variable Cause defined in myTitle as a factor to preserve the bar order in the plot. If Cause is a character variable, it’ll rearrange the bars (subplots) in the default alphabetical order.

  • Note that we had set clip = F in coord_flip() at the earlier step of making p2. This is critical for displaying the text completely when it crosses the border of a faceted panel.

myTitle <- tibble(
  X = "Male",
  Y = 450,
  text = "U.S. mortality rates, 2011-2013\ndeaths per 100,000 population",
  # add text to the "Nephritis" panel
  Cause = factor("Nephritis"))

p4 <- p3 +
  geom_text(
    data = myTitle,
    aes(X, Y, label = text),
    inherit.aes = FALSE,
    hjust = 1, # align to the right boundary of the panel
    fontface = "bold", size = 5)
p4
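The same idea can be tried on a smaller plot. The sketch below uses the built-in mtcars dataset rather than the mortality data, and the note data frame and its values are made up for illustration. Because note carries the faceting variable cyl, the text is drawn in one panel only.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_grid(cyl ~ .)

# One-row dataset (hypothetical values): position, label, and the
# faceting variable that selects the target panel (cyl == 6)
note <- data.frame(wt = 5, mpg = 30, label = "only in this panel", cyl = 6)

p_note <- p +
  geom_text(
    data = note,
    aes(wt, mpg, label = label),
    inherit.aes = FALSE,
    hjust = 1)

# The text layer contains a single row assigned to a single panel
layer_data(p_note, 2)[, c("PANEL", "x", "y", "label")]
```

Leaving cyl out of note would repeat the label in every panel, which is the default behavior described above.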

A final polish-up. Major edits include removing the default x-axis labels (on the left after the flip), which frees up more space for the plot title; reorienting the subplot titles; and adding a black outline to the legend keys of the alpha aesthetic to make them visually more prominent. It’s worth noting that strip.text.y has no effect on the subplot titles here; the element name must carry the suffix .left (strip.text.y.left) to be effective.

p5 <- p4 +
  # color scale
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  theme(
    # show facet titles completely
    strip.clip = "off",

    # NOTE!! "strip.text.y" does not work; must add suffix ".left"
    strip.text.y.left = element_text(angle = 0, hjust = 1, face = "bold"),
    strip.background = element_rect(fill = "cornsilk", color = NA),

    # remove axial elements
    axis.text = element_blank(),
    axis.title = element_blank(),
    axis.ticks = element_blank(),
    axis.line = element_blank(),
    legend.position = c(.9, .5),
    legend.title = element_blank()) +
  # add outline to alpha's legend
  guides(alpha = guide_legend(override.aes = list(color = "black")))
p5



