Create Ordered and Stacked Bar Plots in ggplot2 to Visualize U.S. Midwest Demographics

In this article, we’ll create a bar plot to visualize demographic data for U.S. midwest counties in 2000, highlighting the relationship between poverty rate and education level. The major techniques covered include:
- Reordering the bars.
- Creating bars in a “symmetrical” layout.
- Adding regression lines to a stacked bar plot with synchronized positions.
- Customizing the theme.
Packages and data cleanup
library(ggplot2)
library(dplyr)
library(tidyr)
library(stringr)

# set up the global theme
theme_set(theme_classic(base_size = 15))

We’ll use the midwest dataset, which ships with the ggplot2 package.
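To get a feel for the data before reshaping it, a quick look at the dimensions and the four columns used below might help. This is a minimal sketch; it also checks that PID is unique, which the reordering step relies on (factor levels must not be duplicated).

```r
library(ggplot2)  # provides the midwest data
library(dplyr)

# one row per midwest county
dim(midwest)

# the four columns used in this article
midwest %>%
  select(PID, perchsd, percollege, percbelowpoverty) %>%
  glimpse()

# PID must be unique for factor(PID, levels = PID) to work
anyDuplicated(midwest$PID) == 0
```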
Reorder the bar plot. Arrange the counties (PID) in ascending order of the percentage of the population living below the poverty line (percbelowpoverty), and convert PID into a factor to “memorise” this order. This ensures that the counties are drawn in this order when the graphic is rendered. Because each PID is unique, the sorted IDs can be used directly as the factor levels. (Check this complete guide on how to reorder graphic elements in ggplot2.)
m.reordered <- midwest %>%
  arrange(percbelowpoverty) %>%
  mutate(PID = factor(PID, levels = PID))
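The effect of the arrange-then-factor trick is easiest to see on a tiny example. The sketch below uses hypothetical toy data (`toy`, `id`, `poverty` are made up for illustration) and compares it with base R’s `reorder()`, which sorts factor levels by a summary (the mean, by default) of another variable.

```r
library(dplyr)

# hypothetical toy data: three counties with distinct poverty rates
toy <- tibble(id = c("a", "b", "c"), poverty = c(3, 1, 2))

# approach used in the article: sort rows, then freeze that order as factor levels
toy_arranged <- toy %>%
  arrange(poverty) %>%
  mutate(id = factor(id, levels = id))
levels(toy_arranged$id)  # "b" "c" "a"

# base-R alternative: order the levels by the mean poverty of each id
levels(reorder(factor(toy$id), toy$poverty))  # "b" "c" "a"
```

Without either step, ggplot2 would place character IDs in alphabetical order (or numeric IDs in numeric order), not in order of poverty.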
Tidy up the dataset. Select the four variables to be plotted: the county ID (PID), the percentage of the population with a high school diploma (perchsd) and with a college degree (percollege), and the percentage living in poverty (percbelowpoverty). Then reshape the dataset into a tidy (long) structure, so that the three demographic variable names are in one column, education_poverty, and the associated percentages are in another column, percent.
m.tidy <- m.reordered %>%
  select(PID, perchsd, percollege, percbelowpoverty) %>%
  # tidy up: stack the three variables into a name column and a value column
  pivot_longer(-PID, names_to = "education_poverty", values_to = "percent")
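To make the wide-to-long reshaping concrete, here is a minimal sketch on two hypothetical counties (the values are invented for illustration). Every column except PID is stacked, so each county contributes one row per variable.

```r
library(dplyr)
library(tidyr)

# hypothetical wide data: one row per county, one column per variable
wide <- tibble(
  PID = c(1, 2),
  perchsd = c(90, 80),
  percollege = c(30, 20),
  percbelowpoverty = c(5, 10)
)

# stack all columns except PID into a name column and a value column
long <- wide %>%
  pivot_longer(-PID, names_to = "education_poverty", values_to = "percent")

long  # 6 rows: 2 counties x 3 variables
```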
head(m.tidy, n = 3)
Output:
# A tibble: 3 × 3
PID education_poverty percent
<fct> <chr> <dbl>
1 3026 perchsd 86.9
2 3026 percollege 37.4
3 3026 percbelowpoverty 2.18
Create bars in a “symmetrical” layout. Convert the poverty values to negative numbers. This way, the poverty data are displayed in the negative range of the y-axis, and the education data in the positive range (as illustrated in p1). The same technique is also used to create population pyramids.
# turn poverty data into negative values
m.signed <- m.tidy %>%
  mutate(percent = ifelse(education_poverty == "percbelowpoverty", -percent, percent))
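Since the same sign-flipping trick underlies population pyramids, a minimal pyramid sketch may help show the pattern in isolation. The age groups and counts below are hypothetical, invented purely for illustration; `labels = abs` shows magnitudes on both sides of zero.

```r
library(ggplot2)
library(dplyr)

# hypothetical population counts (in thousands) by age group and sex
pyramid <- tibble(
  age = factor(rep(c("0-19", "20-39", "40-59", "60+"), times = 2),
               levels = c("0-19", "20-39", "40-59", "60+")),
  sex = rep(c("female", "male"), each = 4),
  count = c(50, 60, 55, 40, 52, 61, 53, 33)
) %>%
  # same trick as above: one group goes to the negative side of the axis
  mutate(count = ifelse(sex == "male", -count, count))

p_pyramid <- ggplot(pyramid, aes(x = age, y = count, fill = sex)) +
  geom_col() +
  scale_y_continuous(labels = abs) +  # display negative counts as positive
  coord_flip()
p_pyramid
```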
head(m.signed, 3) # ready for visualization
Output:
# A tibble: 3 × 3
PID education_poverty percent
<fct> <chr> <dbl>
1 3026 perchsd 86.9
2 3026 percollege 37.4
3 3026 percbelowpoverty -2.18
Visualization
For each county (PID), create bars that display the percentage of the population living below the poverty line in the negative range of the y-axis, and the percentages with high school or college education in the positive range. (Because there are so many counties, the bars are squeezed into line segments.)
p1 <- m.signed %>%
  ggplot(aes(x = PID, y = percent, fill = education_poverty)) +
  geom_col(alpha = .6) +
  coord_flip(expand = FALSE) # flip the plot; no padding around the data
p1
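As an aside, the flip can also be expressed by mapping PID to y directly. This is a sketch of an alternative, not the article’s approach; it assumes ggplot2 3.3.0 or later (for the `orientation` argument) and 3.4.0 or later (for `linewidth`). Note that the zero line then becomes a `geom_vline()`, and the later steps in this article, which assume `coord_flip()`, would need the same x/y swap.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# rebuild the signed, reordered data from the steps above
m.signed <- midwest %>%
  arrange(percbelowpoverty) %>%
  mutate(PID = factor(PID, levels = PID)) %>%
  select(PID, perchsd, percollege, percbelowpoverty) %>%
  pivot_longer(-PID, names_to = "education_poverty", values_to = "percent") %>%
  mutate(percent = ifelse(education_poverty == "percbelowpoverty", -percent, percent))

# discrete PID on the y-axis; orientation = "y" makes the horizontal bars explicit
p1_alt <- ggplot(m.signed, aes(x = percent, y = PID, fill = education_poverty)) +
  geom_col(alpha = .6, orientation = "y") +
  geom_vline(xintercept = 0, linewidth = 1.5) +  # zero line is vertical here
  coord_cartesian(expand = FALSE)
p1_alt
```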
Add a central vertical line at y = 0, separating the poverty data from the education data (geom_hline() draws a vertical line here because the coordinates are flipped). Relabel both halves of the y-axis with positive numbers.
p2 <- p1 +
  geom_hline(yintercept = 0, linewidth = 1.5) +
  scale_y_continuous(
    breaks = seq(-40, 120, 20),
    # show the negative breaks (the poverty side) as positive numbers
    labels = function(x) ifelse(x < 0, -x, x))
p2
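The label function can be checked on its own against the breaks used above. In this sketch the name `flip_negative` is hypothetical; it is the same anonymous function from `scale_y_continuous()`, and it turns out to match base R’s `abs()`, so `labels = abs` would be an equivalent, shorter choice.

```r
# the label function used in scale_y_continuous() above, given a name
flip_negative <- function(x) ifelse(x < 0, -x, x)

breaks <- seq(-40, 120, 20)
flip_negative(breaks)
#> [1]  40  20   0  20  40  60  80 100 120

# base R's abs() produces the same labels
identical(flip_negative(breaks), abs(breaks))
#> [1] TRUE
```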
Add simple linear regression lines to outline the trend in education across the counties, which have been arranged in order of poverty rate.
- Use the group aesthetic to specify the subset of data to be regressed.
- Use position = "stack" to synchronize the regression lines with the stacked bars.
- Use manual color and fill scales to keep the bars and lines consistent in color.
p3 <- p2 +
  geom_smooth(
    # use a data subset NOT containing the poverty data
    data = filter(m.signed, education_poverty != "percbelowpoverty"),
    aes(group = education_poverty, color = education_poverty),
    method = "lm",       # linear model
    se = FALSE,          # do not show the confidence ribbon
    position = "stack",  # align with the stacked bars
    linetype = "dashed",
    linewidth = 1) +
  # make the colors of the bars and regression lines consistent
  scale_color_manual(values = c("steelblue2", "tomato")) +
  scale_fill_manual(values = c("snow3", "steelblue1", "tomato"))
p3
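The unnamed color vectors above are matched to categories by level order, which for a character column is alphabetical. A sketch of a more robust variant, under the assumption that you prefer explicit mappings, is to name each color after its category; `fills` and `lines` are hypothetical names introduced here.

```r
library(ggplot2)

# hypothetical named palettes: names must match the values of education_poverty
fills <- c(percbelowpoverty = "snow3", perchsd = "steelblue1", percollege = "tomato")
lines <- c(perchsd = "steelblue2", percollege = "tomato")

# with named values, each color is matched to its category by name,
# not by the order of the factor levels
fill_scale <- scale_fill_manual(values = fills)
color_scale <- scale_color_manual(values = lines)

# the alphabetical level order that the unnamed vectors rely on:
# percbelowpoverty, perchsd, percollege
levels(factor(c("perchsd", "percollege", "percbelowpoverty")))
```

Swapping these scales into p3 should give the same colors as the unnamed vectors, but the mapping no longer breaks if the categories are renamed or reordered.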
Label the bars with the categories (poverty and education level) they are associated with.
# label positions: X is the 100th county along the (flipped) x-axis
a <- tibble(
  Y = c(-43, 25, 105),
  X = rep(100, 3),
  status = c("living\nin poverty", "college\ndegree", "high\nschool\ndegree"))

p4 <- p3 +
  geom_text(
    data = a,
    aes(x = X, y = Y, label = status),
    inherit.aes = FALSE, # do not inherit fill etc. from the main plot
    hjust = 0, fontface = "bold", size = 5,
    color = c("snow4", "tomato", "steelblue4"))
p4
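For a handful of fixed labels, `annotate()` is a common alternative to building a separate data frame. The sketch below uses a hypothetical stand-in plot rather than p3, so it runs on its own; the label text, positions, and colors mirror the step above.

```r
library(ggplot2)

# hypothetical stand-in for p3, just to have axes to annotate
base <- ggplot(data.frame(x = 1:10, y = seq(-40, 140, 20)), aes(x, y)) +
  geom_point()

# annotate() adds fixed text layers without a separate data frame
labelled <- base +
  annotate(
    "text",
    x = 10, y = c(-43, 25, 105),
    label = c("living\nin poverty", "college\ndegree", "high\nschool\ndegree"),
    hjust = 0, fontface = "bold", size = 5,
    color = c("snow4", "tomato", "steelblue4")
  )
labelled
```

Because `annotate()` layers never inherit the plot’s aesthetics, there is no need for `inherit.aes = FALSE`.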
Add axis and plot titles, and fine-tune the theme.
p5 <- p4 +
  labs(
    x = "Counties",
    y = "Percent of population in the county",
    title = "U.S. midwest demographics in 2000",
    subtitle = "Better education is strongly associated with the decrease of poverty") +
  theme(
    legend.position = "none",
    axis.ticks = element_blank(),
    axis.text.y = element_blank(),  # hide the 437 county IDs
    axis.line.y = element_blank(),
    axis.title.y = element_text(hjust = 1),
    axis.title.x = element_text(margin = margin(t = 15)),
    axis.line.x = element_line(linewidth = 1),
    plot.title = element_text(hjust = 1, face = "bold"),
    plot.subtitle = element_text(hjust = 1, face = "italic", margin = margin(b = 10)))
p5
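The subtitle’s claim rests on the visual trend of the dashed lines. A minimal sketch for checking it numerically: fit each education variable against the county’s poverty rank, which is the integer position each county occupies on the axis. These are the unstacked fits; with position = "stack", the drawn high-school line is offset by the college fit. The names `ranked`, `hsd_fit`, and `college_fit` are hypothetical.

```r
library(ggplot2)  # for the midwest data
library(dplyr)

# rank 1 = lowest poverty rate, matching the order of the bars
ranked <- midwest %>%
  arrange(percbelowpoverty) %>%
  mutate(rank = row_number())

# slope of each education variable against the poverty rank
hsd_fit <- lm(perchsd ~ rank, data = ranked)
college_fit <- lm(percollege ~ rank, data = ranked)
coef(hsd_fit)["rank"]
coef(college_fit)["rank"]

# a rank-free summary of the same relationship
cor(midwest$perchsd, midwest$percbelowpoverty)
```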
Save the plot. By default, ggsave() saves the most recently displayed plot. Here we save it to the “graphics” folder, given as a path relative to the current working directory (here, the folder containing the source code).
ggsave(filename = "bars education vs poverty.pdf",
       path = "graphics", # a relative path
       width = 7, height = 5)
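If other plots are displayed between building p5 and saving, relying on the last displayed plot can save the wrong one. A sketch that passes the plot explicitly; it uses a hypothetical stand-in plot `p` and a temporary folder so that it runs anywhere, and it creates the folder first in case it does not exist yet.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()  # stand-in for p5

out_dir <- file.path(tempdir(), "graphics")
dir.create(out_dir, showWarnings = FALSE)  # make sure the folder exists

# plot = p removes the dependence on the last displayed plot
ggsave(filename = "bars education vs poverty.pdf", plot = p,
       path = out_dir, width = 7, height = 5)  # inches by default
```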
Complete code
library(ggplot2)
library(dplyr)
library(tidyr)
library(stringr)
theme_set(theme_classic(base_size = 15))

# Arrange counties (`PID`) in order of poverty percent.
m.reordered <- midwest %>%
  arrange(percbelowpoverty) %>%
  mutate(PID = factor(PID, levels = PID))

# Select useful variables, and convert data to tidy structure.
m.tidy <- m.reordered %>%
  select(PID, perchsd, percollege, percbelowpoverty) %>%
  # tidy up
  pivot_longer(-PID, names_to = "education_poverty", values_to = "percent")
head(m.tidy, n = 3)

# Turn the poverty percent into negative values.
m.signed <- m.tidy %>%
  mutate(percent = ifelse(education_poverty == "percbelowpoverty", -percent, percent))
head(m.signed, 3) # ready for visualization

### Visualization
# Create a bar plot.
p1 <- m.signed %>%
  ggplot(aes(x = PID, y = percent, fill = education_poverty)) +
  geom_col(alpha = .6) +
  coord_flip(expand = FALSE) # flip the plot
p1

# Add a central vertical line.
p2 <- p1 +
  geom_hline(yintercept = 0, linewidth = 1.5) +
  scale_y_continuous(
    breaks = seq(-40, 120, 20),
    labels = function(x) ifelse(x < 0, -x, x))
p2

# Add linear regression to outline the changing trend of education.
p3 <- p2 +
  geom_smooth(
    # use a data subset NOT containing the poverty data
    data = filter(m.signed, education_poverty != "percbelowpoverty"),
    aes(group = education_poverty, color = education_poverty),
    method = "lm",       # linear model
    se = FALSE,          # do not show the confidence ribbon
    position = "stack",  # align with the bars in position
    linetype = "dashed",
    linewidth = 1) +
  # make the colors of the bars and regression lines consistent
  scale_color_manual(values = c("steelblue2", "tomato")) +
  scale_fill_manual(values = c("snow3", "steelblue1", "tomato"))
p3

# Label the bars with the categories they are associated with.
a <- tibble(
  Y = c(-43, 25, 105),
  X = rep(100, 3),
  status = c("living\nin poverty", "college\ndegree", "high\nschool\ndegree"))
p4 <- p3 +
  geom_text(
    data = a,
    aes(x = X, y = Y, label = status),
    inherit.aes = FALSE,
    hjust = 0, fontface = "bold", size = 5,
    color = c("snow4", "tomato", "steelblue4"))
p4

# Add axis and plot titles, and fine-tune the theme.
p5 <- p4 +
  labs(
    x = "Counties",
    y = "Percent of population in the county",
    title = "U.S. midwest demographics in 2000",
    subtitle = "Better education is strongly associated with the decrease of poverty") +
  theme(
    legend.position = "none",
    axis.ticks = element_blank(),
    axis.text.y = element_blank(),
    axis.line.y = element_blank(),
    axis.title.y = element_text(hjust = 1),
    axis.title.x = element_text(margin = margin(t = 15)),
    axis.line.x = element_line(linewidth = 1),
    plot.title = element_text(hjust = 1, face = "bold"),
    plot.subtitle = element_text(hjust = 1, face = "italic", margin = margin(b = 10)))
p5

# Save the plot.
ggsave(filename = "bars education vs poverty.pdf",
       path = "graphics", # a relative path
       width = 7, height = 5)