Create a Stacked Area (Ribbon) Plot to Visualize Temporal Changes of Education Degree Distribution in the U.S.

In this visualization, we’ll create a stacked ribbon plot to display the changing proportion of different education degrees earned in the U.S. from 1970 to 2020.

Packages and Data Cleanup

library(ggmosaic) # load the "happy" dataset
library(ggplot2)
library(dplyr)
theme_set(theme_classic(base_size = 17))
h <- happy %>%
  # count number of observations in each group
  group_by(year, degree) %>%
  summarise(n = n()) %>%
  # remove rows with NA values in "degree"
  filter(!is.na(degree)) %>%
  # calculate the yearly proportion of each education degree
  group_by(year) %>%
  mutate(
    fraction = n / sum(n),
    # reverse the default order of the "degree" levels
    # so that high school degrees sit at the bottom of the plot
    # and higher education degrees sit at the top
    degree = factor(degree, levels = rev(unique(degree))))
head(h, 30)

Output:

# A tibble: 30 × 4
# Groups:   year [6]
    year degree             n fraction
   <dbl> <fct>          <int>    <dbl>
 1  1972 lt high school   635   0.399
 2  1972 high school      762   0.479
 3  1972 junior college    17   0.0107
 4  1972 bachelor         124   0.0780
 5  1972 graduate          52   0.0327
 6  1973 lt high school   552   0.371
 7  1973 high school      720   0.484
 8  1973 junior college    20   0.0134
 9  1973 bachelor         132   0.0887
10  1973 graduate          65   0.0437
# ℹ 20 more rows
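The level-reversal step inside mutate() controls the stacking order: ggplot2 stacks groups in reverse level order, so the first level ends up on top and the last level at the bottom. The sketch below applies the same reversal to a standalone factor built from the five degree labels shown in the output above. It uses levels() rather than unique(), which gives the same order here because every level is present; forcats::fct_rev() is shown as an equivalent one-liner, offered as a suggestion rather than the author's code.

```r
library(forcats)

# The five education levels, lowest degree first (as in the original factor)
degree_levels <- c("lt high school", "high school", "junior college",
                   "bachelor", "graduate")
degree <- factor(degree_levels, levels = degree_levels)

# Base R: rebuild the factor with its levels in reverse order
degree_rev <- factor(degree, levels = rev(levels(degree)))

# forcats equivalent: fct_rev() reverses the level order directly
degree_rev2 <- fct_rev(degree)

# "graduate" is now the first level (stacked on top) and
# "lt high school" the last level (stacked at the bottom)
levels(degree_rev)
all(as.character(degree_rev) == as.character(degree_rev2))
```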

Visualization

Create the ribbons with geom_area(). Like geom_col() and geom_bar(), geom_area() uses the "stack" position by default, so the ribbons are piled on top of one another. Because the fractions within each year sum to 1, the stacked ribbons always add up to a total proportion of 100%.

p1 <- h %>%
  ggplot(aes(x = year, y = fraction, fill = degree)) +
  geom_area(color = "black") +
  scale_fill_brewer(palette = "Greens", direction = -1)
p1
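The 100% total relies on the per-year fractions summing to exactly 1. A quick sanity check is sketched below on a small table of made-up counts (hypothetical values, not from the happy dataset) so it runs without ggmosaic; with the real data, the same group_by()/summarise() step can be run on h.

```r
library(dplyr)

# Hypothetical counts, for illustration only (not from the happy dataset)
counts <- tibble(
  year   = rep(c(2000, 2010), each = 3),
  degree = rep(c("high school", "bachelor", "graduate"), times = 2),
  n      = c(50, 30, 20, 40, 35, 25)
)

# Same normalization as in the tutorial: each count divided by its year's total
shares <- counts %>%
  group_by(year) %>%
  mutate(fraction = n / sum(n)) %>%
  ungroup()

# The stacked height at each year is the sum of the fractions, which should be 1
check <- shares %>%
  group_by(year) %>%
  summarise(total = sum(fraction))
check
```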

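Label the ribbons with the education degree; here the labels are placed at the year 2010. These in-plot labels are a more concise alternative to the legend. The text layer inherits the fill = degree mapping, so the labels are grouped and stacked in the same order as the ribbons. Because the y values are fractions that already sum to 1 in each year, either the stack or the fill position in geom_text() aligns the text with the ribbons, and vjust = .5 places each label at the vertical middle of its ribbon.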

p2 <- p1 +
  geom_text(
    data = h %>% filter(year == 2010),
    aes(label = degree),
    position = position_stack(vjust = .5),
    size = 5, fontface = "bold")
p2
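What position_stack(vjust = .5) does for each label can be reproduced by hand: sort the segments from bottom to top, take the cumulative sum as the top of each segment, and subtract half the segment height. The sketch below uses three hypothetical shares for a single year and then cross-checks the result against ggplot2's own stacking via layer_data(); the data and variable names are illustrative only.

```r
library(ggplot2)
library(dplyr)

# Hypothetical one-year shares, for illustration only
shares <- tibble(
  degree   = factor(c("graduate", "bachelor", "high school"),
                    levels = c("graduate", "bachelor", "high school")),
  fraction = c(0.2, 0.3, 0.5)
)

# ggplot2 stacks groups in reverse level order, so the LAST level sits at the
# bottom. Sort bottom-to-top, then place each label halfway up its segment:
# top of segment (cumulative sum) minus half the segment height.
label_pos <- shares %>%
  arrange(desc(as.integer(degree))) %>%
  mutate(
    ymax  = cumsum(fraction),
    y_mid = ymax - fraction / 2
  )
label_pos

# Cross-check against ggplot2's stacking of a text layer
p <- ggplot(shares, aes(x = 1, y = fraction, group = degree, label = degree)) +
  geom_text(position = position_stack(vjust = 0.5))
ld <- layer_data(p)
gg_mid <- ld$y[order(ld$group)]  # group 1 = first level ("graduate")
gg_mid
```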

Alternatively, the same plot can be produced without precomputing fractions: map the absolute counts n to the y aesthetic, and apply the fill position to the ribbons, which rescales each year's stack to a total height of 1. The same fill position is then given to geom_text() so that the labels stay aligned with the ribbons.

h %>%
  # map absolute counts "n" to aesthetic y
  ggplot(aes(x = year, y = n, fill = degree)) +
  # use "fill" position for ribbons
  geom_area(position = "fill", color = "black") +
  scale_fill_brewer(palette = "Greens", direction = -1) +
  geom_text(
    data = h %>% filter(year == 2010),
    aes(label = degree),
    # use "fill" position for texts
    position = position_fill(vjust = .5),
    size = 5, fontface = "bold")
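The equivalence between "precomputed fractions + stack" and "raw counts + fill" can be checked numerically with layer_data(), which returns the positioned data ggplot2 actually draws. The sketch below uses geom_col() on hypothetical counts instead of geom_area(): in recent ggplot2 releases geom_area() defaults to stat_align(), which can add interpolated points, whereas bars keep one row per year and category and make the comparison simpler.

```r
library(ggplot2)
library(dplyr)

# Hypothetical counts, for illustration only (not from the happy dataset)
counts <- tibble(
  year   = rep(c(2000, 2010), each = 3),
  degree = rep(c("high school", "bachelor", "graduate"), times = 2),
  n      = c(50, 30, 20, 40, 35, 25)
) %>%
  group_by(year) %>%
  mutate(fraction = n / sum(n)) %>%
  ungroup()

# Version 1: precomputed fractions with the default stack position
p_stack <- ggplot(counts, aes(x = year, y = fraction, fill = degree)) +
  geom_col()

# Version 2: raw counts, normalized to a total height of 1 by position = "fill"
p_fill <- ggplot(counts, aes(x = year, y = n, fill = degree)) +
  geom_col(position = "fill")

stack_ld <- layer_data(p_stack)
fill_ld  <- layer_data(p_fill)

# Segment boundaries should match up to floating-point tolerance
all.equal(stack_ld$ymin, fill_ld$ymin)
all.equal(stack_ld$ymax, fill_ld$ymax)
```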

A final touch-up:

p3 <- p2 +
  # revise y-axis labels and title
  scale_y_continuous(
    labels = function(x){paste(x * 100, "%")},
    n.breaks = 6,
    name = "Proportion") +
  # remove the legend
  theme(legend.position = "none") +
  # remove the default padding so the ribbons fill the whole panel
  coord_cartesian(expand = FALSE)
p3
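The custom labeller above pastes a space before the percent sign. If the more common "20%" style is preferred, the scales package (which ggplot2 uses for its axes) provides label_percent(), which returns a labelling function that can be passed straight to labels =. The sketch compares both on a set of example break values; swapping it in is a suggestion, not part of the original code.

```r
library(scales)

# Example break values for a 0-1 proportion axis
breaks <- c(0, 0.2, 0.4, 0.6, 0.8, 1)

# Labeller used in the tutorial: multiply by 100 and paste "%" (with a space)
manual_labels <- (function(x) paste(x * 100, "%"))(breaks)

# scales alternative: label_percent() returns a function, which is then applied
scales_labels <- label_percent()(breaks)

manual_labels
scales_labels

# In the plot, this would read:
# scale_y_continuous(labels = label_percent(), n.breaks = 6, name = "Proportion")
```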

The ribbons above are drawn with the generic ggplot2 function geom_area(). The extension package ggalluvial draws ribbons in a very similar manner, but with smoother outlines. The plot can be created with the same code as above by using ggalluvial::geom_alluvium(aes(alluvium = degree)) in place of geom_area() (see another example below).
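A minimal sketch of that swap is shown below on hypothetical yearly shares (made-up values, not from the happy dataset), so it runs without ggmosaic. The alluvium aesthetic tells ggalluvial which rows belong to the same ribbon across years. Only the ribbon layer is shown: ggalluvial orders its ribbons with its own rules, so text labels positioned with position_stack() may need checking or adjusting to line up.

```r
library(ggplot2)
library(ggalluvial)

# Hypothetical yearly shares, for illustration only (not from the happy dataset)
shares <- data.frame(
  year     = rep(c(1980, 1990, 2000, 2010), each = 3),
  degree   = factor(rep(c("graduate", "bachelor", "high school"), times = 4),
                    levels = c("graduate", "bachelor", "high school")),
  fraction = c(0.05, 0.15, 0.80,
               0.07, 0.18, 0.75,
               0.09, 0.21, 0.70,
               0.11, 0.24, 0.65)
)

# geom_alluvium() takes the place of geom_area(); each degree forms one
# smoothed ribbon that flows across the years
p_alluvial <- ggplot(shares, aes(x = year, y = fraction, fill = degree)) +
  geom_alluvium(aes(alluvium = degree), color = "black") +
  scale_fill_brewer(palette = "Greens", direction = -1)

p_alluvial
```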

Continue Exploring — 🚀 one level up!


As mentioned above, ggalluvial creates area plots with mathematically smoothed ribbon outlines and is a visually appealing alternative to the generic geom_area() function. The following stacked ribbon (alluvium) plot made with this package shows the dynamic shifts in the migrant population to the United States from 1820 to 2009.