Create a Stacked Area (Ribbon) Plot to Visualize Temporal Changes of Education Degree Distribution in the U.S.
In this visualization, we’ll create a stacked ribbon plot to display the changing proportion of different education degrees earned in the U.S. from 1970 to 2020.
library(ggmosaic) # load the "happy" dataset
library(ggplot2)
library(dplyr)
theme_set(theme_classic(base_size = 17))

h <- happy %>%
  # count number of observations in each group
  group_by(year, degree) %>%
  summarise(n = n()) %>%
  # remove rows with NA values in "degree"
  filter(!is.na(degree)) %>%
  # calculate the yearly proportion of each education degree
  group_by(year) %>%
  mutate(
    fraction = n / sum(n),
    # reverse the default order of the "degree" variable
    # so that high school degrees reside the bottom in the plot
    # and higher education degrees reside the top in the plot
    degree = factor(degree, levels = rev(unique(degree))))

head(h, 30)
Output:
# A tibble: 30 × 4
# Groups:   year [6]
    year degree             n fraction
   <dbl> <fct>          <int>    <dbl>
 1  1972 lt high school   635   0.399
 2  1972 high school      762   0.479
 3  1972 junior college    17   0.0107
 4  1972 bachelor         124   0.0780
 5  1972 graduate          52   0.0327
 6  1973 lt high school   552   0.371
 7  1973 high school      720   0.484
 8  1973 junior college    20   0.0134
 9  1973 bachelor         132   0.0887
10  1973 graduate          65   0.0437
# ℹ 20 more rows
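A quick sanity check on the fraction column is that it sums to 1 within each year. The sketch below uses a small made-up table (`toy`, not the real `happy` data) and repeats the same `group_by()` / `mutate()` step; the names `toy`, `toy_frac` and `check` are illustrative only.

```r
library(dplyr)

# made-up counts standing in for the summarised "happy" data
toy <- tibble(
  year   = rep(c(2000, 2010), each = 3),
  degree = rep(c("high school", "bachelor", "graduate"), times = 2),
  n      = c(60, 30, 10, 50, 35, 15)
)

# same step as in the tutorial: proportion of each degree within a year
toy_frac <- toy %>%
  group_by(year) %>%
  mutate(fraction = n / sum(n)) %>%
  ungroup()

# total fraction per year; each total should be 1 (up to floating-point error)
check <- toy_frac %>%
  group_by(year) %>%
  summarise(total = sum(fraction))

check
```

The same check can be run on `h` itself by replacing `toy_frac` with `h`.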
Visualization
Create ribbons using geom_area(). Like geom_col() and geom_bar(), the areas (ribbons) take a default stack position, and therefore all the ribbons add up to a total proportion of 100%.
p1 <- h %>%
  ggplot(aes(x = year, y = fraction, fill = degree)) +
  geom_area(color = "black") +
  scale_fill_brewer(palette = "Greens", direction = -1)
p1
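To see the default stacking at work, the computed layer data can be inspected with `layer_data()`. This is a minimal sketch on made-up fractions (the `toy` data frame is an assumption, not the `happy` data): if the ribbons are stacked, the upper edge of the topmost ribbon should sit at 1.

```r
library(ggplot2)

# made-up fractions for three years; each year sums to 1
toy <- data.frame(
  year     = rep(c(2000, 2010, 2020), each = 3),
  degree   = rep(c("high school", "bachelor", "graduate"), times = 3),
  fraction = c(0.60, 0.30, 0.10,
               0.50, 0.35, 0.15,
               0.45, 0.35, 0.20)
)

p_toy <- ggplot(toy, aes(x = year, y = fraction, fill = degree)) +
  geom_area(color = "black")

# layer_data() returns the data after the stat and position were applied,
# including the ymin / ymax columns created by the stack position
built <- layer_data(p_toy, 1)

# highest upper edge across all ribbons
top_edge <- max(built$ymax)
top_edge
```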
Label the ribbons with the education degree. This annotation serves as a more concise alternative to the legend. Because the y aesthetic is already a fraction, either a stack or a fill position in geom_text() aligns the text with the ribbons.
p2 <- p1 +
  geom_text(
    data = h %>% filter(year == 2010),
    aes(label = degree),
    position = position_stack(vjust = .5),
    size = 5, fontface = "bold")
p2
Alternatively, the same graphical effect can be produced by mapping the absolute counts n to the y aesthetic and applying the fill position to the ribbons, which normalizes the total height to 1. The same fill position is then specified in geom_text() to align the labels with the ribbons.
h %>%
  # map absolute counts "n" to aesthetic y
  ggplot(aes(x = year, y = n, fill = degree)) +
  # use "fill" position for ribbons
  geom_area(position = "fill", color = "black") +
  scale_fill_brewer(palette = "Greens", direction = -1) +
  geom_text(
    data = h %>% filter(year == 2010),
    aes(label = degree),
    # use "fill" position for texts
    position = position_fill(vjust = .5))
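The equivalence of the two labelling approaches can be checked numerically. The sketch below, on a made-up single-year table (`toy` is an assumption), compares the label heights computed by `position_stack()` on fractions with those computed by `position_fill()` on raw counts.

```r
library(ggplot2)

# made-up counts for a single year
toy <- data.frame(
  year   = rep(2010, 3),
  degree = c("high school", "bachelor", "graduate"),
  n      = c(50, 35, 15)
)
toy$fraction <- toy$n / sum(toy$n)

# variant 1: fractions on y, labels centred with position_stack()
p_stack <- ggplot(toy, aes(x = year, y = fraction, fill = degree)) +
  geom_text(aes(label = degree), position = position_stack(vjust = 0.5))

# variant 2: raw counts on y, labels centred with position_fill()
p_fill <- ggplot(toy, aes(x = year, y = n, fill = degree)) +
  geom_text(aes(label = degree), position = position_fill(vjust = 0.5))

# label heights after the position adjustment
y_stack <- layer_data(p_stack, 1)$y
y_fill  <- layer_data(p_fill, 1)$y

# TRUE when both variants place the labels at the same heights
same_positions <- isTRUE(all.equal(y_stack, y_fill))
same_positions
```

Stacking fractions and filling counts both reduce to cumulative shares of the yearly total, which is why the two sets of label heights agree.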
A final touch up.
p3 <- p2 +
  # revise y-axis labels and title
  scale_y_continuous(
    labels = function(x) {paste(x * 100, "%")},
    n.breaks = 6, name = "Proportion") +
  # remove the legend
  theme(legend.position = "none") +
  # expand to fill up the panel
  coord_cartesian(expand = 0)
p3
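A small note on the label function: `paste()` separates its arguments with a space, so the axis labels read "50 %" rather than "50%". If the tighter form is preferred, `paste0()` or `scales::label_percent()` are possible substitutes. The sketch below compares the three on a few hypothetical break values; it is an optional variation, not part of the original plot.

```r
library(scales)

# a few example axis breaks (hypothetical values)
breaks <- c(0, 0.5, 1)

# labeller used in the plot above: paste() inserts a space before "%"
with_space <- paste(breaks * 100, "%")

# same idea without the separating space
no_space <- paste0(breaks * 100, "%")

# labeller from the scales package, rounded to whole percentages
via_scales <- label_percent(accuracy = 1)(breaks)

with_space
no_space
via_scales
```

Any of these can be passed to the `labels` argument of `scale_y_continuous()`, for example `labels = label_percent(accuracy = 1)`.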
The ribbons above are generated with the generic ggplot2 function geom_area(). The extension package ggalluvial generates ribbons in a very similar manner, but with smoother ribbon outlines. The plot can be created using the same code as above, but with ggalluvial::geom_alluvium(aes(alluvium = degree)) in place of geom_area() (see another example below).
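As a rough illustration of that swap, the sketch below draws an alluvium version on made-up fractions (the `toy` data frame is an assumption, not the `happy` data). It follows the pattern used in the ggalluvial documentation, where `alluvium` is mapped to the grouping variable; the `alpha` and `decreasing` settings are illustrative choices and may need tuning for the real data.

```r
library(ggplot2)
library(ggalluvial)

# made-up fractions for three years; each year sums to 1
toy <- data.frame(
  year     = rep(c(2000, 2010, 2020), each = 3),
  degree   = rep(c("high school", "bachelor", "graduate"), times = 3),
  fraction = c(0.60, 0.30, 0.10,
               0.50, 0.35, 0.15,
               0.45, 0.35, 0.20)
)

# geom_alluvium() replaces geom_area(); each degree is one alluvium
p_alluvium <- ggplot(toy, aes(x = year, y = fraction, alluvium = degree)) +
  geom_alluvium(aes(fill = degree), color = "black",
                alpha = 0.75, decreasing = FALSE) +
  scale_fill_brewer(palette = "Greens", direction = -1)

p_alluvium
```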
library(ggmosaic) # load the "happy" dataset
library(ggplot2)
library(dplyr)
theme_set(theme_classic(base_size = 17))

h <- happy %>%
  # count number of observations in each group
  group_by(year, degree) %>%
  summarise(n = n()) %>%
  # remove rows with NA values in "degree"
  filter(!is.na(degree)) %>%
  # calculate the yearly proportion of each education degree
  group_by(year) %>%
  mutate(
    fraction = n / sum(n),
    # reverse the default order of the "degree" variable
    # so that high school degrees reside the bottom in the plot
    # and higher education degrees reside the top in the plot
    degree = factor(degree, levels = rev(unique(degree))))

head(h, 30)

# Create a ribbon plot
p1 <- h %>%
  ggplot(aes(x = year, y = fraction, fill = degree)) +
  geom_area(color = "black") +
  scale_fill_brewer(palette = "Greens", direction = -1)
p1

# Label with the education degree.
p2 <- p1 +
  geom_text(
    data = h %>% filter(year == 2010),
    aes(label = degree),
    position = position_stack(vjust = .5),
    size = 5, fontface = "bold")
p2

# A final touch up.
p3 <- p2 +
  # revise y-axis labels and title
  scale_y_continuous(
    labels = function(x) {paste(x * 100, "%")},
    n.breaks = 6, name = "Proportion") +
  # remove the legend
  theme(legend.position = "none") +
  # expand to fill up the panel
  coord_cartesian(expand = 0)
p3
Continue Exploring — 🚀 one level up!
As mentioned above, ggalluvial creates area plots with mathematically smoothed ribbon outlines and is a visually appealing alternative to the generic geom_area() function. The following stacked ribbon / alluvium plot made with this package shows dynamic shifts in the migrant population to the United States from 1820 to 2009.