Use Stacked Area / Alluvial Plot in ggplot2 to Visualize Migration to U.S.
In this tutorial, we’ll create an annotated stacked area / alluvial plot to visualize how migration to the U.S. changed from 1820 to 2009. The plot is a ggplot2 reproduction of a Datawrapper demo graphic by Mirko Lorenz.
Major techniques covered in this tutorial include:
Create a stacked area / alluvial plot with the ribbons in a specified order.
library(ggplot2)
library(dplyr)
library(tidyr)
library(ggalluvial)
library(RColorBrewer)

theme_set(theme_classic(base_size = 14))

d <- read.csv("/Users/boyuan/Desktop/R/gallery/DATASETS/migration_to_US.csv")

# reshape from wide (one column per country) to long format
d.tidy <- d %>%
  as_tibble() %>%
  pivot_longer(-year, names_to = "countries", values_to = "population")

head(d.tidy, 3)
Output:
# A tibble: 3 × 3
   year countries       population
  <int> <chr>                <int>
1  1825 Austria.Hungary          0
2  1825 Germany               5753
3  1825 Ireland              51617
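To make the reshaping step concrete, here is a minimal sketch on a made-up two-country table. The `toy` data frame and its numbers are hypothetical and are not taken from the migration dataset. It illustrates how `pivot_longer()` turns one column per country into one row per year-country pair.

```r
library(tidyr)

# hypothetical wide table: one column per country (values are made up)
toy <- data.frame(
  year    = c(1825, 1835),
  Germany = c(10, 20),
  Ireland = c(30, 40)
)

# stack every column except `year` into name / value pairs
toy.tidy <- pivot_longer(toy, -year,
                         names_to = "countries", values_to = "population")

# 2 years x 2 countries = 4 rows, with columns year, countries, population
toy.tidy
```

The long format is what lets a single `fill = countries` aesthetic color every ribbon in the plot.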
Create a vector of color names, one color per country. Here we use shades of blue for European countries, distinct yellows for Asian countries, shades of red for American countries, and greys for the remaining countries. A change in hue thus signals a transition between continents. (More techniques for the brewer color palettes used here can be found in this complete guide.)
# colors for 8 European countries
blues <- brewer.pal(8, "Blues")
# colors for 4 Asian countries
yellows <- c("cornsilk2", "yellow2", "gold", "orange")
# colors for 6 American countries
reds <- brewer.pal(6, "Reds")
# colors for 3 other countries
greys <- c("grey80", "grey50", "grey20")
# combine all colors
colors.countries <- c(blues, yellows, reds, greys)
Display the color hex codes.
# preview the colors
colors.countries %>% scales::show_col(cex_label = .7)
The countries in the dataset are already sorted by continent. Now we convert the countries variable into a factor to “memorize” this order (though in reverse direction). This keeps countries from the same continent grouped together in the visualization (learn more in this complete series of guides on reordering graphic elements). Reversing the factor levels (with rev) produces a strong color contrast, from dark to light, at each transition from one continent to the next (see plot p2).
# name each color after its country, in dataset order
names(colors.countries) <- d.tidy$countries %>% unique()

# convert the "countries" variable into a factor (in reverse order)
d.tidy.ordered <- d.tidy %>%
  mutate(countries = factor(countries, levels = rev(names(colors.countries))))
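As a small illustration of the reordering idea, the sketch below uses four hypothetical country names and made-up colors (none of this is the tutorial's actual data). It shows that a factor keeps whatever order is passed to `levels`, and that a named color vector can be looked up by level name, which is how `scale_fill_manual()` matches a named `values` vector to factor levels.

```r
# hypothetical country names, already sorted by continent
countries <- c("Germany", "Ireland", "China", "Japan")

# the factor keeps the order given in `levels`, not alphabetical order
f <- factor(countries, levels = rev(countries))

levels(f)      # reversed order: Japan, China, Ireland, Germany
as.integer(f)  # position of each country within the reversed levels

# a named color vector (made-up colors) indexed by level name
cols <- c(Germany = "blue", Ireland = "lightblue",
          China = "gold", Japan = "orange")
cols[levels(f)]
```

Because the colors are matched by name, reversing the levels changes the stacking order of the ribbons without changing which color belongs to which country.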
Create an area / alluvium plot using geom_alluvium from the ggalluvial package. Each colored ribbon represents a country, and the ribbon width represents the number of migrants from that country to the U.S. geom_alluvium can be viewed as a smoothed version of geom_area(aes(group = countries)).
p0 <- d.tidy.ordered %>%
  ggplot(aes(x = year, y = population, fill = countries)) +
  theme(legend.position = "none")

p0 + geom_alluvium(aes(alluvium = countries), alpha = 1)
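For comparison with the alluvium ribbons, here is a rough sketch of the plain `geom_area()` counterpart mentioned above, drawn on a tiny made-up dataset (the `toy` data frame and the name `p.area` are hypothetical). It only needs ggplot2, and the areas are stacked with straight segments rather than smooth curves.

```r
library(ggplot2)

# hypothetical long-format data (values are made up)
toy <- data.frame(
  year       = rep(c(1820, 1830, 1840), times = 2),
  countries  = rep(c("Germany", "Ireland"), each = 3),
  population = c(1, 3, 2, 2, 1, 4)
)

# stacked area chart: one polygon per country, stacked by default
p.area <- ggplot(toy, aes(x = year, y = population, fill = countries)) +
  geom_area(aes(group = countries))

p.area
```

Swapping the `geom_area()` layer for `geom_alluvium(aes(alluvium = countries))` from ggalluvial gives the curved ribbons used in this tutorial.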
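Here we take one step back: before adding the alluvium ribbons, we first lay down a bottom layer of two rectangles marking the years of the World Wars, and then draw the ribbons on top of it.
p1 <- p0 +
  annotate(geom = "rect",
           xmin = c(1914, 1939), xmax = c(1918, 1945),
           ymin = 0, ymax = Inf, fill = "snow2") +
  geom_alluvium(aes(alluvium = countries), alpha = 1)
p1
Update the color scale using the prepared color vector.
p2 <- p1 + scale_fill_manual(values = colors.countries)
p2
Add text annotations.
p3 <- p2 +
  # World Wars
  annotate(geom = "text",
           x = c(1914, 1939), y = c(9*10^6, 4*10^6),
           label = c("World\nWar I", "World\nWar II"),
           hjust = 0) +
  # continents
  annotate(geom = "text",
           x = c(1895, 1990, 1995),
           y = c(1, 2, 6) * 10^6,
           label = c("EUROPE", "ASIA", "AME-\nRICA"),
           fontface = "bold", size = c(8, 8, 5))
p3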
Revise the axes.
p4 <- p3 +
  # remove the axis padding so the ribbons fill the whole panel
  coord_cartesian(expand = FALSE) +
  scale_x_continuous(breaks = seq(1820, 2010, 20)) +
  scale_y_continuous(
    breaks = seq(0, 10*10^6, 2*10^6),
    # show counts in millions, e.g. 2e6 becomes "2 M"
    labels = function(x) {paste(x / 10^6, "M")},
    position = "right", name = NULL)
p4
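To see what the y-axis labelling function produces, the short base R sketch below applies the same function to the same break values outside of ggplot2. The name `label_millions` is a hypothetical label chosen here for illustration; the tutorial passes the function anonymously.

```r
# same labelling function as the one passed to scale_y_continuous()
label_millions <- function(x) {paste(x / 10^6, "M")}

# same break positions as the y axis: 0 to 10 million in steps of 2 million
breaks <- seq(0, 10 * 10^6, 2 * 10^6)

label_millions(breaks)
# "0 M" "2 M" "4 M" "6 M" "8 M" "10 M"
```

Any function that takes the numeric breaks and returns a character vector of the same length can be supplied to `labels` in this way.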
A final touch-up of the titles and theme.
p5 <- p4 +
  # plot titles
  labs(title = "Migration to the US by world regions, 1820-2009",
       subtitle = "Ribbon width represents number of migrants per decade from a country / region") +
  theme(plot.title = element_text(face = "bold", size = 16),
        plot.subtitle = element_text(size = 11, color = "grey30"),
        panel.grid.major.y = element_line())
p5
The complete code for this tutorial:
library(ggplot2)
library(dplyr)
library(tidyr)
library(ggalluvial)
library(RColorBrewer)
library(scales)

theme_set(theme_classic(base_size = 14))

d <- read.csv("/Users/boyuan/Desktop/R/gallery/DATASETS/migration_to_US.csv")

d.tidy <- d %>%
  as_tibble() %>%
  pivot_longer(-year, names_to = "countries", values_to = "population")

head(d.tidy, 3)

# Create a vector of color names -------
# colors for 8 European countries
blues <- brewer.pal(8, "Blues")
# colors for 4 Asian countries
yellows <- c("cornsilk2", "yellow2", "gold", "orange")
# colors for 6 American countries
reds <- brewer.pal(6, "Reds")
# colors for 3 other countries
greys <- c("grey80", "grey50", "grey20")
# combine all colors
colors.countries <- c(blues, yellows, reds, greys)

# Display the color hex codes (optional)
colors.countries %>% show_col(cex_label = .7)

# Convert the "countries" variable into a factor (in reverse order)
names(colors.countries) <- d.tidy$countries %>% unique()

d.tidy.ordered <- d.tidy %>%
  mutate(countries = factor(countries, levels = rev(names(colors.countries))))

# Create an area / alluvium plot, with annotating rectangles at the bottom layer
p0 <- d.tidy.ordered %>%
  ggplot(aes(x = year, y = population, fill = countries)) +
  theme(legend.position = "none")

p1 <- p0 +
  annotate(geom = "rect",
           xmin = c(1914, 1939), xmax = c(1918, 1945),
           ymin = 0, ymax = Inf, fill = "snow2") +
  geom_alluvium(aes(alluvium = countries), alpha = 1)
p1

# Update the color scale using the prepared color vector.
p2 <- p1 + scale_fill_manual(values = colors.countries)
p2

# Add text annotations.
p3 <- p2 +
  # World Wars
  annotate(geom = "text",
           x = c(1914, 1939), y = c(9*10^6, 4*10^6),
           label = c("World\nWar I", "World\nWar II"),
           hjust = 0) +
  # continents
  annotate(geom = "text",
           x = c(1895, 1990, 1995),
           y = c(1, 2, 6) * 10^6,
           label = c("EUROPE", "ASIA", "AME-\nRICA"),
           fontface = "bold", size = c(8, 8, 5))
p3

# Revise the axes.
p4 <- p3 +
  # remove the axis padding so the ribbons fill the whole panel
  coord_cartesian(expand = FALSE) +
  scale_x_continuous(breaks = seq(1820, 2010, 20)) +
  scale_y_continuous(
    breaks = seq(0, 10*10^6, 2*10^6),
    labels = function(x) {paste(x / 10^6, "M")},
    position = "right", name = NULL)
p4

# A final touch-up of the titles and theme.
p5 <- p4 +
  # plot titles
  labs(title = "Migration to the US by world regions, 1820-2009",
       subtitle = "Ribbon width represents number of migrants per decade from a country / region") +
  theme(plot.title = element_text(face = "bold", size = 16),
        plot.subtitle = element_text(size = 11, color = "grey30"),
        panel.grid.major.y = element_line())
p5
Continue Exploring — 🚀 one level up!
Like geom_alluvium from the ggalluvial package, geom_stream from ggstream also generates area / ribbon plots, but in a more varied and visually engaging style. Check this symmetrically stacked stream plot that displays the changing popularity of different movie genres from 1980 to 2020.
In the U.S. migration visualization above, a critical technique is the reordering of country names, which makes it possible to use color gradients to differentiate countries and continents in the stacked area and ribbon plot. Our gallery has many more examples of how reordering enhances plot readability and beauty. As a representative example, in the following plot, the ordering of the bars and the refined position of the axis text effectively highlight the differences in animals’ sleep time and its association with body weight.