Draw Line Plots in ggplot2 to Visualize Changes in Global Life Expectancy

In this article, we’ll use the gapminder dataset to visualize the changing human life expectancy across five continents from 1952 to 2007.

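Major techniques explained in this article include:

  • Summarize data on the fly with stat_summary() to draw mean trend lines and standard deviation ribbons.
  • Facet the plot and control the order of the panels.
  • Annotate the panels with either shared or panel-specific labels.
  • Display graphical elements beyond the panel boundary.
  • Load a font from the Google Font repository.
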
Packages and data cleanup

library(ggplot2)
library(dplyr)
library(gapminder) # load the "gapminder" dataset
# set default global theme
theme_set(theme_classic(base_size = 12))
head(gapminder, 3)

Output:

# A tibble: 3 × 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       1952    28.8  8425333      779.
2 Afghanistan Asia       1957    30.3  9240934      821.
3 Afghanistan Asia       1962    32.0 10267083      853.

The plot will be faceted into subplots based on the continent variable. Here we specify the order in which the continent subplots should be arranged. A nominal (fct) and an ordered (ord) factor behave the same way when it comes to reordering graphic units: in both cases, the order of the levels determines the order of display. More details about reordering graphic elements can be found in this complete guide.

# arrange the faceted panels in this order:
ordered.continent <- c(
  "Africa", "Asia", "Americas", "Europe", "Oceania")
# convert "continent" to a factor with specified level order
g <- gapminder %>% 
  mutate(continent = factor(continent, levels = ordered.continent))
head(g, n = 3) # ready for visualization

Output:

# A tibble: 3 × 6
  country     continent  year lifeExp      pop gdpPercap
  <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
1 Afghanistan Asia       1952    28.8  8425333      779.
2 Afghanistan Asia       1957    30.3  9240934      821.
3 Afghanistan Asia       1962    32.0 10267083      853.
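
To see why the level order matters, here is a small sketch using a toy vector (`toy`, made up for illustration and not part of gapminder) and base R only. It suggests that factor() sorts the levels alphabetically unless the levels argument is given, and that a nominal and an ordered factor can carry exactly the same level order.

```r
# toy data, for illustration only
toy <- c("Asia", "Europe", "Africa", "Asia")

# default: levels are sorted alphabetically
by.default <- factor(toy)
levels(by.default)

# custom order via the `levels` argument (nominal factor)
nominal <- factor(toy, levels = c("Europe", "Asia", "Africa"))

# same custom order, but stored as an ordered factor
ordered.fct <- factor(toy, levels = c("Europe", "Asia", "Africa"),
                      ordered = TRUE)

# both factors share the same level order
levels(nominal)
levels(ordered.fct)
identical(levels(nominal), levels(ordered.fct))
```

Since plotting order follows the level order, either type of factor should arrange the panels in the same way.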

Visualization

1. Create a line plot showing the changes in life expectancy over the years in each country. Each individual country is displayed as a distinct line.

p1 <- g %>% 
  ggplot(aes(x = year, y = lifeExp, 
             color = continent, fill = continent)) +
  geom_line(aes(group = country), alpha = .3)
p1

2. Create continent-wise average trend lines using the stat_summary() function.

  • Because color = continent is inherited from the ggplot() line, a trend line of the average life expectancy is calculated separately for each continent and drawn in the matching color.

  • Since a single statistic value, the mean, is calculated, we use the fun argument (as opposed to fun.data when multiple statistics are calculated at step 3).

p2 <- p1 +
  # "linewidth" replaces "size" for lines in ggplot2 >= 3.4.0
  stat_summary(fun = mean, geom = "line", linewidth = 2)
p2

3. Draw ribbons to depict one standard deviation (SD) above and below the mean of life expectancy.

  • As multiple statistics are calculated, i.e., the mean, the upper SD limit, and the lower SD limit, we use the argument fun.data (as opposed to fun for a single statistic at step 2).

  • Use fun.args = list(...) for additional arguments of the mean_sdl function.

  • In like manner, mean_se calculates the mean with its standard error limits, and mean_cl_normal calculates the mean with its confidence interval. Note that mean_sdl and mean_cl_normal are wrappers around functions from the Hmisc package, which must be installed for them to work. Find more stat_summary() examples here.

p3 <- p2 +
  stat_summary(
    # 1 multiple of SD around the mean
    fun.data = mean_sdl, fun.args = list(mult = 1),
    geom = "ribbon", alpha = .3, color = NA)
p3
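
As a rough sketch of what a fun.data function is expected to return, the hypothetical helper `mean_sd_band()` below (our own name, not part of ggplot2) computes the mean together with a lower and an upper limit, and returns them as a one-row data frame with the columns y, ymin, and ymax. It assumes that the sample standard deviation from sd() is the desired measure of spread, and uses base R only.

```r
# hypothetical helper: the mean with a band of `mult` SDs around it
mean_sd_band <- function(x, mult = 1) {
  x <- x[!is.na(x)]   # drop missing values
  m <- mean(x)
  s <- sd(x)          # sample standard deviation
  # stat_summary() looks for these three column names
  data.frame(y = m, ymin = m - mult * s, ymax = m + mult * s)
}

# quick check on a toy vector
mean_sd_band(c(1, 2, 3))
```

A function of this shape could be passed as fun.data = mean_sd_band in stat_summary(), with fun.args = list(mult = 1), if installing Hmisc is not an option.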

4. Divide the plot into subplots, with each continent in a separate panel, and remove the now redundant legend. In addition, we update the colors using continent_colors, a named color vector built into the gapminder package. The panels are displayed in ascending order of life expectancy, thanks to the custom level order given to continent (the faceting variable) in the earlier data cleanup step.

p4 <- p3 + 
  facet_wrap(~continent, nrow = 1) +
  theme(legend.position = "none") +
  scale_color_manual(values = continent_colors) +
  scale_fill_manual(values = continent_colors)
p4

5. Add annotation lines and texts to mark the starting and ending years of 1952 and 2007. Note that these annotations are repeated across all subplots. For comparison, step 8 makes unique annotations in each panel.

p5 <- p4 +
  # add annotating lines to mark years 1952 vs. 2007
  # year 1952
  geom_vline(xintercept = 1952, linetype = "dashed", color = "orange3") +
  # year 2007
  geom_vline(xintercept = 2007, linetype = "dashed", color = "skyblue3") +
  
  # add text annotation
  # year 1952
  annotate(geom = "text", x = 1952, y = 20, label = " 1952", 
           fontface = "bold", hjust = 0, color = "orange3") +
  # year 2007
  annotate(geom = "text", x = 2007, y = 20, label = "2007 ", 
           fontface = "bold", hjust = 1, color = "skyblue3")
p5

6. Annotate the plot with the average life expectancy in 1952 and 2007 in each continent. Here we first create a small dataset of the summary statistics.

life.1952_2007 <- g %>% 
  filter(year %in% c(1952, 2007)) %>% 
  group_by(continent, year) %>% 
  summarise(lifeExp = mean(lifeExp) %>% round())
head(life.1952_2007, 3)

Output:

# A tibble: 3 × 3
# Groups:   continent [2]
  continent  year lifeExp
  <fct>     <int>   <dbl>
1 Africa     1952      39
2 Africa     2007      55
3 Asia       1952      46

Label the plot with the average life expectancy in 1952 and 2007. As the labels extend beyond the boundary of the faceted panels, most of the labels are clipped off by default. We’ll fix this in the next step.

p6 <- p5 + 
  geom_label(
    data = life.1952_2007, aes(label = lifeExp), 
    color = "white", 
    # right and left justify the labels, respectively, 
    # to the left and right edge of the faceted panels.
    hjust = c(1, 0) %>% rep(5))
p6
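
The hjust vector c(1, 0) %>% rep(5) relies on the rows of life.1952_2007 alternating between 1952 and 2007. As a minimal sketch of a less order-dependent alternative, the justification could be derived from the year itself. The toy data frame `toy.labels` below is hypothetical and only mimics the structure of the summary table; it uses base R only.

```r
# toy summary table (illustration only): two continents, two years each
toy.labels <- data.frame(
  continent = rep(c("Africa", "Asia"), each = 2),
  year      = rep(c(1952, 2007), times = 2)
)

# 1 = right-justified (label sits to the left of x = 1952)
# 0 = left-justified  (label sits to the right of x = 2007)
toy.labels$hjust <- ifelse(toy.labels$year == 1952, 1, 0)

toy.labels$hjust

# same pattern as the positional vector, but tied to the data
identical(toy.labels$hjust, rep(c(1, 0), 2))
```

A column built this way stays correct even if the rows are later sorted or filtered.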

7. Completely display the texts (or other graphical elements) beyond the plot/panel boundary.

  • Turn off clipping in the coordinate system in use. The clip argument is available not only in coord_cartesian(), but also in other coordinate systems, e.g., coord_flip(), coord_fixed(), and coord_polar().

  • In the theme() function, increase the plot.margin (space around the entire plot) and panel.spacing (space between the subplots).

p7 <- p6 + 
  # do not clip graphical elements beyond the panel range
  coord_cartesian(clip = "off") + 
  theme(
    # increase margin on the four edges of the plot
    # (top, right, bottom, left)
    plot.margin = margin(20, 20, 20, 20),
    # increase margin between panels
    panel.spacing = unit(20, "pt"))
p7

8. Label the panels with continent names in place of the default subplot titles (removed in the next step). In the dataset panel.titles defined below, the continent variable serves two important roles:

  • It is the faceting variable, indicating which panel each label belongs to;
  • It is mapped to the label aesthetic, so that each panel is labeled with its continent name.

# one row per panel; keep the same level order as the faceting variable
panel.titles <- tibble(
  continent = factor(ordered.continent, levels = ordered.continent))
p8 <- p7 + 
  geom_text(
    data = panel.titles, 
    aes(x = 1980, y = 30, label = continent), 
    size = 7, fontface = "bold")
p8

9. Load a font from the Google Font repository.

# add font "Abril Fatface", and name the font as "fat" for our use
library(showtext)
showtext_auto()
font_add_google(name = "Abril Fatface", family = "fat")

Polish up the theme.

p9 <- p8 + 
  # titles
  labs(
    title = "Steady increase of Human Life Expectancy", 
    caption = "Each line represents one country; central line: the average; \nribbon, one standard deviation around the mean.",
    x = NULL) +
  theme(
    strip.text = element_blank(), 
    
    axis.line.y = element_blank(),
    axis.line.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.title.y = element_blank(),
    axis.ticks.length.y = unit(20, "pt"),
    axis.ticks.y = element_line(linetype = "dashed"),
    
    plot.title = element_text(size = 18, family = "fat"),
    plot.caption = element_text(hjust = 0, size = 10, color = "grey60"),
    plot.background = element_rect(fill = "azure"),
    panel.background = element_rect(fill = "azure")
  )
p9

Save the plot. By default, the most recently displayed graphic will be saved.

ggsave(filename = "line plot life expectancy.pdf",
       path = "/Users/boyuan/Desktop/R/gallery/graphics",
       width = 8, height = 5)
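
The path above is specific to the author's machine. As a sketch of a more portable setup, the output folder can be assembled with file.path() and created if it is missing. The folder name "graphics" under tempdir() is an assumption made purely for illustration; substitute any directory you prefer.

```r
# illustrative output folder (an assumption):
# a "graphics" folder inside the session's temporary directory
out.dir <- file.path(tempdir(), "graphics")

# create the folder if it does not exist yet
if (!dir.exists(out.dir)) dir.create(out.dir, recursive = TRUE)

# full path of the file to be written
out.file <- file.path(out.dir, "line plot life expectancy.pdf")
out.file
```

The folder could then be supplied as path = out.dir in ggsave().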