Create Elegant Animation of Population Pyramids with ggplot2 and gganimate

In an earlier article, we created a population pyramid of Germany in 2023. In this article, we’ll build on the earlier work to animate the annual population pyramids from 1970 to 2070 (with simulated data). Majority of the script in this article is the same as the static graphic, and new edits designed for animation will be highlighted in green. 🌻


Packages and data import.

We’ll use the gganimate package to convert a static ggplot2 graphic into animation.

library(ggplot2)library(dplyr)library(tidyr)library(stringr)   # for string manipulationlibrary(gganimate) # create animation

The raw data is sourced from Federal Statistical Office of Germany. The dataset used in this animation (a cleaned version, including data from 1970 to 2070) can be downloaded here.

d <- read.csv(  "/Users/boyuan/Desktop/R/gallery/DATASETS/germany_population_all_years.csv",   # remove the "X" letters otherwise added to headers started with numbers  check.names = F) 
d <- d %>% as_tibble()
head(d, n = 3)

Output:

# A tibble: 3 × 103
past.future year gender `0_1` `1_2` `2_3` `3_4` `4_5` `5_6` `6_7` `7_8` `8_9`
<chr> <int> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 past 1990 m 468 458 471 460 451 435 433 437 449
2 past 1991 f 406 446 441 452 440 433 418 415 419
3 past 1991 m 427 471 463 477 465 456 440 438 442
# ℹ 91 more variables: `9_10` <int>, `10_11` <int>, `11_12` <int>,
# `12_13` <int>, `13_14` <int>, `14_15` <int>, `15_16` <int>, `16_17` <int>,
# `17_18` <int>, `18_19` <int>, `19_20` <int>, `20_21` <int>, `21_22` <int>,
# `22_23` <int>, `23_24` <int>, `24_25` <int>, `25_26` <int>, `26_27` <int>,
# `27_28` <int>, `28_29` <int>, `29_30` <int>, `30_31` <int>, `31_32` <int>,
# `32_33` <int>, `33_34` <int>, `34_35` <int>, `35_36` <int>, `36_37` <int>,
# `37_38` <int>, `38_39` <int>, `39_40` <int>, `40_41` <int>, …
  • gender: f for female, and m for male

  • past.future labels years before 2024 as “past”, and year 2024 and after as “future”. The “future” data will be visualized in brighter colors to be distinguished from the “past” data.

  • The fourth to the last column record the populations at each age interval.

Transform the dataset into a tidy format:. pack the ages into one column, and the associated populations into another column. This way, the dataset is transformed from a short wide format into a long narrow format.

d.tidy <- d %>%   pivot_longer(-c(1:3), values_to = "population", names_to = "age")

Modify the population variable to create bars in a symmetrical layout:

  • The bars corresponding to male occupies the positive y-axis range, and bars of the female takes the negative y-axis range.

  • Increase the bar length by extra 80 units for each gender to offset the overlay of the central banner, which covers the bars of each gender by 80 units.

d.tidy <- d.tidy %>%   mutate(population = ifelse(    gender == "f", - population - 80, population + 80 ))

Arrange the population bars by the age. To do this, we need to convert the age variable to a factor to “memorize” the age levels arranged from young to senior. We can easily get the orderd age levels from the column names in the original dataset d. (Learn all essential skills in reordering graphic elements in this complete tutorial)

d.tidy$age <- factor(d.tidy$age, levels = colnames(d)[-c(1:3)])

Highlight the population difference between genders. Here we create a new column pop.min to record the minimum population value of the two genders in each year and each age group.In the line of group_by, we have two variables, year and age. The minimum population values will be visualized as a semi-transparent masking layer in lighter color, and thus highlight the excess population in more saturated colors in one of the genders.

d.tidy <- d.tidy %>%   group_by(year, age) %>%  mutate(pop.min = min(abs(population)),          # female in negative values, male in positive values         pop.min = ifelse(gender == "m", pop.min, -pop.min))

Customize the color scale, such that:

  • Bars of man are drawn in blue, and women in red.
  • The future data (year 2024 and after) are drawn in brighter colors.

To achieve this, we create a new column time.gender to record the year - gender condition, and map it to the fill aesthetic. The color scale is manually defined in the named vector my.colors.

d.tidy <- d.tidy %>%   mutate(time.gender = str_c(past.future, "_", gender))
my.colors <- c("past_m" = "steelblue4", "past_f" = "red4", "future_m" = "steelblue1", "future_f" = "red1")
head(d.tidy, n = 4) # now ready for visualization!

Output:

# A tibble: 4 × 7
# Groups: year, age [4]
past.future year gender age population pop.min time.gender
<chr> <int> <chr> <fct> <dbl> <dbl> <chr>
1 past 1990 m 0_1 548 548 past_m
2 past 1990 m 1_2 538 538 past_m
3 past 1990 m 2_3 551 551 past_m
4 past 1990 m 3_4 540 540 past_m

Visualization

Per visualization procedure, we’ll first create bars of all years in a superimposed manner, and then use animation to “facet” the plot, as if with facet_wrap, but sequentially on a time scale.


🌳🌳

Optional: It is of great help to use only a small data subset (e.g., year of 1980) to develop the code during the initial stage or to ease troubleshooting. The line below reflects this practice. By commenting out (or not) the filter function, it is easy to toggle between the subset and the whole dataset. In this article, I have commented out the subset filtration, and will use the entire dataset for following demonstration.

d.tidy <- d.tidy # %>% filter(year %in% c(1980))

🌳🌳


Create bar plots with flipped axes. As the population value of males are positive, and females are negative, this creates bars in a symmetrical layout like mirror images (thanks to this step of data processing).

  • The bar position is identity, instead of the default stack. This means that each bar is rooted at population of 0. The plot shown here is an overlap of 81 individual pyramids from years 1990 to 2040.

  • The fill aesthetic is mapped with variable time.gender, on the self-defined color scale of my.colors.

p1 <- d.tidy %>%   ggplot(aes(x = age, y = population, fill = time.gender)) +  geom_col(position = "identity", width = 1) +  coord_flip(clip = "off") +  scale_fill_manual(values = my.colors)p1

Highlight population difference between genders. The regions more saturated in color signify the population excess in one gender compared to the other. This effect is achieved by generating a masking layer of bars with semi-transparent white color, with bar length equal to the minimum population of the genders in each year and at each age (thanks to this step of data processing). Due to the overlap of the many pyramids, the superimposed translucency results in complete opaqueness of white (which will be resolved later in animation).

p2 <- p1 +   geom_col(aes(y = pop.min),            fill = "white", alpha = .5, width = 1,           position = "identity")  p2

Create a central banner overlay on the bars. This banner will contain the age labels, in place of the original x-axis (left vertical). Centered at y = 0, the width of the banner spans a total of 160 units, covering 80 units of bar length of each gender. To offset the coverage of the central banner to the underlying bars, we had included an additional 80 units to the population values at this early step of data processing.

p3 <- p2 +   annotate(geom = "rect", # create rectangles           xmin = -Inf, xmax = Inf, ymin = 80, ymax = -80,           fill = "floralwhite")p3

Create age labels in the central banner. We’ll remove the x-axis labels later.

p4 <- p3 +  annotate(geom = "text",           x = seq(0, 100, 5), y = 0,            label = seq(0, 100, 5),           size = 3, fontface = "bold") p4

Revise the y-axis (bottom horizontal), and add plot titles.

  • To correct for the coverage of the central banner, the y-axis for male and female data should be shifted towards the right and left by an additional 80 units, respectively. Therefore, here we calculate the updated y-axis breaks, and label the axis with positive population values.

  • Use white spaces in the y-axis title to symmetrically split the title into two pieces, each half for each of the genders.

  • Add subtitle {frame_time} to dynamically display the corresponding year during the animation.

breaks <- seq(0, 700, by = 100)breaks.updated <- c(breaks + 80, - breaks - 80)
p5 <- p4 + scale_y_continuous( breaks = breaks.updated, labels = function(x) {abs(x) - 80 }) + labs(y = "Female (in thousands) Male (in thousands) ", title = "Simulated year", subtitle = "{frame_time}") p5

Revise the theme. The plot title position is horizontally and vertically adjusted with hjust and vjust, respectively.

p6 <- p5 +   theme_void() + # an empty canvas, with removal of all axis elements  theme(    axis.text.x = element_text(size = 10, color = "snow4", margin = margin(t = 5)),    axis.title.x = element_text(face = "bold", margin = margin(t = 5, b = 5)),    plot.title = element_text(hjust = .1, vjust = -10, size = 10),    plot.subtitle = element_text(hjust = .1, vjust = -6, face = "bold", size = 20),    legend.position = "none")
p6

Create animation. In the static plot, bars from all years are superimposed. The animation serves similarly to facet_wrap(~year) (creating subplots for data of each year), but facets on a time scale. gganimate is an elegant package that converts a ggplot2 static graphic into frames of animation. It may take multiple seconds to render the animation.

p7 <- p6 + transition_time(time = year)p7

Save the animation. By default, the last rendered animation will be saved. The animation argument (not shown here) can be used to explicitly indicate the animation to be saved.

anim_save(filename = "Population pyramid animation.gif",           path = "graphics")
library(ggplot2)library(dplyr)library(tidyr)library(stringr) # for string manipulationlibrary(gganimate) # create animation
d <- read.csv( "/Users/boyuan/Desktop/R/gallery/DATASETS/germany_population_all_years.csv", # remove the "X" letters otherwise added to headers started with numbers check.names = F)
head(d, n = 3)
# Tidy upd.tidy <- d %>% as_tibble() %>% pivot_longer(-c(1:3), values_to = "population", names_to = "age")

# Male on the positive range of y axis, and female on the negative range. # Increase the bar length by 80 units to offset the coverage of the central banner.d.tidy <- d.tidy %>% as_tibble() %>% mutate(population = ifelse(gender == "f", - population - 80, population + 80 ))

# Display the population pyramid in order of age.d.tidy$age <- factor(d.tidy$age, levels = colnames(d)[-c(1:3)])

# Highlight the population difference between genders.d.tidy <- d.tidy %>% group_by(year, age) %>% mutate(pop.min = min(abs(population)), # female in negative values, male in positive values pop.min = ifelse(gender == "m", pop.min, -pop.min))

# Make bars of men in blue, and women in red. # Make the past data (before 2024) in dark color, and future data in brighter color. d.tidy <- d.tidy %>% mutate(time.gender = str_c(past.future, "_", gender))
my.colors <- c("past_m" = "steelblue4", "past_f" = "red4", "future_m" = "steelblue1", "future_f" = "red1")
# now ready for visualization!head(d.tidy, n = 4)
### Visualization ----------------------------------------------------
# Optional: create subset to make it easier for code development and troubleshootingd.tidy <- d.tidy # %>% filter(year %in% c(1980))

# Create bar plots with flipped axes. Pyramids of all years data are now superimposed.p1 <- d.tidy %>% ggplot(aes(x = age, y = population, fill = time.gender)) + geom_col(position = "identity", width = 1) + coord_flip(clip = "off") + scale_fill_manual(values = my.colors)p1

# Highlight population difference between genders.p2 <- p1 + geom_col(aes(y = pop.min), fill = "white", alpha = .5, width = 1, position = "identity") p2

# Create a central banner overlay on the bars.p3 <- p2 + annotate(geom = "rect", # create rectangles xmin = -Inf, xmax = Inf, ymin = 80, ymax = -80, fill = "floralwhite")p3

# Label with ages in the central bannerp4 <- p3 + annotate(geom = "text", x = seq(0, 100, 5), y = 0, label = seq(0, 100, 5), size = 3, fontface = "bold") p4

# Revise the axial labels and titles.breaks <- seq(0, 700, by = 100)breaks.updated <- c(breaks + 80, - breaks - 80)
p5 <- p4 + scale_y_continuous( breaks = breaks.updated, labels = function(x) {abs(x) - 80 }) + labs(y = "Female (in thousands) Male (in thousands) ", title = "Simulated year", subtitle = "{frame_time}") p5
# Revise the theme. p6 <- p5 + theme_void() + # an empty canvas, with removal of all axis elements theme( axis.text.x = element_text(size = 10, color = "snow4", margin = margin(t = 5)), axis.title.x = element_text(face = "bold", margin = margin(t = 5, b = 5)), plot.title = element_text(hjust = .1, vjust = -10, size = 10), plot.subtitle = element_text(hjust = .1, vjust = -6, face = "bold", size = 20), legend.position = "none")
p6

# Create animation.p7 <- p6 + transition_time(time = year)p7
# Save the animationanim_save(filename = "Population pyramid animation.gif", path = "graphics")




Continue Exploring — 🛩 one level up!


Check out the following article, where we’ll visualize the global flights and airports over the world map in animation. (And check here if you wanted the static plot first)