library(ggplot2)library(dplyr)library(tidyr)library(stringr) # for string manipulationlibrary(gganimate) # create animation
Create Elegant Animation of Population Pyramids with ggplot2 and gganimate

In an earlier article, we created a population pyramid of Germany in 2023. In this article, we’ll build on the earlier work to animate the annual population pyramids from 1970 to 2070 (with simulated data). Majority of the script in this article is the same as the static graphic, and new edits designed for animation will be highlighted in green. 🌻
Packages and data import.
We’ll use the gganimate
package to convert a static ggplot2 graphic into animation.
The raw data is sourced from Federal Statistical Office of Germany. The dataset used in this animation (a cleaned version, including data from 1970 to 2070) can be downloaded here.
<- read.csv( d "/Users/boyuan/Desktop/R/gallery/DATASETS/germany_population_all_years.csv", # remove the "X" letters otherwise added to headers started with numbers check.names = F )
<- d %>% as_tibble() d
head(d, n = 3)
Output:
# A tibble: 3 × 103
past.future year gender `0_1` `1_2` `2_3` `3_4` `4_5` `5_6` `6_7` `7_8` `8_9`
<chr> <int> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 past 1990 m 468 458 471 460 451 435 433 437 449
2 past 1991 f 406 446 441 452 440 433 418 415 419
3 past 1991 m 427 471 463 477 465 456 440 438 442
# ℹ 91 more variables: `9_10` <int>, `10_11` <int>, `11_12` <int>,
# `12_13` <int>, `13_14` <int>, `14_15` <int>, `15_16` <int>, `16_17` <int>,
# `17_18` <int>, `18_19` <int>, `19_20` <int>, `20_21` <int>, `21_22` <int>,
# `22_23` <int>, `23_24` <int>, `24_25` <int>, `25_26` <int>, `26_27` <int>,
# `27_28` <int>, `28_29` <int>, `29_30` <int>, `30_31` <int>, `31_32` <int>,
# `32_33` <int>, `33_34` <int>, `34_35` <int>, `35_36` <int>, `36_37` <int>,
# `37_38` <int>, `38_39` <int>, `39_40` <int>, `40_41` <int>, …
gender
:f
for female, andm
for malepast.future
labels years before 2024 as “past”, and year 2024 and after as “future”. The “future” data will be visualized in brighter colors to be distinguished from the “past” data.The fourth to the last column record the populations at each age interval.
Transform the dataset into a tidy format:. pack the ages into one column, and the associated populations into another column. This way, the dataset is transformed from a short wide format into a long narrow format.
<- d %>% d.tidy pivot_longer(-c(1:3), values_to = "population", names_to = "age")
Modify the population
variable to create bars in a symmetrical layout:
The bars corresponding to male occupies the positive y-axis range, and bars of the female takes the negative y-axis range.
Increase the bar length by extra 80 units for each gender to offset the overlay of the central banner, which covers the bars of each gender by 80 units.
<- d.tidy %>% d.tidy mutate(population = ifelse( == "f", - population - 80, population + 80 )) gender
Arrange the population bars by the age. To do this, we need to convert the age
variable to a factor to “memorize” the age levels arranged from young to senior. We can easily get the orderd age levels from the column names in the original dataset d
. (Learn all essential skills in reordering graphic elements in this complete tutorial)
$age <- factor(d.tidy$age, levels = colnames(d)[-c(1:3)]) d.tidy
Highlight the population difference between genders. Here we create a new column pop.min
to record the minimum population value of the two genders in each year and each age group.In the line of group_by
, we have two variables, year
and age
. The minimum population values will be visualized as a semi-transparent masking layer in lighter color, and thus highlight the excess population in more saturated colors in one of the genders.
<- d.tidy %>% d.tidy group_by(year, age) %>% mutate(pop.min = min(abs(population)), # female in negative values, male in positive values pop.min = ifelse(gender == "m", pop.min, -pop.min))
Customize the color scale, such that:
- Bars of man are drawn in blue, and women in red.
- The future data (year 2024 and after) are drawn in brighter colors.
To achieve this, we create a new column time.gender
to record the year - gender condition, and map it to the fill
aesthetic. The color scale is manually defined in the named vector my.colors
.
<- d.tidy %>% d.tidy mutate(time.gender = str_c(past.future, "_", gender))
<- c("past_m" = "steelblue4", "past_f" = "red4", my.colors "future_m" = "steelblue1", "future_f" = "red1")
head(d.tidy, n = 4) # now ready for visualization!
Output:
# A tibble: 4 × 7
# Groups: year, age [4]
past.future year gender age population pop.min time.gender
<chr> <int> <chr> <fct> <dbl> <dbl> <chr>
1 past 1990 m 0_1 548 548 past_m
2 past 1990 m 1_2 538 538 past_m
3 past 1990 m 2_3 551 551 past_m
4 past 1990 m 3_4 540 540 past_m
Visualization
Per visualization procedure, we’ll first create bars of all years in a superimposed manner, and then use animation to “facet” the plot, as if with facet_wrap
, but sequentially on a time scale.
🌳🌳
Optional: It is of great help to use only a small data subset (e.g., year of 1980) to develop the code during the initial stage or to ease troubleshooting. The line below reflects this practice. By commenting out (or not) the filter
function, it is easy to toggle between the subset and the whole dataset. In this article, I have commented out the subset filtration, and will use the entire dataset for following demonstration.
<- d.tidy # %>% filter(year %in% c(1980)) d.tidy
🌳🌳
Create bar plots with flipped axes. As the population value of males are positive, and females are negative, this creates bars in a symmetrical layout like mirror images (thanks to this step of data processing).
The bar position is
identity
, instead of the defaultstack
. This means that each bar is rooted at population of 0. The plot shown here is an overlap of 81 individual pyramids from years 1990 to 2040.The
fill
aesthetic is mapped with variabletime.gender
, on the self-defined color scale ofmy.colors
.
<- d.tidy %>% p1 ggplot(aes(x = age, y = population, fill = time.gender)) + geom_col(position = "identity", width = 1) + coord_flip(clip = "off") + scale_fill_manual(values = my.colors) p1
Highlight population difference between genders. The regions more saturated in color signify the population excess in one gender compared to the other. This effect is achieved by generating a masking layer of bars with semi-transparent white color, with bar length equal to the minimum population of the genders in each year and at each age (thanks to this step of data processing). Due to the overlap of the many pyramids, the superimposed translucency results in complete opaqueness of white (which will be resolved later in animation).
<- p1 + p2 geom_col(aes(y = pop.min), fill = "white", alpha = .5, width = 1, position = "identity") p2
Create a central banner overlay on the bars. This banner will contain the age labels, in place of the original x-axis (left vertical). Centered at y = 0, the width of the banner spans a total of 160 units, covering 80 units of bar length of each gender. To offset the coverage of the central banner to the underlying bars, we had included an additional 80 units to the population values at this early step of data processing.
<- p2 + p3 annotate(geom = "rect", # create rectangles xmin = -Inf, xmax = Inf, ymin = 80, ymax = -80, fill = "floralwhite") p3
Create age labels in the central banner. We’ll remove the x-axis labels later.
<- p3 + p4 annotate(geom = "text", x = seq(0, 100, 5), y = 0, label = seq(0, 100, 5), size = 3, fontface = "bold") p4
Revise the y-axis (bottom horizontal), and add plot titles.
To correct for the coverage of the central banner, the y-axis for male and female data should be shifted towards the right and left by an additional 80 units, respectively. Therefore, here we calculate the updated y-axis breaks, and label the axis with positive population values.
Use white spaces in the y-axis title to symmetrically split the title into two pieces, each half for each of the genders.
Add subtitle
{frame_time}
to dynamically display the corresponding year during the animation.
<- seq(0, 700, by = 100) breaks <- c(breaks + 80, - breaks - 80) breaks.updated
<- p4 + p5 scale_y_continuous( breaks = breaks.updated, labels = function(x) {abs(x) - 80 }) + labs(y = "Female (in thousands) Male (in thousands) ", title = "Simulated year", subtitle = "{frame_time}") p5
Revise the theme. The plot title position is horizontally and vertically adjusted with hjust
and vjust
, respectively.
<- p5 + p6 theme_void() + # an empty canvas, with removal of all axis elements theme( axis.text.x = element_text(size = 10, color = "snow4", margin = margin(t = 5)), axis.title.x = element_text(face = "bold", margin = margin(t = 5, b = 5)), plot.title = element_text(hjust = .1, vjust = -10, size = 10), plot.subtitle = element_text(hjust = .1, vjust = -6, face = "bold", size = 20), legend.position = "none")
p6
Create animation. In the static plot, bars from all years are superimposed. The animation serves similarly to facet_wrap(~year)
(creating subplots for data of each year), but facets on a time scale. gganimate
is an elegant package that converts a ggplot2 static graphic into frames of animation. It may take multiple seconds to render the animation.
<- p6 + transition_time(time = year) p7 p7
Save the animation. By default, the last rendered animation will be saved. The animation
argument (not shown here) can be used to explicitly indicate the animation to be saved.
anim_save(filename = "Population pyramid animation.gif", path = "graphics")
library(ggplot2)library(dplyr)library(tidyr)library(stringr) # for string manipulationlibrary(gganimate) # create animation
<- read.csv( d "/Users/boyuan/Desktop/R/gallery/DATASETS/germany_population_all_years.csv", # remove the "X" letters otherwise added to headers started with numbers check.names = F )
head(d, n = 3)
# Tidy up<- d %>% as_tibble() %>% d.tidy pivot_longer(-c(1:3), values_to = "population", names_to = "age")
# Male on the positive range of y axis, and female on the negative range. # Increase the bar length by 80 units to offset the coverage of the central banner.<- d.tidy %>% as_tibble() %>% d.tidy mutate(population = ifelse(gender == "f", - population - 80, population + 80 ))
# Display the population pyramid in order of age.$age <- factor(d.tidy$age, levels = colnames(d)[-c(1:3)]) d.tidy
# Highlight the population difference between genders.<- d.tidy %>% d.tidy group_by(year, age) %>% mutate(pop.min = min(abs(population)), # female in negative values, male in positive values pop.min = ifelse(gender == "m", pop.min, -pop.min))
# Make bars of men in blue, and women in red. # Make the past data (before 2024) in dark color, and future data in brighter color. <- d.tidy %>% d.tidy mutate(time.gender = str_c(past.future, "_", gender))
<- c("past_m" = "steelblue4", "past_f" = "red4", my.colors "future_m" = "steelblue1", "future_f" = "red1")
# now ready for visualization!head(d.tidy, n = 4)
### Visualization ----------------------------------------------------
# Optional: create subset to make it easier for code development and troubleshooting<- d.tidy # %>% filter(year %in% c(1980)) d.tidy
# Create bar plots with flipped axes. Pyramids of all years data are now superimposed.<- d.tidy %>% p1 ggplot(aes(x = age, y = population, fill = time.gender)) + geom_col(position = "identity", width = 1) + coord_flip(clip = "off") + scale_fill_manual(values = my.colors) p1
# Highlight population difference between genders.<- p1 + p2 geom_col(aes(y = pop.min), fill = "white", alpha = .5, width = 1, position = "identity") p2
# Create a central banner overlay on the bars.<- p2 + p3 annotate(geom = "rect", # create rectangles xmin = -Inf, xmax = Inf, ymin = 80, ymax = -80, fill = "floralwhite") p3
# Label with ages in the central banner<- p3 + p4 annotate(geom = "text", x = seq(0, 100, 5), y = 0, label = seq(0, 100, 5), size = 3, fontface = "bold") p4
# Revise the axial labels and titles.<- seq(0, 700, by = 100) breaks <- c(breaks + 80, - breaks - 80) breaks.updated
<- p4 + p5 scale_y_continuous( breaks = breaks.updated, labels = function(x) {abs(x) - 80 }) + labs(y = "Female (in thousands) Male (in thousands) ", title = "Simulated year", subtitle = "{frame_time}") p5
# Revise the theme. <- p5 + p6 theme_void() + # an empty canvas, with removal of all axis elements theme( axis.text.x = element_text(size = 10, color = "snow4", margin = margin(t = 5)), axis.title.x = element_text(face = "bold", margin = margin(t = 5, b = 5)), plot.title = element_text(hjust = .1, vjust = -10, size = 10), plot.subtitle = element_text(hjust = .1, vjust = -6, face = "bold", size = 20), legend.position = "none")
p6
# Create animation.<- p6 + transition_time(time = year) p7 p7
# Save the animationanim_save(filename = "Population pyramid animation.gif", path = "graphics")
Continue Exploring — 🛩 one level up!
Check out the following article, where we’ll visualize the global flights and airports over the world map in animation. (And check here if you wanted the static plot first)