library(ggplot2)library(dplyr)library(tidyr)library(stringr) # for string manipulation
theme_set(theme_classic())
Create Elegant Population Pyramid in ggplot2

In this article, we’ll create a population pyramid displaying the German population structure in 2023. Major techniques covered in this article include:
- Modify the dataset structure to fit the graphic designs.
- Draw a population pyramid.
- Create text and rectangle annotation.
Packages and data cleanup
The raw data is sourced from the Federal Statistical Office of Germany. The dataset (a cleaned version) used in this visualization can be downloaded here.
<- read.csv( d "/Users/boyuan/Desktop/R/gallery/DATASETS/germany_population_2023.csv", # remove the "X" letters otherwise added to headers started with numbers check.names = F)
<- d %>% as_tibble() # convert to tibble format d
head(d, n = 3)
Output:
# A tibble: 3 × 3
age f m
<chr> <int> <int>
1 0_1 369 389
2 1_2 368 388
3 2_3 394 416
Transform the dataset into the tidy format: gender attributes (female f
and male m
) as one column, and the associated population as another column.
<- d %>% d.tidy pivot_longer(-age, names_to = "gender", values_to = "population") head(d.tidy, 4)
Output:
# A tibble: 4 × 3
age gender population
<chr> <chr> <int>
1 0_1 f 369
2 0_1 m 389
3 1_2 f 368
4 1_2 m 388
Modify the population
variable to create bars in a symmetrical layout:
The bars corresponding to male occupies the positive y-axis range, and bars of the female takes the negative y-axis range.
Increase the bar length by extra 80 units for each gender to offset the overlay of the central banner, which covers the bars of each gender by 80 units.
<- d.tidy %>% as_tibble() %>% d.tidy mutate(population = ifelse( == "f", - population - 80, population + 80 )) gender
Arrange the population bars by the age. While the dataset d.tidy
is already arranged in order of age, the age
variable is of a “character” type, and does not “memorize” such order when rendered in graphics. In order to retain such order, we need to convert the age
variable to a factor, with levels explicitly indicated as in the current row order of d.tidy
. Use unique
to extract distinct ages, as the levels
argument requires no duplication in the input values. (Learn all essential skills in reordering graphic elements in this complete tutorial.)
$age <- d.tidyfactor(d.tidy$age, levels = unique(d.tidy$age))
Highlight the population difference between genders. Here we create a new column pop.min
to record the minimum population value of the two genders in each age group. The minimum values will be visualized as a masking layer in lighter color, and thus highlight the excess population in one of the genders.
<- d.tidy %>% d.tidy group_by(age) %>% mutate(pop.min = min(abs(population)), # female in negative values, male in positive values pop.min = ifelse(gender == "m", pop.min, -pop.min))
head(d.tidy, n = 4) # ready for visualization
Output:
# A tibble: 4 × 4
# Groups: age [2]
age gender population pop.min
<fct> <chr> <dbl> <dbl>
1 0_1 f -449 -449
2 0_1 m 469 449
3 1_2 f -448 -448
4 1_2 m 468 448
Visualization
Create bar plots with flipped axes. As the population value of males are positive, and females are negative, this creates bars in a symmetrical layout as mirror images (thanks to this step of data processing).
<- d.tidy %>% p1 ggplot(aes(x = age, y = population, fill = gender)) + geom_col(width = 1) + coord_flip(clip = "off") p1
Highlight population difference between genders. The regions more saturated in color signify the population excess in one gender compared to the other. This effect is achieved by generating a masking layer of bars with semi-transparent white color, with bar length equal to the minimum population of the genders at each age (thanks to this step of data processing).
<- p1 + p2 geom_col(aes(y = pop.min), fill = "white", alpha = .5, width = 1, position = "identity") p2
Create a central banner overlay on the bars. This banner will contain the age labels, in place of the original x-axis (left vertical). Centered at y = 0, the width of the banner spans a total of 160 units, covering 80 units of bar length of each gender. To offset the coverage of the central banner to the underlying bars, we had included an additional 80 units to the population values at this early step of data processing.
<- p2 + p3 annotate( geom = "rect", # create rectangles xmin = -Inf, xmax = Inf, ymin = 80, ymax = -80, fill = "floralwhite") p3
Create age labels in the central banner. We’ll remove the x-axis labels later.
<- p3 + p4 annotate( geom = "text", x = seq(0, 100, 5), y = 0, label = seq(0, 100, 5), size = 3, fontface = "bold") p4
Revise the y-axis (bottom horizontal), and add plot titles.
To correct for the coverage of the central banner, the y-axis for male and female data should be shifted towards the right and left by an additional 80 units, respectively. Therefore, here we calculate the updated y-axis breaks, and label the axis with positive population values.
Use white spaces in the y-axis title to symmetrically split the title into two pieces, each half for each of the genders.
<- seq(0, 700, by = 100) breaks <- c(breaks + 80, - breaks - 80) breaks.updated
<- p4 + p5 scale_y_continuous( breaks = breaks.updated, labels = function(x) {abs(x) - 80 }) + labs( y = "Female (in thousands) Male (in thousands) ", title = "Population Pyramid in Germany 2023", caption = "data source: https://service.destatis.de/bevoelkerungspyramide/") p5
Add Google font, and polish up the theme.
library(showtext)showtext_auto()font_add_google(name = "Abril Fatface", family = "fatface")
<- p5 + p6 theme_void() + # an empty canvas, with removal of all axial elements theme( axis.text.x = element_text(size = 10, color = "snow4", margin = margin(t = 5)), axis.title.x = element_text(face = "bold", margin = margin(t = 5, b = 5)), plot.title = element_text(hjust = .5, size = 15, face = "bold", margin = margin(b = 10), family = "fatface"), plot.caption = element_text(hjust = .9, size = 10, color = "snow4"), plot.margin = margin(t = 10, b = 10), # increase plot top and bottom margin legend.position = "none")
p6
### Packages and data importlibrary(ggplot2)library(dplyr)library(tidyr)library(stringr) # for string manipulationlibrary(gganimate) # create animation
theme_set(theme_classic())
<- read.csv( d "/Users/boyuan/Desktop/R/gallery/DATASETS/germany_population_2023.csv", # remove the "X" letters otherwise added to headers started with numbers check.names = F )
<- d %>% as_tibble() # convert to tibble format d
head(d, n = 3)
# Transform the dataset into a tidy format<- d %>% d.tidy pivot_longer(-age, names_to = "gender", values_to = "population") head(d.tidy, 4)
# Put the male data at the positive range of y axis, and female data at negative range. # And meanwhile increase the bar length by extra 80 units # to offset the overlay of the central banner which covers the bars of each gender by 80 units.<- d.tidy %>% as_tibble() %>% d.tidy mutate(population = ifelse( == "f", - population - 80, population + 80 )) gender
# Display the population pyramid in order of age.$age <- d.tidyfactor(d.tidy$age, levels = unique(d.tidy$age))
# Highlight the population difference between genders.<- d.tidy %>% d.tidy group_by(age) %>% mutate(pop.min = min(abs(population)), # female in negative values, male in positive values pop.min = ifelse(gender == "m", pop.min, -pop.min))
head(d.tidy, n = 4) # ready for visualization
### Visualization ----------------------------------------
# Create bar plots with flipped axes.<- d.tidy %>% p1 ggplot(aes(x = age, y = population, fill = gender)) + geom_col(width = 1) + coord_flip(clip = "off") p1
# Highlight population difference between genders.<- p1 + p2 geom_col(aes(y = pop.min), fill = "white", alpha = .5, width = 1, position = "identity") p2
# Create a central banner overlay on the bars.<- p2 + p3 annotate( geom = "rect", # create rectangles xmin = -Inf, xmax = Inf, ymin = 80, ymax = -80, fill = "floralwhite") p3
# Label with ages in the central banner.<- p3 + p4 annotate( geom = "text", x = seq(0, 100, 5), y = 0, label = seq(0, 100, 5), size = 3, fontface = "bold") p4
# Revise the y axis (horizontal), and add plot titles<- seq(0, 700, by = 100) breaks <- c(breaks + 80, - breaks - 80) breaks.updated
<- p4 + p5 scale_y_continuous( breaks = breaks.updated, labels = function(x) {abs(x) - 80 }) + labs( y = "Female (in thousands) Male (in thousands) ", title = "Population Pyramid in Germany 2023", caption = "data source: https://service.destatis.de/bevoelkerungspyramide/") p5
# Add Google font, and polish up the theme. library(showtext)showtext_auto()font_add_google(name = "Abril Fatface", family = "fatface")
<- p5 + p6 theme_void() + # an empty canvas, with removal of all axial elements theme( axis.text.x = element_text(size = 10, color = "snow4", margin = margin(t = 5)), axis.title.x = element_text(face = "bold", margin = margin(t = 5, b = 5)), plot.title = element_text(hjust = .5, size = 15, face = "bold", margin = margin(b = 10), family = "fatface"), plot.caption = element_text(hjust = .9, size = 10, color = "snow4"), plot.margin = margin(t = 10, b = 10), # increase plot top and bottom margin legend.position = "none")
p6
Continue Exploring — 🚀 one level up!
We can slightly tweak the script to visualize the annual population pyramids in animation using the popular gganimate
package. Check it out!
The barplot is the basis of pie and donut charts, which are essentially bars presented in a polar coordinate. Check the following exploded donut plots in faceted layout that visualizes the top four GDP contributors in each continent.