Use Dumbbell Plot with ggplot2 to Visualize Voting Demographics


In this article, we’ll create a dumbbell plot showing the percentage of votes won by Donald Trump and Hillary Clinton from different population groups at the 2016 United States presidential election. The data used in this visualization can be downloaded here. This work is a ggplot2 reproduction of the demo graphic by Datawrapper.

Major techniques explained in this article include:

  • Reorder axial variables
  • Make unique annotations to selected faceted panels (subplots)
  • Independent panel size adjustment
  • Aesthetic customization with ifelse()

Packages and data cleanup

library(ggplot2)library(dplyr)library(tidyr)library(stringr)
d <- read.csv("/Users/boyuan/Desktop/R/gallery/DATASETS/Trump_vs_Clinton.csv")

Reorder the x-axis labels. This is achieved by turning the variable that is mapped to the x aesthetic into a factor with specified level order. In this visualization, we specify the factor levels in the reverse order of their appearance in the input dataset. More details in graphic elements rearrangement can be found in this complete guide.

d.ordered <- d %>%  mutate(category = factor(category, levels = rev(d$category))) 

Tidy up the data, such that candidate and percentage each is a single variable (column). In addition, we create a new column is.max to indicate which candidate has the most votes in each voting group. This variable will help specify the y-axis position of the label of the voting percentage later in step 7.

d.tidy <- d.ordered %>%   # tidy up  pivot_longer(contains("Vote"),                names_to = "candidate", values_to = "percent") %>%   mutate(candidate = str_remove(candidate, "Vote.for.")) %>%     # mark the candidate with the most votes in each voting group  group_by(category) %>%   mutate(is.max = percent == max(percent))
head(d.tidy, n = 3)

Output:

# A tibble: 3 × 5
# Groups: category [2]
category group candidate percent is.max
<fct> <chr> <chr> <int> <lgl>
1 18-29 Age Clinton 58 TRUE
2 18-29 Age Trump 28 FALSE
3 30-49 Age Clinton 51 TRUE

Visualization

1. Create line segments and points. Note that the aesthetic mapping of of the candidate variable (aes(color = candidate)) is specified in its own associated layer geom_point (and geom_text in step 7), but not in the global aesthetic mapping in the ggplot() line. This is convenient so that when another geom_* has a new input dataset for the data = argument, this new dataset does not have to possess the candidate variable. (An alternative approach is to map candidate in the ggplot() line, and set inherit.aes = F in the new geom_* layer when its associated dataset does not have the candidate variable - consult our excellent ggplot2 e-book to learn more!)

p1 <- d.tidy %>%   ggplot(aes(x = category, y = percent)) +    # using the dataset before tidy up to create the segments  geom_segment(data = d.ordered,               aes(x = category, xend = category,                   y = Vote.for.Clinton, yend = Vote.for.Trump),                color = "snow3", linewidth = 2) +    # points created using the "global" d.tidy dataset  geom_point(aes(color = candidate), size = 3) +  scale_color_manual(values = c("Clinton" = "steelblue",                                 "Trump" = "firebrick3"))p1

2. Flip and facet the plot. Here we use clip = "off" to prevent graphic elements beyond the panel boundary from being clipped away (see step 3). facet_grid(variable ~ .) is a more flexible alternative to facet_wrap(~ variable). It has the unique argument space which allows different panel size when the number of variables are different across panels (e.g., the “Gender” panel has two groups, and is thinner in size than the other panels with four groups).

p2 <- p1 +   coord_flip(clip = "off") + # flip the plot    # facet the plot based on voting groups  facet_grid(group ~ ., scales = "free", space = "free_y")  p2

3. Add panel titles manually at the top left corner, in replace of the default ones. To add text annotations unique to each faceted panel (subplot), the key is to include in the input dataset the same variable group which has been used to facet the plot. This variable dictates which faceted panel the texts should be plotted in, and is also mapped to the label aesthetic. The same technique has also been used here, and here (with reordered faceting variable).

p3 <- p2 +   # use the same `group` variable which has been used for faceting  geom_text(data = tibble(group = c("Age", "Education", "Gender", "Race/Ethnicity")),            aes(x = c(5.2, 5.2, 3.2, 5.2),                 y = -20,                 label = group),             hjust = 0, fontface = "bold", size = 3.5) p3

Note that the panel titles are riding across the border of the faceted panels, and are displayed completely. If we hadn’t set up the clip = "off" in step 2, the part of texts beyond the panel boundary would be otherwise invisible (clipped away).

4. Manually add the x (left-side) axis labels in line with the panel titles added at the earlier step. As the data for this layer of geom_text inherits from d.tidy, which has two rows (two candidates) for each voting group, each axis label displayed is actually a perfect overlap of two labels.

p4 <- p3 +  geom_text(aes(y = -20, label = category),            hjust = 0, size = 3, color = "grey30")p4

5. Remove the default panel titles (strip_text), and the x-axis labels and ticks.

p5 <- p4 +   theme_minimal() +  theme(axis.text.y = element_blank(),        axis.title.y = element_blank(),        strip.text = element_blank(),                # adjust distance between subplots        panel.spacing.y = unit(15, "pt")) p5

6. Adjust the vertical panel grid (associated with the y-axis scale).

p6 <- p5 + scale_y_continuous(  breaks = seq(0, 100, 20),  minor_breaks = NULL) + # remove the minor vertical grids   theme(panel.grid.major.y = element_line(linewidth = .2))p6

7. Add text labels of the voting percentage. We use is.max variable to determine the y-axis position of the text label.

p7 <- p6 +   geom_text(    aes(label = paste(percent, "%"),         y = ifelse(is.max == T, percent + 8, percent - 8),        color = candidate),    show.legend = F) p7

8. Add text annotations in the top panel in replace of the default legend. Similar to step 3, we create a dataset with the faceting variable group to uniquely annotate the desired subplot. Also add the plot title.

p8 <- p7 +  # add text annotation in the top first panel  geom_text(    data = tibble(percent = c(12, 80),                   category = "18-29",                  text = c("vote for Trump", "vote for Clinton"),                  group = "Age"), # texts added only to the 1st panel "Age"    aes(x = category, y = percent, label = text),        color = c("firebrick3", "steelblue"),     fontface = "bold", size = 4,    vjust = -2) + # text shifted upward    # remove default legend  theme(legend.position = "none") +     # adjust the plot title  labs(title = "Voter Demographics in 2016\n")  +  theme(plot.title = element_text(face = "bold", hjust = .5, size = 15))
p8

As illustrated above, labeling the plot with “vote for Trump” and “vote for Clinton” is an effective and concise replacement to the traditional legends. Another powerful approach is to incorporate such notations in distinct colors in the plot title, as shown in this line segment - rectangle plot, this donut-lollipop plot, and this clustered bar plot.

library(ggplot2)library(dplyr)library(tidyr)library(stringr)
d <- read.csv("/Users/boyuan/Desktop/R/gallery/DATASETS/Trump_vs_Clinton.csv")
# Reorder axial labels. d.ordered <- d %>% mutate(category = factor(category, levels = rev(d$category)))
# Tidy up the data.d.tidy <- d.ordered %>% pivot_longer(contains("Vote"), names_to = "candidate", values_to = "percent") %>% mutate(candidate = str_remove(candidate, "Vote.for.")) %>% # mark the candidate with the most votes in each voting group group_by(category) %>% mutate(is.max = percent == max(percent))
head(d.tidy, n = 3)

# Create a dumbbell plot containing line segments and points. p1 <- d.tidy %>% ggplot(aes(x = category, y = percent)) + # using the dataset before tidy up to create the segments geom_segment(data = d.ordered, aes(x = category, xend = category, y = Vote.for.Clinton, yend = Vote.for.Trump), color = "snow3", linewidth = 2) + # points created using the "global" d.tidy dataset geom_point(aes(color = candidate), size = 3) + scale_color_manual(values = c("Clinton" = "steelblue", "Trump" = "firebrick3"))p1

# Flip and facet the plot. p2 <- p1 + coord_flip(clip = "off") + # flip the plot # facet the plot based on voting groups facet_grid(group ~ ., scales = "free", space = "free_y") p2

# Add panel titles manually at the top left corner, in replace of the default ones.p3 <- p2 + # use the same `group` variable name used for faceting geom_text(data = tibble(group = c("Age", "Education", "Gender", "Race/Ethnicity")), aes(x = c(5.2, 5.2, 3.2, 5.2), y = -20, label = group), hjust = 0, fontface = "bold", size = 3.5) p3

# Manually add the x (left-side) axis labels, in line with the newly added panel titles. p4 <- p3 + geom_text(aes(y = -20, label = category), hjust = 0, size = 3, color = "grey30")p4

# Remove the default panel titles, the x-axis labels, and ticks.p5 <- p4 + theme_minimal() + theme(axis.text.y = element_blank(), axis.title.y = element_blank(), strip.text = element_blank(), # adjust distance between subplots panel.spacing.y = unit(15, "pt")) p5

# Adjust the vertical panel grid (associated with the y-axis scale).p6 <- p5 + scale_y_continuous( breaks = seq(0, 100, 20), minor_breaks = NULL) + # remove the minor vertical grids theme(panel.grid.major.y = element_line(linewidth = .2))p6

# Add the voting percentage. p7 <- p6 + geom_text( aes(label = paste(percent, "%"), y = ifelse(is.max == T, percent + 8, percent - 8), color = candidate), show.legend = F) p7

# Add text annotations in the top panel in replace of the default legend. p8 <- p7 + geom_text( # add text annotation in the top first panel data = tibble(percent = c(12, 80), category = "18-29", text = c("vote for Trump", "vote for Clinton"), group = "Age"), # texts added only to the 1st panel "Age" aes(x = category, y = percent, label = text), color = c("firebrick3", "steelblue"), fontface = "bold", size = 4, vjust = -2) + # text shifted upward # remove default legend theme(legend.position = "none") + # adjust the plot title labs(title = "Voter Demographics in 2016\n") + theme(plot.title = element_text(face = "bold", hjust = .5, size = 15))
p8




Continue Exploring — 🚀 one level up!


The dumbbell plot based on geom_segment can be easily updated into arrows to display information about trend of changes. The following article leverages annotated arrows in faceted panels to depict the changes in the proportion of seats held by women in national parliaments from 2000 to 2020.



In addition to the classic dumbbell style illustrated above, the following visualization presents an “upgraded” version of a dumbbell plot: it effectively highlights the increase of life expectancy from 1952 to 2007, showing such changes at both the country-wise (each single line) and the continent-wise ( central thick line) level.