Use Waffle Plot in ggplot2 to Visualize U.S. Wealth Distribution

A waffle plot, also known as a square pie chart or grid plot, is a grid of squares in which each square represents a fixed share of the whole dataset. It is a vivid way to show how a total splits into categories. In this article, we’ll create a waffle plot to visualize the wealth distribution in the U.S. population.

Packages and data cleanup

library(ggplot2)
library(dplyr)
library(ggmosaic) # use the "happy" dataset of this package
library(stringr)  # for string manipulation

Calculate the number of people in each wealth category.

h <- happy %>% as_tibble() %>% select(finrela)
# remove rows containing NA values
h <- h[complete.cases(h), ]
# summarize the number of people in each wealth category
counts <- h$finrela %>% table()

Calculate the head counts for each wealth category normalized to a total of 100 people.

nrows <- 10
counts.normalized <- round( counts * (nrows^2 / sum(counts)) )
# check to make sure the sum of counts after rounding is 100
sum(counts.normalized)

Output:

[1] 100
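Rounding each category independently does not guarantee a total of exactly 100, which is why the check above is worth running. If the sum ever comes out as 99 or 101, one possible workaround is the largest-remainder method, sketched below. The helper name `round_to_total` and the example counts are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: scale counts to a fixed total using the
# largest-remainder method, so the result always sums to `total`.
round_to_total <- function(counts, total = 100) {
  scaled    <- counts * total / sum(counts)  # exact (fractional) shares
  floored   <- floor(scaled)                 # round every share down first
  shortfall <- total - sum(floored)          # squares still unassigned
  # give one extra square to the categories with the largest remainders
  extra <- order(scaled - floored, decreasing = TRUE)[seq_len(shortfall)]
  floored[extra] <- floored[extra] + 1
  floored
}

# Made-up counts for illustration: plain round() yields 33 + 33 + 33 = 99
example_counts <- c(a = 1, b = 1, c = 1)
sum(round(example_counts * 100 / sum(example_counts)))
sum(round_to_total(example_counts))
```

With the made-up counts, the plain rounding sums to 99, while the helper hands the leftover square to one category so that the total is 100.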

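Create a tidy dataset with one row per person, recording the wealth category of each of the 100 people. The expand.grid function varies its first argument fastest, so listing y first fills the grid column by column. Swapping the x and y positions, i.e., using expand.grid(x = 1:nrows, y = 1:nrows), fills the grid row by row instead and produces a transposed waffle plot.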

d <- expand.grid(y = 1:nrows, x = 1:nrows)
d <- d %>%
  mutate(wealth = rep(names(counts.normalized), counts.normalized))
head(d, n = 5)

Output:

  y x            wealth
1 1 1 far below average
2 2 1 far below average
3 3 1 far below average
4 4 1 far below average
5 5 1 far below average
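The output above shows y changing while x stays at 1, which is a consequence of expand.grid varying its first argument fastest. The small sketch below makes the two fill orders easy to compare on a 3 x 3 grid; the object names `col_major` and `row_major` are illustrative only.

```r
# expand.grid() varies its first argument fastest.
n <- 3  # small grid so the pattern is easy to read

# y listed first: y changes fastest, so cells fill column by column
col_major <- expand.grid(y = 1:n, x = 1:n)

# x listed first: x changes fastest, so cells fill row by row
row_major <- expand.grid(x = 1:n, y = 1:n)

head(col_major, n = 4)  # y runs 1, 2, 3 at x = 1, then restarts at x = 2
head(row_major, n = 4)  # x runs 1, 2, 3 at y = 1, then restarts at y = 2
```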

Visualization

Create a heatmap representing the financial status of each person in the 10 x 10 grid.

# plot
d %>% as_tibble() %>%
  ggplot(aes(x = x, y = y, fill = wealth)) +
  geom_tile(color = "white", linewidth = .5) +
  scale_fill_brewer(palette = "Pastel1") +
  ggtitle("Wealth distribution in US") +
  theme_void() +
  theme(plot.title = element_text(hjust = .5, face = "bold"))
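Because wealth is stored as a character column, ggplot2 sorts the legend alphabetically rather than from the lowest to the highest category. One way to keep the natural order is to convert the column to a factor with explicit levels, as in the sketch below. The toy counts are made up, and the category labels other than "far below average" are assumptions about how the levels are named in the data.

```r
library(ggplot2)

# Hypothetical counts on a 3 x 3 grid (illustrative, not the article's results).
# Labels other than "far below average" are assumed level names.
toy_counts <- c("far below average" = 1, "below average" = 2,
                "average" = 3, "above average" = 2,
                "far above average" = 1)

n <- 3
d_toy <- expand.grid(y = 1:n, x = 1:n)

# factor() with explicit levels preserves the order given in toy_counts
d_toy$wealth <- factor(rep(names(toy_counts), toy_counts),
                       levels = names(toy_counts))

p <- ggplot(d_toy, aes(x = x, y = y, fill = wealth)) +
  geom_tile(color = "white", linewidth = .5) +
  scale_fill_brewer(palette = "Pastel1") +
  theme_void()
p
```

With real data, the same idea would apply by passing the names of the normalized counts as the levels, since the table keeps the original factor order.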

Besides geom_tile, geom_raster is a high-performance alternative for heatmaps in which all cells are the same size. It does not draw cell outlines, however, so it has no color argument for separating the squares with a border.
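As a minimal sketch of the geom_raster variant, the example below draws a toy 3 x 3 grid. The data and the names `d_toy` and `p_raster` are made up for illustration, and coord_fixed() is an addition not used in the original plot; it only keeps the cells square.

```r
library(ggplot2)

# Toy 3 x 3 grid with made-up categories (for illustration only)
d_toy <- expand.grid(y = 1:3, x = 1:3)
d_toy$group <- rep(c("a", "b", "c"), times = c(2, 3, 4))

# geom_raster() draws equal-sized cells without outlines,
# so neighboring squares of the same category merge visually
p_raster <- ggplot(d_toy, aes(x = x, y = y, fill = group)) +
  geom_raster() +
  coord_fixed() +  # keep the cells square
  theme_void()
p_raster
```

Without outlines, individual squares are harder to count, so geom_tile is probably the better fit for a waffle plot unless the grid is very large.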


Continue Exploring — 🚀 one level up!


Now that you know how to create a waffle grid, a natural next step is to draw emoji faces in a waffle grid using the ggChernoff package.

The waffle plot above shows the univariate distribution of wealth. Going one step further, a mosaic plot can visualize the bivariate distribution of two variables, such as wealth and health, showing the population composition at varied levels of wealth and health status.