Draw Mosaic Plot in ggplot2 to Visualize the Population Composition at Different Wealth and Health Status

A mosaic plot is a graphical representation of a contingency table (the frequency distribution of two or more categorical variables). It uses the area of each rectangle to show the proportion of cases that fall into each combination of category levels. In this article, we’ll create a mosaic plot to visualize the population composition by wealth and health status in the United States. We’ll use the ggmosaic package by Haley Jeppson et al.


Packages and data cleanup

From the happy dataset built into the ggmosaic package, we’ll select two columns: finrela, the relative financial condition, and health, the health condition. We’ll retain only the rows without missing values.

# install.packages("ggmosaic")
library(ggmosaic)
library(ggplot2)
library(dplyr)

h <- happy %>% select(finrela, health) %>% as_tibble()
h <- h[complete.cases(h), ]
head(h, 4)

Output:

# A tibble: 4 × 2
  finrela       health
  <fct>         <fct>
1 average       good
2 above average fair
3 average       excellent
4 average       good
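To make the row-filtering step concrete, here is a minimal sketch of how complete.cases() behaves. The toy data frame and its values are made up for illustration and are not part of the happy dataset.

```r
# Hypothetical toy data with missing values in both columns
toy <- data.frame(
  finrela = factor(c("average", NA, "below average", "average")),
  health  = factor(c("good", "fair", NA, "excellent"))
)

# Count missing values per column before filtering
colSums(is.na(toy))

# complete.cases() returns TRUE only for rows with no NA in any column
keep <- complete.cases(toy)
toy_clean <- toy[keep, ]

toy_clean
```

A row is dropped as soon as either column is missing, so the mosaic plot only counts respondents with both a wealth and a health answer.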

Create a contingency table showing the headcount in each combination of wealth and health status. We’ll visualize this table as a mosaic plot in a later step.

h %>% table()

Output:

                   health
finrela             excellent  good  fair  poor
  far below average       464   853   671   398
  below average          2262  4735  2808  1044
  average                6607 10473  4044   882
  above average          3450  3669   873   158
  far above average       403   292   102    41
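The counts above can also be turned into proportions by hand. The sketch below rebuilds the table from the printed counts (so it runs without ggmosaic) and computes three quantities: the share of the population in each wealth level, the share of each health level within a wealth level, and the joint share of each combination. The variable names are illustrative only.

```r
# Counts copied from the contingency table above
# (rows: finrela, columns: health)
counts <- matrix(
  c( 464,   853,  671,  398,
    2262,  4735, 2808, 1044,
    6607, 10473, 4044,  882,
    3450,  3669,  873,  158,
     403,   292,  102,   41),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    finrela = c("far below average", "below average", "average",
                "above average", "far above average"),
    health  = c("excellent", "good", "fair", "poor")
  )
)
tab <- as.table(counts)

# Share of the whole population in each wealth level
wealth_share <- rowSums(tab) / sum(tab)

# Share of each health level within a wealth level (each row sums to 1)
health_within <- prop.table(tab, margin = 1)

# Joint share of each wealth-health combination (all cells sum to 1)
joint_share <- prop.table(tab)

round(wealth_share, 3)
round(health_within, 3)
```

Each joint share equals the wealth share multiplied by the within-level health share, which is the kind of proportion a rectangle's area stands for in a mosaic plot.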

Visualization

Create a mosaic plot of the above contingency table. The area of each rectangle represents the proportion of the population that falls into the associated combination of wealth and health status.

p1 <- h %>%
  ggplot() +
  geom_mosaic(aes(x = product(finrela), fill = health)) +
  theme_mosaic()
p1

Polish up the plot.

p2 <- p1 +
  # expand takes a logical value; FALSE removes the padding around the plot
  coord_cartesian(expand = FALSE) +
  scale_fill_manual(values = c("turquoise3", "steelblue1", "orange", "tomato2")) +
  labs(x = "Financial Status", y = "Health Status") +
  theme(
    axis.text = element_text(size = 10),
    axis.text.x = element_text(angle = 40, hjust = 1, size = 10),
    axis.title = element_text(size = 14, face = "bold", color = "grey40"),
    legend.position = "none")
p2
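One optional refinement is to pass a named color vector to scale_fill_manual(), so that each color is matched to a health level by name rather than by position. This is a sketch, not part of the original script. It assumes the health factor has exactly the four levels shown in the contingency table, and the names health_colors and p_named are made up for this example.

```r
library(ggmosaic)
library(ggplot2)
library(dplyr)

h <- happy %>% select(finrela, health) %>% as_tibble()
h <- h[complete.cases(h), ]

# Named vector: names are assumed to match the levels of h$health
health_colors <- c(
  excellent = "turquoise3",
  good      = "steelblue1",
  fair      = "orange",
  poor      = "tomato2"
)

# Fail early if the names and the factor levels disagree
stopifnot(setequal(names(health_colors), levels(h$health)))

p_named <- ggplot(h) +
  geom_mosaic(aes(x = product(finrela), fill = health)) +
  scale_fill_manual(values = health_colors) +
  theme_mosaic()
p_named
```

With named values, reordering the factor levels later does not silently change which color belongs to which health status.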

Continue Exploring — 🚀 one level up!


The mosaic plot is a powerful tool for visualizing a bivariate distribution, that is, a contingency table of two categorical variables. But how about the “contingency table” of a single variable? A waffle plot is one option: it can show, for example, the population distribution across wealth status in the United States.



Compared with the mosaic plot, which shows relative proportions as rectangle areas, the 2D histogram shows proportions or counts with a color scale, and can be applied to both categorical and continuous variables. For example, a 2D histogram with a map overlay can visualize hurricane activity in the North Atlantic region.