Draw Mosaic Plot in ggplot2 to Visualize the Population Composition at Different Wealth and Health Status
A mosaic plot is a graphical representation of a contingency table (the frequency distribution of two or more categorical variables). It uses the area of each rectangle to show the proportion of cases falling into each combination of category levels. In this article, we’ll create a mosaic plot to visualize the composition of the United States population by wealth and health status. We’ll use the ggmosaic package by Haley Jeppson et al.
From the happy dataset built into the ggmosaic package, we’ll select two columns: finrela, the relative financial condition, and health, the health condition. We’ll retain only the rows without missing values, and store the result as h.
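The preparation code is not shown here, so the following is a minimal sketch of one way to build h. It assumes the dplyr package is installed, and the conversion to a tibble and the call to head() are guesses made to match the printed preview, not necessarily the original code.

```r
library(ggplot2)   # plotting functions used later
library(ggmosaic)  # provides the happy dataset and geom_mosaic()
library(dplyr)     # select(), filter(), and the %>% pipe

# Keep the two columns of interest and drop rows with a missing value in either.
h <- happy %>%
  as_tibble() %>%
  select(finrela, health) %>%
  filter(!is.na(finrela), !is.na(health))

# Preview the first four rows.
head(h, 4)
```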
# A tibble: 4 × 2
  finrela       health
  <fct>         <fct>
1 average       good
2 above average fair
3 average       excellent
4 average       good
Create a contingency table showing the headcount in each combination of wealth and health status. We’ll visualize this table as a mosaic plot in a later step.
h %>% table()
Output:
                   health
finrela             excellent  good fair poor
  far below average       464   853  671  398
  below average          2262  4735 2808 1044
  average                6607 10473 4044  882
  above average          3450  3669  873  158
  far above average       403   292  102   41
Visualization
Create a mosaic plot visualizing the above contingency table. The area of each rectangle represents the proportion of the population that falls into the associated combination of wealth and health status.
p1 <- h %>%
  ggplot() +
  geom_mosaic(aes(x = product(finrela), fill = health)) +
  theme_mosaic()
p1
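To make the link between rectangle area and proportion concrete, here is a small base R sketch that rebuilds the contingency table from the counts printed earlier. It assumes the usual mosaic layout, in which column width follows the share of each finrela level and rectangle height follows the share of each health level within that column. The variable names (counts, width, height, area) are illustrative only, and the small gaps ggmosaic draws between tiles are ignored.

```r
# Counts copied from the contingency table (rows: finrela, columns: health).
counts <- matrix(
  c( 464,   853,  671,  398,
    2262,  4735, 2808, 1044,
    6607, 10473, 4044,  882,
    3450,  3669,  873,  158,
     403,   292,  102,   41),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    finrela = c("far below average", "below average", "average",
                "above average", "far above average"),
    health  = c("excellent", "good", "fair", "poor")
  )
)

# Column width: share of the whole population in each finrela level.
width <- rowSums(counts) / sum(counts)

# Rectangle height: share of each health level within a finrela level.
height <- prop.table(counts, margin = 1)

# Area = width * height, computed row by row.
area <- sweep(height, 1, width, "*")

# The areas coincide with the joint proportions of the table.
all.equal(area, prop.table(counts))
round(area, 3)
```

Because width times height reduces to the cell count divided by the grand total, comparing two rectangle areas in the plot is equivalent to comparing the corresponding cell counts.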
Compared with the mosaic plot, which encodes relative proportions as rectangle areas, a 2D histogram encodes proportions or counts with a color scale, and it can be applied to both categorical variables (see this example) and continuous variables. For example, check out the following article on how to create a 2D histogram with a map overlay to visualize hurricane activity in the North Atlantic region.
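As a rough illustration of the color-scale alternative on the same two categorical variables, the sketch below counts each combination and draws it as a tile heatmap. This is not the code from the linked example. It swaps in geom_tile() on pre-computed counts as a stand-in for a 2D histogram, and the names tile_counts and p2 are made up for this sketch.

```r
library(ggplot2)
library(ggmosaic)  # only used here for the happy dataset
library(dplyr)

# Count the cases in each combination of wealth and health status.
tile_counts <- happy %>%
  filter(!is.na(finrela), !is.na(health)) %>%
  count(finrela, health)

# One tile per combination; fill color encodes the count.
p2 <- ggplot(tile_counts, aes(x = finrela, y = health, fill = n)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(name = "count") +
  theme_minimal()
p2
```

Here every tile has the same size, so the count is read from the color rather than from the area.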