library(ggplot2)
library(dplyr)
# install.packages("palmerpenguins")
library(palmerpenguins) # data package

# set as default theme
theme_set(theme_minimal(base_size = 16))

head(penguins, 4)
Output:
# A tibble: 4 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
# ℹ 2 more variables: sex <fct>, year <int>
Create a scatterplot, showing the relationship between the length and depth of a penguin’s bill.
p1 <- penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  # add small amount of random noise to points position to reduce overlap
  geom_point(position = position_jitter(width = .2, height = 0.2, seed = 1)) +
  # color scale for categorical variable
  scale_color_brewer(palette = "Set2")
p1
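The `seed` argument in `position_jitter()` is what makes the random offsets repeatable, which matters whenever a second layer needs to line up with the jittered points. The sketch below checks this on a small made-up data frame (`toy` is hypothetical and not part of the tutorial data); it uses `layer_data()` to read back the coordinates ggplot2 actually computed.

```r
library(ggplot2)

# Hypothetical toy data, used only to inspect the jittered coordinates
toy <- data.frame(x = c(1, 2, 3), y = c(1, 2, 3))

# Two separately created jitter positions that share the same seed
jitter_a <- position_jitter(width = 0.2, height = 0.2, seed = 1)
jitter_b <- position_jitter(width = 0.2, height = 0.2, seed = 1)

# layer_data() returns the computed (here: jittered) data of a plot layer
pts_a <- layer_data(ggplot(toy, aes(x, y)) + geom_point(position = jitter_a))
pts_b <- layer_data(ggplot(toy, aes(x, y)) + geom_point(position = jitter_b))

# With a shared seed, both layers should receive the same offsets
same_x <- identical(pts_a$x, pts_b$x)
same_y <- identical(pts_a$y, pts_b$y)
print(c(same_x = same_x, same_y = same_y))
```

Without a fixed seed, each layer would draw its own random offsets, and the two sets of coordinates would generally differ.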
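Draw a 95% confidence ellipse around each species. By default, stat_ellipse() assumes a multivariate t-distribution, so each ellipse encloses roughly 95% of that species' data points.
p2 <- p1 +
  stat_ellipse(level = .95, show.legend = F, linewidth = .7)
p2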
Annotate the scatterplot with the species’ names in place of the legend. We’ll remove the legend in the next step.
species <- tibble(
  x = c(34, 56.5, 56),
  y = c(20, 16.5, 19),
  species = c("Adelie", "Gentoo", "Chinstrap"))

p3 <- p2 +
  geom_text(
    data = species,
    aes(x = x, y = y, label = species, color = species),
    # not inherit aesthetic mapping from the ggplot line
    inherit.aes = F, fontface = "bold.italic", size = 6)
p3
Polish the remaining details, as described in the code comments below.
p4 <- p3 +
  # rename axes' titles
  labs(x = "bill length (mm)", y = "bill depth (mm)") +
  # adjust axis breaks
  scale_x_continuous(breaks = seq(from = 20, to = 60, by = 4)) +
  scale_y_continuous(breaks = seq(from = 10, to = 25, by = 2)) +
  theme(
    # remove legend
    legend.position = "none",
    # remove the minor grids, and adjust the width of the major grids
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(linewidth = .3),
    # bold axis titles
    axis.title = element_text(face = "bold"),
    # increase the margin on the right side of y-axis title, and on top of x-axis title
    axis.title.y = element_text(margin = margin(r = 10)),
    axis.title.x = element_text(margin = margin(t = 10))) +
  # center justify the legend title
  guides(color = guide_legend(title.hjust = .5))
p4
Visualize the marginal distributions. geom_rug() draws barcode-like tick marks along the edges of the plot, one per data point, which intuitively depict how the data are distributed along each axis.
p5 <- p4 +
  geom_rug(
    # apply the same "jitter" position to align with points
    position = position_jitter(width = .2, height = 0.2, seed = 1),
    sides = "tr", # t, top; r, right; b, bottom; l, left
    length = unit(10, "pt"), # shorter bar line length
    linewidth = .1) # smaller width to reduce overlap
p5
Further enhance the marginal distribution visualization with density plots using the ggExtra package. The package also provides an interactive gadget for customizing the marginal plots, which can be launched by calling ggMarginalGadget(p5).
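As a variation on the density margins, `ggMarginal()` also accepts other marginal types through its `type` argument. The following is a minimal, self-contained sketch rather than the tutorial's own call: `base_plot` and `marginal_hist` are illustrative names, and the plot is a pared-down scatterplot built only to show the idea.

```r
library(ggplot2)
library(ggExtra)
library(palmerpenguins)

# Keep only complete rows so no points are dropped with a warning
bills <- na.omit(penguins[, c("bill_length_mm", "bill_depth_mm", "species")])

# A pared-down scatterplot (illustrative, not the tutorial's p5)
base_plot <- ggplot(bills,
                    aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  geom_point() +
  theme(legend.position = "bottom")

# type = "histogram" swaps the default density margins for histograms;
# groupFill = TRUE fills them by the grouping mapped to color in the plot
marginal_hist <- ggMarginal(base_plot, type = "histogram", groupFill = TRUE)
marginal_hist
```

Other documented values for `type` include "density", "boxplot", "violin", and "densigram", so the margins can be changed without touching the scatterplot itself.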
library(ggplot2)
library(dplyr)
# install.packages("palmerpenguins")
library(palmerpenguins) # data package

# set as default theme
theme_set(theme_minimal(base_size = 16))

# Create a basic scatterplot
p1 <- penguins %>%
  ggplot(aes(x = bill_length_mm, y = bill_depth_mm, color = species)) +
  # add a small amount of random noise to point positions to reduce overlap
  geom_point(position = position_jitter(width = .2, height = 0.2, seed = 1)) +
  # color scale for categorical variable
  scale_color_brewer(palette = "Set2")
p1

# Add confidence ellipse that encircles 95% of data points.
p2 <- p1 +
  stat_ellipse(level = .95, show.legend = F, linewidth = .7)
p2

# Directly annotate the scatterplot with species names in place of the legend.
species <- tibble(
  x = c(34, 56.5, 56),
  y = c(20, 16.5, 19),
  species = c("Adelie", "Gentoo", "Chinstrap"))

p3 <- p2 +
  geom_text(
    data = species,
    aes(x = x, y = y, label = species, color = species),
    # do not inherit aesthetic mapping from the ggplot line
    inherit.aes = F, fontface = "bold.italic", size = 6)
p3

# Polish up more details
p4 <- p3 +
  # rename axis titles
  labs(x = "bill length (mm)", y = "bill depth (mm)") +
  # adjust axis breaks
  scale_x_continuous(breaks = seq(from = 20, to = 60, by = 4)) +
  scale_y_continuous(breaks = seq(from = 10, to = 25, by = 2)) +
  theme(
    # remove legend
    legend.position = "none",
    # remove the minor grids, and adjust the width of the major grids
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(linewidth = .3),
    # bold axis titles
    axis.title = element_text(face = "bold"),
    # increase the margin on the right side of y-axis title, and on top of x-axis title
    axis.title.y = element_text(margin = margin(r = 10)),
    axis.title.x = element_text(margin = margin(t = 10))) +
  # center justify the legend title
  guides(color = guide_legend(title.hjust = .5))
p4

# Visualize the marginal distribution.
p5 <- p4 +
  geom_rug(
    # apply the same "jitter" position to align with points
    position = position_jitter(width = .2, height = 0.2, seed = 1),
    sides = "tr", # t, top; r, right; b, bottom; l, left
    length = unit(10, "pt"), # shorter bar line length
    linewidth = .1) # smaller width to reduce overlap
p5

# Enhance the marginal distribution visualization with density plots
# install.packages("ggExtra")
library(ggExtra)
ggMarginal(p5, groupFill = T, size = 10, linewidth = 0, alpha = .4)
Continue Exploring — 🚀 one level up!
A scatterplot is often drawn on a semi-logarithmic or double-logarithmic scale when the data are strongly skewed along one or both axes. Check out the following scatterplot on a semi-logarithmic scale, which reveals a linear relationship between the percentage of urbanization and log(GDP per capita).
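To make the idea concrete, here is a minimal sketch of a semi-logarithmic scatterplot in ggplot2. The data are simulated purely for illustration (the `toy` data frame, its column names, and the coefficients are assumptions, not the data behind the linked plot); the only technique shown is putting one axis on a log scale with `scale_y_log10()`.

```r
library(ggplot2)

set.seed(1)

# Simulated, hypothetical data: log10(GDP per capita) rises roughly
# linearly with the urbanization percentage, plus random noise
toy <- data.frame(urban_pct = runif(100, min = 10, max = 95))
toy$gdp_per_capita <- 10^(2.5 + 0.02 * toy$urban_pct + rnorm(100, sd = 0.2))

semilog_plot <- ggplot(toy, aes(x = urban_pct, y = gdp_per_capita)) +
  geom_point() +
  # log-transform only the y axis: a semi-logarithmic plot
  scale_y_log10() +
  labs(x = "urban population (%)", y = "GDP per capita (log scale)")
semilog_plot
```

On a linear y axis the same points would curve upward and bunch up near the bottom; on the log scale they fall along an approximately straight band. Adding `scale_x_log10()` as well would give a double-logarithmic plot.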
Alongside the scatterplot, the line plot is one of the most widely used tools in data visualization. Check out the following annotated line plot, which shows how the popularity of smoking has changed worldwide, with the United States, Germany, and France highlighted.