Draw Scatterplot with Logarithmic Scales in ggplot2 to Visualize Diamonds’ Price Relation with Size

In this article, we’ll work with the diamonds dataset (built into ggplot2) and visualize the relationship between the diamonds’ price and carat size on logarithmic scales.

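Major techniques explained in this article include customizing the legend keys with override.aes, transforming both axes to a base-2 logarithmic scale, fitting a linear regression for each color grade, and adding a marginal distribution with the ggExtra package.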


library(ggplot2)
library(dplyr)
theme_set(theme_minimal())

head(diamonds, 3)

Output:

# A tibble: 3 × 10
  carat cut     color clarity depth table price     x     y     z
  <dbl> <ord>   <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.23 Ideal   E     SI2      61.5    55   326  3.95  3.98  2.43
2  0.21 Premium E     SI1      59.8    61   326  3.89  3.84  2.31
3  0.23 Good    E     VS1      56.9    65   327  4.05  4.07  2.31

1. Create a simple scatterplot. The “color” variable is mapped to both the color and fill aesthetics, using the same viridis color palette for each. Because the points are drawn with the shape “.”, only the mapping color = color takes effect in the plot (the first “color” is the aesthetic, the second is the variable name); fill = color does not change the appearance of the points at this step. The fill aesthetic is mapped here to prepare for customizing the legend keys (see step 2).

p1 <- diamonds %>%
  ggplot(aes(x = carat, y = price, color = color, fill = color)) +
  geom_point(alpha = .5, shape = ".") +
  # color scale
  scale_fill_viridis_d(option = "A", direction = -1) +
  scale_color_viridis_d(option = "A", direction = -1) +
  theme(legend.position = c(.85, .4))

p1

2. Enhance the visibility of legend keys. By default, the legend keys inherit aesthetic properties from the associated geom_*, and in this example are displayed as small semi-transparent dots. To make the keys more prominent, we override specific inherited properties with the override.aes = list() argument of guide_legend(), passed to guides(). The point shape 21 draws a circular outline with a filled interior in the legend (learn more basics of points). Thanks to the earlier mapping of fill and scale_fill_viridis_d (review step 1), the fill in the legend keys remains mapped to the “color” variable, and is thus consistent with the main plot.

p2 <- p1 +
  guides(fill = guide_legend(
    override.aes = list(
      size = 5, alpha = 1, color = "black", shape = 21)))

p2

3. Transform the x and y axes to a base-2 logarithmic scale (check out this example on semi-log scale on base 10).

p3 <- p2 +
  # log2 transformation
  scale_x_continuous(trans = "log2") +
  # breaks are given as original values, before the log transformation
  scale_y_continuous(trans = "log2", breaks = 2^c(9:14)) +
  annotation_logticks(base = 2)

p3
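
The breaks in step 3 are specified in original price units, yet they end up evenly spaced on the transformed axis. The short base-R sketch below checks this arithmetic; the variable name price_breaks is ours and is used only for illustration.

```r
# Breaks as passed to scale_y_continuous(): original (untransformed) price units
price_breaks <- 2^(9:14)
price_breaks

# Positions of the same breaks on the log2-transformed axis
break_positions <- log2(price_breaks)
break_positions

# Consecutive breaks are exactly one unit apart after the transformation,
# which means each break is double the previous price
diff(break_positions)
```

Each step up the y-axis therefore corresponds to a doubling of price, which is what makes a base-2 scale convenient to read.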

4. Calculate the simple linear regression for each color grade of the diamond, showing the relationship between log2(price) and log2(carat).

p4 <- p3 +
  # regression is calculated on the data after transformation
  geom_smooth(
    method = "lm", se = FALSE, linewidth = .5, show.legend = FALSE) +
  # zoom in
  # the limits are original values, before the log transformation
  coord_cartesian(xlim = c(.25, 3), ylim = c(400, 2^14))

p4
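
To see the numbers behind the regression lines in step 4, we can fit the same kind of model outside the plot. This is a minimal sketch, assuming that geom_smooth(method = "lm") fits one model per color grade on the log2-transformed values; the names fits and coef_table are ours, not part of ggplot2.

```r
library(ggplot2)  # provides the diamonds dataset

# Fit log2(price) ~ log2(carat) separately for each color grade (D to J)
fits <- lapply(split(diamonds, diamonds$color), function(d) {
  coef(lm(log2(price) ~ log2(carat), data = d))
})

# Collect the intercepts and slopes into one matrix, one row per color grade
coef_table <- do.call(rbind, fits)
colnames(coef_table) <- c("intercept", "slope")
coef_table
```

On a log-log scale, a slope of b means that doubling the carat size multiplies the expected price by roughly 2^b, so the slope column offers a compact way to compare color grades.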

5. Visualize the price distribution for each color grade of the diamond using the ggExtra package.

# install.packages("ggExtra")
library(ggExtra)
ggMarginal(p4, margins = "y", groupColour = TRUE)




Continue Exploring — 🚀 one level up!


A scatterplot is often enhanced by visualizing the marginal (univariate) distribution of the x and y variables, and the bivariate distribution pattern with confidence ellipses. Check out the following scatterplot with marginal and ellipses visualization.



For data with high skewness, mathematical transformations are a powerful tool for revealing the underlying data structure, as shown in the diamonds scatterplot. Such transformations are also commonly applied to the color scale. Check out the following awesome heatmap on African population density, which uses a pseudo-logarithmic transformation in the color scale to unveil highly skewed data patterns.
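
To make the idea of a pseudo-logarithmic transformation concrete, here is a minimal base-R sketch. The helper name pseudo_log and its defaults are hypothetical, and the formula asinh(x / (2 * sigma)) / log(base) is our understanding of the form used by scales::pseudo_log_trans(), so treat it as an assumption and not as a reference implementation.

```r
# Hypothetical helper illustrating a pseudo-log transformation:
# roughly linear near zero, roughly logarithmic for large values
pseudo_log <- function(x, sigma = 1, base = 10) {
  asinh(x / (2 * sigma)) / log(base)
}

# Unlike log10(), the transformation is defined at zero
pseudo_log(0)

# For large values it closely tracks log10()
x <- c(1e3, 1e6)
cbind(pseudo_log = pseudo_log(x), log10 = log10(x))
```

This behavior suits data such as population density, where many cells are zero or near zero while a few are extremely large.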