Spread Columns into Wider Dataset (3/3): Generate new columns from multiple variables

In this section, we will discuss how to widen datasets by generating new columns from multiple variables in pivot_wider(). We’ll demonstrate the variable combinations in names_from and values_from respectively using the following two examples.

e.g.1.

library(tidyr)library(dplyr)
production <- expand_grid(  product = c("A", "B"),   country = c("UK", "USA"),   year = 2023:2025) %>%  filter((product == "A" & country == "UK") | product == "B") %>%   mutate(production = rnorm(nrow(.)))
production

Output:

# A tibble: 9 × 4 product country year production <chr> <chr> <int> <dbl> 1 A UK 2023 -0.237 2 A UK 2024 -0.971 3 A UK 2025 -1.53 4 B UK 2023 0.153 5 B UK 2024 0.599 6 B UK 2025 -2.08 7 B USA 2023 1.14 8 B USA 2024 -2.01 9 B USA 2025 -0.913

We want to widen the data so we have one column for each combination of product and country. The key is to specify multiple variables in names_from.

production %>% pivot_wider(  # create new column names as combination of "product" and "country"  names_from = c(product, country),   values_from = production)

Output:
# A tibble: 3 × 4
   year   A_UK   B_UK  B_USA
  <int>  <dbl>  <dbl>  <dbl>
1  2023 -0.237  0.153  1.14
2  2024 -0.971  0.599 -2.01
3  2025 -1.53  -2.08  -0.913

When either names_from or values_from select multiple variables, you can control how the column names in the output are constructed with arguments names_sep and names_prefix.

production %>% pivot_wider(  names_from = c(product, country),   values_from = production,  names_sep = ".", # separate parts of the column name with a dot  names_prefix = "prod." # add prefix "prod.")

Output:
# A tibble: 3 × 4
   year prod.A.UK prod.B.UK prod.B.USA
  <int>     <dbl>     <dbl>      <dbl>
1  2023    -0.237     0.153      1.14
2  2024    -0.971     0.599     -2.01
3  2025    -1.53     -2.08      -0.913

Alternatively, you can construct new column names using the argument names_glue. Here prod_{product}_{country} combines prefix “prod” with unique levels of the product and country variables, connected with underscores.

production %>% pivot_wider(  names_from = c(product, country),   values_from = production,  names_glue = "prod_{product}_{country}" # specify new column name pattern)

Output:
# A tibble: 3 × 4
   year prod_A_UK prod_B_UK prod_B_USA
  <int>     <dbl>     <dbl>      <dbl>
1  2023    -0.237     0.153      1.14
2  2024    -0.971     0.599     -2.01
3  2025    -1.53     -2.08      -0.913

e.g.2. The us_rent_income dataset contains information about median income and rent for each state in the US for 2017.

us_rent_income

Output:

# A tibble: 104 × 5 GEOID NAME variable estimate moe <chr> <chr> <chr> <dbl> <dbl> 1 01 Alabama income 24476 136 2 01 Alabama rent 747 3 3 02 Alaska income 32940 508 4 02 Alaska rent 1200 13 5 04 Arizona income 27517 148 6 04 Arizona rent 972 4 7 05 Arkansas income 23789 165 8 05 Arkansas rent 709 5 9 06 California income 29454 109 10 06 California rent 1358 3 # ℹ 94 more rows

Here both estimate and moe (90% margin of error) are value columns, so we can supply them to values_from.

us_rent_income %>%   pivot_wider(    names_from = variable,     values_from = c(estimate, moe))

Output:

# A tibble: 52 × 6 GEOID NAME estimate_income estimate_rent moe_income moe_rent <chr> <chr> <dbl> <dbl> <dbl> <dbl> 1 01 Alabama 24476 747 136 3 2 02 Alaska 32940 1200 508 13 3 04 Arizona 27517 972 148 4 4 05 Arkansas 23789 709 165 5 5 06 California 29454 1358 109 3 6 08 Colorado 32401 1125 109 5 7 09 Connecticut 35326 1123 195 5 8 10 Delaware 31560 1076 247 10 9 11 District of Columbia 43198 1424 681 17 10 12 Florida 25952 1077 70 3 # ℹ 42 more rows

Amazing eBook to learn ggplot2 FAST & EASY