Spread Columns into Wider Dataset (3/3): Generate new columns from multiple variables

In this section, we will discuss how to widen datasets by generating new columns from multiple variables in pivot_wider(). We’ll demonstrate the variable combinations in names_from and values_from respectively using the following two examples.

e.g.1.

library(tidyr)library(dplyr)
production <- expand_grid( product = c("A", "B"), country = c("UK", "USA"), year = 2023:2025) %>% filter((product == "A" & country == "UK") | product == "B") %>% mutate(production = rnorm(nrow(.)))
production

Output:

# A tibble: 9 × 4
product country year production
<chr> <chr> <int> <dbl>
1 A UK 2023 -0.237
2 A UK 2024 -0.971
3 A UK 2025 -1.53
4 B UK 2023 0.153
5 B UK 2024 0.599
6 B UK 2025 -2.08
7 B USA 2023 1.14
8 B USA 2024 -2.01
9 B USA 2025 -0.913

We want to widen the data so we have one column for each combination of product and country. The key is to specify multiple variables in names_from.

production %>% pivot_wider(  # create new column names as combination of "product" and "country"  names_from = c(product, country),   values_from = production)

Output:

# A tibble: 3 × 4
year A_UK B_UK B_USA
<int> <dbl> <dbl> <dbl>
1 2023 -0.237 0.153 1.14
2 2024 -0.971 0.599 -2.01
3 2025 -1.53 -2.08 -0.913

When either names_from or values_from select multiple variables, you can control how the column names in the output are constructed with arguments names_sep and names_prefix.

production %>% pivot_wider(  names_from = c(product, country),   values_from = production,  names_sep = ".", # separate parts of the column name with a dot  names_prefix = "prod." # add prefix "prod.")

Output:

# A tibble: 3 × 4
year prod.A.UK prod.B.UK prod.B.USA
<int> <dbl> <dbl> <dbl>
1 2023 -0.237 0.153 1.14
2 2024 -0.971 0.599 -2.01
3 2025 -1.53 -2.08 -0.913

Alternatively, you can construct new column names using the argument names_glue. Here prod_{product}_{country} combines prefix “prod” with unique levels of the product and country variables, connected with underscores.

production %>% pivot_wider(  names_from = c(product, country),   values_from = production,  names_glue = "prod_{product}_{country}" # specify new column name pattern)

Output:

# A tibble: 3 × 4
year prod_A_UK prod_B_UK prod_B_USA
<int> <dbl> <dbl> <dbl>
1 2023 -0.237 0.153 1.14
2 2024 -0.971 0.599 -2.01
3 2025 -1.53 -2.08 -0.913


e.g.2. The us_rent_income dataset contains information about median income and rent for each state in the US for 2017.

us_rent_income

Output:

# A tibble: 104 × 5
GEOID NAME variable estimate moe
<chr> <chr> <chr> <dbl> <dbl>
1 01 Alabama income 24476 136
2 01 Alabama rent 747 3
3 02 Alaska income 32940 508
4 02 Alaska rent 1200 13
5 04 Arizona income 27517 148
6 04 Arizona rent 972 4
7 05 Arkansas income 23789 165
8 05 Arkansas rent 709 5
9 06 California income 29454 109
10 06 California rent 1358 3
# ℹ 94 more rows

Here both estimate and moe (90% margin of error) are value columns, so we can supply them to values_from.

us_rent_income %>%   pivot_wider(    names_from = variable,     values_from = c(estimate, moe))

Output:

# A tibble: 52 × 6
GEOID NAME estimate_income estimate_rent moe_income moe_rent
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 01 Alabama 24476 747 136 3
2 02 Alaska 32940 1200 508 13
3 04 Arizona 27517 972 148 4
4 05 Arkansas 23789 709 165 5
5 06 California 29454 1358 109 3
6 08 Colorado 32401 1125 109 5
7 09 Connecticut 35326 1123 195 5
8 10 Delaware 31560 1076 247 10
9 11 District of Columbia 43198 1424 681 17
10 12 Florida 25952 1077 70 3
# ℹ 42 more rows