Create All Possible Combinations of Selected Variables (1/3): basics of expand() and nesting()

  • expand() creates all possible unique combinations of the values (or factor levels) of the selected variables.
  • nesting() inside expand() finds combinations already present in the input dataset.

We’ll demonstrate both functions using the following dataset. Note that in this example, the size variable is defined with four distinct levels (XS, S, M and L), but only the first three levels are present in the dataset.

library(tidyr)
library(dplyr)

fruits <- tibble(
  type = c("apple", "apple", "orange", "orange", "orange", "orange"),
  year = rep(c(2023, 2024), each = 3),
  # 'size' has four levels, but 'L' never occurs in the data
  size = factor(
    c("XS", "S", "S", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  # random weights, so the printed values differ from run to run
  weights = rnorm(6, as.numeric(size) + 2)
)
fruits

Output:

# A tibble: 6 × 4
type year size weights
<chr> <dbl> <fct> <dbl>
1 apple 2023 XS 3.33
2 apple 2023 S 6.66
3 orange 2023 S 3.91
4 orange 2024 S 3.18
5 orange 2024 S 3.14
6 orange 2024 M 3.77

expand() creates a new dataset containing every possible unique combination of the selected variables. For instance, expand(type, size) creates (2 types) x (4 sizes) = 8 combinations (rows), and expand(type, size, year) creates (2 types) x (4 sizes) x (2 years) = 16 combinations (rows). Note that for a factor variable, the full set of levels is included in the combinations (here, including the unused level L of the size variable), not just the levels that appear in the data. If you want to use only the factor levels seen in the input dataset, use fct_drop() from the forcats package to drop the unused levels.

fruits %>%
  expand(type, size)

Output:

# A tibble: 8 × 2
type size
<chr> <fct>
1 apple XS
2 apple S
3 apple M
4 apple L
5 orange XS
6 orange S
7 orange M
8 orange L

# drop missing factor level 'L' of 'size' variable
fruits %>%
  expand(type, size = forcats::fct_drop(size))

Output:

# A tibble: 6 × 2
type size
<chr> <fct>
1 apple XS
2 apple S
3 apple M
4 orange XS
5 orange S
6 orange M
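
The row counts quoted earlier follow from multiplying the number of distinct values (or factor levels) that each variable contributes. As a quick check, the sketch below rebuilds the fruits tibble so that it runs on its own, then compares nrow() of each expand() call against that product. The names n_type, n_size, n_year, two_way and three_way are introduced here for illustration only.

```r
library(tidyr)
library(dplyr)

# same structure as the fruits tibble defined earlier
fruits <- tibble(
  type = c("apple", "apple", "orange", "orange", "orange", "orange"),
  year = rep(c(2023, 2024), each = 3),
  size = factor(
    c("XS", "S", "S", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = rnorm(6, as.numeric(size) + 2)
)

# illustrative helpers: how many values each variable contributes
n_type <- n_distinct(fruits$type)  # distinct values of a character column
n_size <- nlevels(fruits$size)     # all factor levels, including the unused 'L'
n_year <- n_distinct(fruits$year)

two_way   <- fruits %>% expand(type, size)
three_way <- fruits %>% expand(type, size, year)

nrow(two_way)   == n_type * n_size           # 2 x 4
nrow(three_way) == n_type * n_size * n_year  # 2 x 4 x 2
```

Both comparisons should return TRUE. Using nlevels() rather than n_distinct() for size is deliberate: it counts the declared levels, which is what expand() works from for a factor.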

You can use the helper function nesting() inside expand() to keep only the unique combinations that already appear in the input dataset. For instance, in the outputs below, the size level L is not included because it never occurs in fruits. Likewise, combinations such as an apple of size M, or an apple in year 2024, are not present in fruits and are therefore not included in the output.

fruits %>%
  expand(nesting(type, size))

Output:

# A tibble: 4 × 2
type size
<chr> <fct>
1 apple XS
2 apple S
3 orange S
4 orange M

fruits %>%
  expand(nesting(type, size, year))

Output:

# A tibble: 5 × 3
type size year
<chr> <fct> <dbl>
1 apple XS 2023
2 apple S 2023
3 orange S 2023
4 orange S 2024
5 orange M 2024

The code above is equivalent to selecting unique combinations using distinct(), with rows further sorted with arrange() (both functions from the dplyr package). For instance, the following two lines produce the same result.

fruits %>% expand(nesting(type, size))
fruits %>% distinct(type, size) %>% arrange(type, size)
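
To check this equivalence without comparing printed tibbles by eye, one option is to store both results and compare them programmatically. The sketch below does so with base R's all.equal() after converting to plain data frames. The object names via_expand and via_distinct are hypothetical, and the fruits tibble is rebuilt so that the snippet runs on its own.

```r
library(tidyr)
library(dplyr)

# same structure as the fruits tibble defined earlier
fruits <- tibble(
  type = c("apple", "apple", "orange", "orange", "orange", "orange"),
  year = rep(c(2023, 2024), each = 3),
  size = factor(
    c("XS", "S", "S", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = rnorm(6, as.numeric(size) + 2)
)

# observed combinations via expand() + nesting()
via_expand <- fruits %>% expand(nesting(type, size))

# observed combinations via distinct(), then sorted
via_distinct <- fruits %>%
  distinct(type, size) %>%
  arrange(type, size)

# compare as plain data frames to sidestep tibble-specific comparison methods
all.equal(as.data.frame(via_expand), as.data.frame(via_distinct))
```

If the two approaches agree, as described above, the comparison returns TRUE. Otherwise all.equal() returns a character vector describing the differences.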

You can combine these two approaches: first take the unique combinations already present in the dataset with nesting(), and then cross them with all values of additional variables.

# find combinations of 'type' and 'size' already present in dataset
# and then cross with 'year' including all possible combinations
fruits %>%
  expand(nesting(type, size), year)

Output:

# A tibble: 8 × 3
type size year
<chr> <fct> <dbl>
1 apple XS 2023
2 apple XS 2024
3 apple S 2023
4 apple S 2024
5 orange S 2023
6 orange S 2024
7 orange M 2023
8 orange M 2024
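
The three forms seen so far differ only in which variables are nested and which are crossed. As a rough side-by-side comparison, the sketch below counts the rows each form returns for the same tibble. The vector name counts and its element names are illustrative, and the fruits tibble is rebuilt so that the snippet runs on its own.

```r
library(tidyr)
library(dplyr)

# same structure as the fruits tibble defined earlier
fruits <- tibble(
  type = c("apple", "apple", "orange", "orange", "orange", "orange"),
  year = rep(c(2023, 2024), each = 3),
  size = factor(
    c("XS", "S", "S", "S", "S", "M"),
    levels = c("XS", "S", "M", "L")
  ),
  weights = rnorm(6, as.numeric(size) + 2)
)

counts <- c(
  # every combination of all values and levels
  crossed = nrow(fruits %>% expand(type, size, year)),
  # only combinations observed in the data
  nested = nrow(fruits %>% expand(nesting(type, size, year))),
  # observed (type, size) pairs crossed with every year
  mixed = nrow(fruits %>% expand(nesting(type, size), year))
)
counts
```

Going by the outputs shown in this section, the counts should be 16, 5 and 8 respectively: the mixed form sits between the two extremes because only year is fully crossed.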

New variables can be supplied to create additional combinations.

# create combinations with a new variable 'store'
fruits %>%
  expand(nesting(type, size), store = c("Walmart", "Costco"))

Output:

# A tibble: 8 × 3
type size store
<chr> <fct> <chr>
1 apple XS Costco
2 apple XS Walmart
3 apple S Costco
4 apple S Walmart
5 orange S Costco
6 orange S Walmart
7 orange M Costco
8 orange M Walmart

Now you are familiar with the basics of expand() and nesting(). In the next section, we’ll discuss how to use expand() in conjunction with some dplyr functions for additional applications.