Pack Columns into a Dataframe-Column, and Unpack a Dataframe-Column into Separate Columns

pack() mimics the nested column headers popularly used in Excel, and clusters multiple columns into a single dataframe-column. unpack() is the reverse procedure, and turns the single packed data-frame column into separate columns.

pack() is barely used in practice, while unpack() is generally a more useful function, as it turns complicated nested structure into simple one. We’ll first discuss the basics of the two functions, and then demonstrate the use of unpack() in the context of a capture group.

This tutorial covers the following content:

pack(): pack multiple columns into a single dataframe-column
unpack(): unpack a dataframe-column into separate columns
Demo: use unpack() with capture groups

Pack multiple columns into a single dataframe-column

In this example, the columns Sepal.Length and Sepal.Width are packed into a single dataframe-column Sepal, and Petal.Length and Petal.Width are packed into Petal. Selection helpers, e.g., starts_with(), can be used to select a range of columns to be packed up.

library(tidyr)library(dplyr)
iris %>% as_tibble() %>%   pack(Sepal = starts_with("Sepal"),       Petal = starts_with("Petal"))

Output:

# A tibble: 150 × 3 Species Sepal$Sepal.Length $Sepal.Width Petal$Petal.Length $Petal.Width <fct> <dbl> <dbl> <dbl> <dbl> 1 setosa 5.1 3.5 1.4 0.2 2 setosa 4.9 3 1.4 0.2 3 setosa 4.7 3.2 1.3 0.2 4 setosa 4.6 3.1 1.5 0.2 5 setosa 5 3.6 1.4 0.2 # ℹ 145 more rows

To simplify columns names, you can use .names_sep to strip off the common prefix before the specified separator in column names.

a <- iris %>% as_tibble() %>%   pack(Sepal = starts_with("Sepal"),       Petal = starts_with("Petal"),       .names_sep = ".")a

Output:
# A tibble: 150 × 3
  Species Sepal$Length $Width Petal$Length $Width
  <fct>          <dbl>  <dbl>        <dbl>  <dbl>
1 setosa           5.1    3.5          1.4    0.2
2 setosa           4.9    3            1.4    0.2
3 setosa           4.7    3.2          1.3    0.2
4 setosa           4.6    3.1          1.5    0.2
5 setosa           5      3.6          1.4    0.2
# ℹ 145 more rows

You can extract Sepal in two ways with a slightly different effect:

# return two unpacked columnsa$Sepal

Output:
# A tibble: 150 × 2
  Length Width
   <dbl> <dbl>
1    5.1   3.5
2    4.9   3
3    4.7   3.2
4    4.6   3.1
5    5     3.6
# ℹ 145 more rows

# retain dataframe-column formata %>% select(Sepal)

Output:

# A tibble: 150 × 1 Sepal$Length $Width <dbl> <dbl> 1 5.1 3.5 2 4.9 3 3 4.7 3.2 4 4.6 3.1 5 5 3.6 # ℹ 145 more rows

Unpack the dataframe-column

Use unpack() to release the clustered dataframe-column into separate columns.

a %>% unpack(Sepal)

Output:
# A tibble: 150 × 4
  Species Length Width Petal$Length $Width
  <fct>    <dbl> <dbl>        <dbl>  <dbl>
1 setosa     5.1   3.5          1.4    0.2
2 setosa     4.9   3            1.4    0.2
3 setosa     4.7   3.2          1.3    0.2
4 setosa     4.6   3.1          1.5    0.2
5 setosa     5     3.6          1.4    0.2
# ℹ 145 more rows

a %>% unpack(c(Sepal, Petal), names_sep = "_")

Output:

# A tibble: 150 × 5 Species Sepal_Length Sepal_Width Petal_Length Petal_Width <fct> <dbl> <dbl> <dbl> <dbl> 1 setosa 5.1 3.5 1.4 0.2 2 setosa 4.9 3 1.4 0.2 3 setosa 4.7 3.2 1.3 0.2 4 setosa 4.6 3.1 1.5 0.2 5 setosa 5 3.6 1.4 0.2 # ℹ 145 more rows

Use `unpack()` with capture groups

A capture group is a part of a regular expression that is enclosed in parentheses (). str_match() in the stringr package is a nice tool to capture and extract the matched patterns, and returns a matrix. This matrix can be embedded in a tibble as a matrix-column, then converted to a tibble-column, and further unpacked to release the individual columns. The following example demonstrates this procedure. (A detailed instruction about the preparation of dataset x can be found here.)

library(tidyr)library(stringr)
# create a named capture groupnamed.capture <- "(?<status>new)_?(?<type>.*)_(?<gender>.)(?<age>.*)"
x <- who %>%   # select top-20 rows containing most outbreaks in male at age 15-24   slice_max(order_by = new_sp_m1524, n = 20) %>%   # convert to tidy structure  pivot_longer(-c(1:4), names_to = "condition", values_to = "count") %>%     # split the `condition` column into 4 separate columns  # with values extracted from the defined capture group  # return the captured groups as a tibble-column  mutate(    a = str_match(condition, named.capture) %>% as_tibble(),     .keep = "unused")x

Output:
# A tibble: 1,120 × 6
  country iso2  iso3   year count a$V1         $status $type $gender $age
  <chr>   <chr> <chr> <dbl> <dbl> <chr>        <chr>   <chr> <chr>   <chr>
1 India   IN    IND    2010  4871 new_sp_m014  new     sp    m       014
2 India   IN    IND    2010 78278 new_sp_m1524 new     sp    m       1524
3 India   IN    IND    2010 82757 new_sp_m2534 new     sp    m       2534
4 India   IN    IND    2010 90440 new_sp_m3544 new     sp    m       3544
5 India   IN    IND    2010 81210 new_sp_m4554 new     sp    m       4554
# ℹ 1,115 more rows

# unpack the single tibble-column into separate columnsx %>% unpack(a)

Output:
# A tibble: 1,120 × 10
  country iso2  iso3   year count V1           status type  gender age
  <chr>   <chr> <chr> <dbl> <dbl> <chr>        <chr>  <chr> <chr>  <chr>
1 India   IN    IND    2010  4871 new_sp_m014  new    sp    m      014
2 India   IN    IND    2010 78278 new_sp_m1524 new    sp    m      1524
3 India   IN    IND    2010 82757 new_sp_m2534 new    sp    m      2534
4 India   IN    IND    2010 90440 new_sp_m3544 new    sp    m      3544
5 India   IN    IND    2010 81210 new_sp_m4554 new    sp    m      4554
# ℹ 1,115 more rows

Pack multiple columns into a single dataframe-column

Unpack the dataframe-column

Use unpack() with capture groups

Amazing eBook to learn ggplot2 FAST & EASY

Use `unpack()` with capture groups