Spread Columns into Wider Dataset (1/3): the basics ofpivot_wider()
pivot_wider() (previously known as spread()) is the opposite of pivot_longer(): it makes a dataset wider by increasing the number of columns and decreasing the number of rows. It’s relatively rare to need pivot_wider() to make tidy data, but it’s often useful for creating summary tables for presentation, or data in a format needed by many other tools.
Below you’ll learn the basics of pivot_wider() and appreciate its application from the following three excellent examples.
e.g.1population is a subset of data from the World Health Organization that records the annual population in countries from 1995 to 2013. The dataset is in a nice tidy structure.
You can use pivot_wider() to reshape the dataset into a wider format to make it easier for visual check and additional analysis with other tools. The first argument is the dataset to tidy up. Besides, there are another two basic arguments:
names_from specifies the column whose unique values will become the new column names in the pivoted wide-format data frame.
values_from specifies the column whose values will populate the cells of the resulted wide-format data frame.
population %>%pivot_wider(# unique values in the 'year' column are spread out as new column namesnames_from = year, # values in the 'population' column are used to fill up cells of new columnsvalues_from = population)
e.g.2 When a tagged fish swims downstream in a river, each instance of its detection by an autonomous monitor (referred to as a station) is encoded as 1. The fish_encounters dataset records such an encounters history.
fish_encounters
Output:
# A tibble: 114 × 3 fish station seen <fct> <fct> <int> 1 4842 Release 1 2 4842 I80_1 1 3 4842 Lisbon 1 4 4842 Rstr 1 5 4842 Base_TD 1 6 4842 BCE 1 7 4842 BCW 1 8 4842 BCE2 1 9 4842 BCW2 1 10 4842 MAE 1 # ℹ 104 more rows
Below we’ll pivot the dataset into a wider format. This quickly generates a bird view of detection instances of the tagged fish by different stations.
fish_encounters %>%pivot_wider(# unique values in the 'station' column are spread out as new column namesnames_from = station, # values in the 'seen' column are used to fill up cells of new columnsvalues_from = seen)
Output:
# A tibble: 19 × 12 fish Release I80_1 Lisbon Rstr Base_TD BCE BCW BCE2 BCW2 MAE MAW <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> 1 4842 1 1 1 1 1 1 1 1 1 1 1 2 4843 1 1 1 1 1 1 1 1 1 1 1 3 4844 1 1 1 1 1 1 1 1 1 1 1 4 4845 1 1 1 1 1 NA NA NA NA NA NA 5 4847 1 1 1 NA NA NA NA NA NA NA NA 6 4848 1 1 1 1 NA NA NA NA NA NA NA 7 4849 1 1 NA NA NA NA NA NA NA NA NA 8 4850 1 1 NA 1 1 1 1 NA NA NA NA 9 4851 1 1 NA NA NA NA NA NA NA NA NA 10 4854 1 1 NA NA NA NA NA NA NA NA NA 11 4855 1 1 1 1 1 NA NA NA NA NA NA 12 4857 1 1 1 1 1 1 1 1 1 NA NA 13 4858 1 1 1 1 1 1 1 1 1 1 1 14 4859 1 1 1 1 1 NA NA NA NA NA NA 15 4861 1 1 1 1 1 1 1 1 1 1 1 16 4862 1 1 1 1 1 1 1 1 1 NA NA 17 4863 1 1 NA NA NA NA NA NA NA NA NA 18 4864 1 1 NA NA NA NA NA NA NA NA NA 19 4865 1 1 1 NA NA NA NA NA NA NA NA
A registry of 1 is recorded by a station only when a fish was detected; nothing was recorded when the fish was not detected, resulting in NA values. As such, we can ask pivot_wider() to fill in these missing values with zeros using the values_fill argument.
fish_encounters %>%pivot_wider(names_from = station, values_from = seen,values_fill =0# use 0 in place of NA values)
# A tibble: 6 × 2 field value <chr> <chr> 1 name Jiena McLellan 2 company Toyota 3 name John Smith 4 company google 5 email john@google.com 6 name Huxley Ratcliffe
This is challenging because there’s no variable that identifies which observations belong together. We can fix this by noting that every contact starts with a name, so we can create a unique id by counting every time we see name as the field.
# A tibble: 6 × 3 field value person_id <chr> <chr> <int> 1 name Jiena McLellan 1 2 company Toyota 1 3 name John Smith 2 4 company google 2 5 email john@google.com 2 6 name Huxley Ratcliffe 3
Now that we have a unique identifier for each person, we can pivot field and value into the columns.
# A tibble: 3 × 4 person_id name company email <int> <chr> <chr> <chr> 1 1 Jiena McLellan Toyota <NA> 2 2 John Smith google john@google.com 3 3 Huxley Ratcliffe <NA> <NA>
Now you have been familiar with the basic use of pivot_wider(). The following two tutorials on pivot_wider() will discuss more advanced features that allow you to efficiently pivot datasets with increasingly complex structure.