Welcome to the tidyr Tutorial!
Tidy data describes a standard way of storing data that is used wherever possible throughout the tidyverse. A tidy structure makes data easier to manipulate, visualize, and model, which supports efficient and reproducible data analysis pipelines.
A dataset is considered tidy if it meets the following three criteria:
- Each variable is a column; each column is a variable.
- Each observation is a row; each row is an observation.
- Each value is a cell; each cell is a single value.
The goal of tidyr is to help you create tidy data frames. The easiest way to get tidyr is to install the whole tidyverse:
install.packages("tidyverse")
Alternatively, install just the tidyr package:
install.packages("tidyr")
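To make the three criteria concrete, here is a minimal sketch using the small example tables `table1` and `table4a` that ship with tidyr (they record tuberculosis case counts for three countries in 1999 and 2000). It assumes tidyr is installed; the name `rate_per_10k` is hypothetical and chosen only for illustration.

```r
library(tidyr)

# table1 is tidy: each variable (country, year, cases, population) is a column,
# and each country-year observation is a row
print(table1)

# table4a holds the same case counts, but the values of the "year" variable
# (1999, 2000) are used as column names, which breaks the first criterion
print(table4a)

# Because table1 is tidy, a derived variable is a simple vectorized calculation
# (rate_per_10k is a hypothetical name used only in this example)
rate_per_10k <- table1$cases / table1$population * 10000
print(rate_per_10k)
```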
Contents of the tidyr tutorial
This tutorial systematically covers the following topics, with detailed instructions and worked examples:
Pivot a dataset into a longer or wider format
pivot_longer() reshapes a dataset into a “longer” format. It is one of the most important tidyr functions and a versatile tool for turning many kinds of messy data into a tidy structure. Because the function is so important and so flexible, and because real-world datasets can be very complex, we’ll use 4 short tutorials to explain it, from the basics to advanced techniques.
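As a first taste, the sketch below applies pivot_longer() to `table4a`, the example table bundled with tidyr whose column names `1999` and `2000` are really values of a year variable. The result name `tidy_cases` is hypothetical.

```r
library(tidyr)

# Move the column names `1999` and `2000` into a new "year" column
# and their cell values into a new "cases" column
tidy_cases <- pivot_longer(
  table4a,
  cols = c(`1999`, `2000`),  # columns whose names are really values
  names_to = "year",         # new column that receives the old column names
  values_to = "cases"        # new column that receives the cell values
)
print(tidy_cases)
```

The new `year` column is character, because it comes from column names; the `names_transform` argument of pivot_longer() can convert it if a numeric year is needed.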
pivot_wider() reshapes a dataset into a “wider” format, which is often more convenient for presenting summaries and for analysis with many other tools. We’ll discuss it in depth over 3 tutorials.
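A minimal sketch of the opposite direction: tidyr's bundled `table2` stores two variables (cases and population) in a single `count` column, with a `type` column naming which variable each row holds. pivot_wider() spreads them back into separate columns; the result name `wide` is hypothetical.

```r
library(tidyr)

wide <- pivot_wider(
  table2,
  names_from = type,   # values of "type" become new column names
  values_from = count  # values of "count" fill the new columns
)
print(wide)
```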
Column manipulation:
- unite(): Unite multiple columns into one column
- separate(): Split a single column into multiple columns
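The two column functions can be sketched with tidyr's bundled tables: `table5` splits the year into `century` and `year` (for example "19" and "99"), and `table3` packs cases and population into one `rate` string such as "745/19987071". The names `year_full`, `united`, and `separated` are hypothetical.

```r
library(tidyr)

# unite(): glue century and year back together; sep = "" avoids the default "_"
united <- unite(table5, col = "year_full", century, year, sep = "")
print(united)

# separate(): split "rate" at "/" into two columns;
# convert = TRUE turns the resulting strings into numbers
separated <- separate(table3, rate, into = c("cases", "population"),
                      sep = "/", convert = TRUE)
print(separated)
```

Note that since tidyr 1.3.0, separate() is superseded by the separate_wider_*() family (for example separate_wider_delim()); it still works, but new code may prefer the newer functions.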
Handle missing values (NA):
- drop_na(): Remove rows containing missing values
- fill(): Fill in missing values with the previous or following values
- replace_na(): Replace missing values with specified values
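A minimal sketch of the three missing-value helpers on a small hypothetical data frame `df` with gaps in both the `group` and `score` columns (all object names here are illustrative assumptions).

```r
library(tidyr)
library(tibble)

# Hypothetical data with missing values in two columns
df <- tibble(
  id = 1:4,
  group = c("a", NA, NA, "b"),
  score = c(10, NA, 30, NA)
)

complete_rows <- drop_na(df)               # keep only rows with no NA in any column
scored_rows <- drop_na(df, score)          # check only the score column
filled <- fill(df, group)                  # default direction "down": carry the last non-NA value forward
zeroed <- replace_na(df, list(score = 0))  # replace NA in score with 0

print(complete_rows)
print(scored_rows)
print(filled)
print(zeroed)
```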
Create all possible combinations of selected variables. We’ll spend three sections on this topic:
- expand(): Create all possible combinations of the levels of selected columns in an input dataset. We’ll also discuss nesting(), used inside expand() to generate only the combinations already present in the dataset, and how to combine expand() with the *_join() functions in the dplyr package to build useful applications.
- expand_grid(): Create combinations from vectors.
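The combination tools can be sketched on a small hypothetical `fruits` table in which not every fruit type appears in every size. The join step uses dplyr's anti_join(), one of the *_join() functions mentioned above; all object names are illustrative assumptions.

```r
suppressPackageStartupMessages({
  library(tidyr)
  library(dplyr)
})

# Hypothetical data: "orange" never appears in size "S"
fruits <- tibble(
  type = c("apple", "orange", "apple"),
  size = c("S", "M", "M"),
  n = c(1, 2, 3)
)

# Every combination of the unique values of type and size (2 x 2 = 4 rows)
all_combos <- expand(fruits, type, size)

# nesting() keeps only the combinations that occur in the data (3 rows)
present <- expand(fruits, nesting(type, size))

# anti_join() returns the combinations that never occur in the data
missing_combos <- anti_join(all_combos, fruits, by = c("type", "size"))

# expand_grid() builds combinations directly from vectors (3 x 2 = 6 rows)
grid <- expand_grid(x = 1:3, y = c("a", "b"))

print(missing_combos)
print(grid)
```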