Welcome to the tidyr Tutorial!

Tidy data describes a standard way of storing data that is used wherever possible throughout the tidyverse. A tidy structure makes data easier to manipulate, visualize, and model, which supports efficient and reproducible analysis pipelines.

A dataset is considered tidy if it meets the following three criteria:

  • Each variable is a column; each column is a variable.
  • Each observation is a row; each row is an observation.
  • Each value is a cell; each cell is a single value.
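
As a small illustration of the three criteria, the sketch below stores the same made-up counts twice using base R only. The column names (country, y1999, y2000, year, cases) and all values are hypothetical and chosen purely for demonstration.

```r
# Untidy: the "year" variable is hidden in the column names,
# so each row holds two observations.
untidy <- data.frame(
  country = c("A", "B"),
  y1999 = c(10, 20),
  y2000 = c(15, 25)
)

# Tidy: one column per variable (country, year, cases),
# one row per observation, one value per cell.
tidy <- data.frame(
  country = c("A", "A", "B", "B"),
  year = c(1999, 2000, 1999, 2000),
  cases = c(10, 15, 20, 25)
)
```

Both data frames carry the same information; only the second one has a column for every variable.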

The goal of tidyr is to help you create tidy data frames. The easiest way to get tidyr is to install the whole tidyverse:

install.packages("tidyverse")

Alternatively, install just the tidyr package:

install.packages("tidyr")

Contents of the tidyr tutorial

This tutorial covers the following topics step by step, with detailed instructions and worked examples:

Pivot a dataset into a longer or wider format

pivot_longer() reshapes a dataset into a “longer” format by increasing the number of rows and decreasing the number of columns. It is one of the most important tidyr functions and is often the key step in turning messy data into a tidy structure. Because the function is so versatile, and real-world datasets can be quite complex, we’ll use four short tutorials to explain it, from the basics to advanced techniques.
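
A minimal sketch of the basic call, assuming a made-up data frame with one column per week (the names id, wk1, wk2, week, and score are hypothetical):

```r
library(tidyr)

# Hypothetical wide data: one column per week of measurements.
wide <- data.frame(
  id = c(1, 2),
  wk1 = c(5.1, 6.0),
  wk2 = c(5.4, 6.3)
)

long <- pivot_longer(
  wide,
  cols = c(wk1, wk2),   # columns to stack into rows
  names_to = "week",    # new column that receives the old column names
  values_to = "score"   # new column that receives the cell values
)
```

Here long has four rows (two ids times two weeks) and the columns id, week, and score.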

pivot_wider() reshapes a dataset into a “wider” format, which is often convenient for summary presentation and for analysis with other tools. We’ll devote three tutorials to an in-depth discussion.
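
A minimal sketch of the reverse reshaping, again with made-up data (the names id, week, and score are hypothetical):

```r
library(tidyr)

# Hypothetical long data: one row per id and week.
long <- data.frame(
  id = c(1, 1, 2, 2),
  week = c("wk1", "wk2", "wk1", "wk2"),
  score = c(5.1, 5.4, 6.0, 6.3)
)

wide <- pivot_wider(
  long,
  names_from = week,    # values of this column become new column names
  values_from = score   # values of this column fill the new cells
)
```

Here wide has one row per id and the columns id, wk1, and wk2.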

Column manipulation:

  • unite(): Unite multiple columns into one column
  • separate(): Split a single column into multiple ones
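
The two functions above are roughly inverse operations. A short sketch on a made-up data frame (the names year, month, and ym are hypothetical):

```r
library(tidyr)

dates <- data.frame(year = c(2020, 2021), month = c(1, 12))

# Paste year and month into a single column "ym", joined by "-".
united <- unite(dates, col = "ym", year, month, sep = "-")

# Split "ym" back into two columns at the "-" separator.
# By default the new columns are character, not numeric.
parts <- separate(united, col = ym, into = c("year", "month"), sep = "-")
```

Passing convert = TRUE to separate() is one way to have the split columns converted back to numbers.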

Handle missing values (NA):

  • drop_na(): Remove rows containing missing values
  • fill(): Fill in missing values with the previous or following value
  • replace_na(): Replace missing values with specified values
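
The three functions above can be compared side by side on one small, made-up data frame (the column names group and value are hypothetical):

```r
library(tidyr)

df <- data.frame(
  group = c("a", NA, NA, "b"),
  value = c(1, NA, 3, NA)
)

# Keep only rows with no NA in any column.
complete_rows <- drop_na(df)

# Keep rows where `value` is not NA, ignoring other columns.
has_value <- drop_na(df, value)

# Carry the last non-missing `group` downward (the default direction).
filled <- fill(df, group)

# Replace NA in `value` with 0; other columns are left unchanged.
replaced <- replace_na(df, list(value = 0))
```

In this sketch complete_rows keeps one row, has_value keeps two, filled has groups a, a, a, b, and replaced has values 1, 0, 3, 0.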

Create all possible combinations of selected variables. We’ll devote three sections to this topic:

  • expand(): Create all possible combinations of the values of selected columns in an input dataset. We’ll also discuss nesting(), used inside expand() to keep only the combinations already present in the dataset, and how to combine expand() with the *_join() functions in the dplyr package for practical applications.
  • expand_grid(): Create combinations from vectors.
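
A brief sketch of how these functions differ, using a made-up sales table (the names sales, store, item, and qty are hypothetical). The anti_join() step assumes the dplyr package is installed and is only one possible way to combine expand() with a join.

```r
library(tidyr)

sales <- data.frame(
  store = c("north", "north", "south"),
  item = c("apple", "pear", "apple"),
  qty = c(3, 5, 2)
)

# Every store paired with every item, whether or not it was observed.
all_pairs <- expand(sales, store, item)

# Only the store/item pairs that actually occur in the data.
seen_pairs <- expand(sales, nesting(store, item))

# Pairs that are possible but absent from `sales`.
missing_pairs <- dplyr::anti_join(all_pairs, sales, by = c("store", "item"))

# Combinations built from plain vectors, with no input dataset.
grid <- expand_grid(x = 1:2, y = c("a", "b", "c"))
```

In this sketch all_pairs has four rows, seen_pairs has three, missing_pairs contains the single pair south/pear, and grid has six rows.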