Welcome to the dplyr Package!

Data wrangling is one of the most important steps in data science. In R, a wide range of wrangling tasks can be accomplished with dplyr, one of the core tidyverse packages. In this tutorial, you’ll learn to use this package effectively.

First, you’ll learn the pipe operator %>%, which lets you chain function calls into a single readable sequence. Most of the code in this tutorial is written in this piped style.
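As a minimal sketch of what the pipe does, the snippet below writes the same three-step operation twice: once as nested calls and once piped. It uses R’s built-in mtcars dataset (an assumption for illustration, not part of this tutorial’s own data) and assumes dplyr is installed. The pipe passes the result on its left as the first argument of the function on its right, so the piped version reads top to bottom in the order the steps actually run.

```r
library(dplyr)

# Nested style: must be read inside-out
nested <- head(arrange(filter(mtcars, cyl == 4), desc(mpg)), 3)

# Piped style: each result becomes the first argument of the next call
piped <- mtcars %>%
  filter(cyl == 4) %>%     # keep 4-cylinder cars
  arrange(desc(mpg)) %>%   # sort by fuel economy, highest first
  head(3)                  # keep the top three rows

# Both forms produce the same data frame
identical(nested, piped)
```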

Then, you’ll study the six most commonly used functions. Combining these six functions lets you perform most everyday data wrangling tasks.

  • Select columns with select(). You’ll also learn many related techniques, including selection helper functions and the purrr style. These techniques work not only in select() but also in many other dplyr functions.
  • Filter rows with filter().
  • Modify or create new columns with mutate().
  • Create summarizing statistics with summarize().
  • Divide a dataset into groups with group_by().
  • Arrange rows with arrange().
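The sketch below strings all six core functions into one pipeline on the built-in mtcars dataset (an assumed example dataset; the column choices and the 100-horsepower threshold are arbitrary). It first shows two selection techniques: a helper, starts_with(), and a predicate inside where() written with the ~ .x formula shorthand, which is our assumption about what “purrr style” refers to here. The code assumes dplyr 1.0.0 or later, which provides where() and the .groups argument.

```r
library(dplyr)

# Selection helper: columns whose names start with "d"
helper_cols <- mtcars %>% select(starts_with("d")) %>% names()

# purrr-style lambda (~ .x) as a predicate: numeric columns with mean > 100
big_cols <- mtcars %>%
  select(where(~ is.numeric(.x) && mean(.x) > 100)) %>%
  names()

# The six core functions combined in one pipeline
result <- mtcars %>%
  select(mpg, cyl, hp, wt) %>%        # keep four columns
  filter(hp > 100) %>%                # keep cars with more than 100 hp
  mutate(hp_per_wt = hp / wt) %>%     # create a new column
  group_by(cyl) %>%                   # one group per cylinder count
  summarize(
    n = n(),                          # number of rows in each group
    mean_mpg = mean(mpg),
    mean_hp_per_wt = mean(hp_per_wt),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_mpg))             # groups with the best mpg first

print(helper_cols)
print(big_cols)
print(result)
```

Because summarize() collapses each group to one row, the final arrange() sorts the three cylinder groups rather than the original 32 cars.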

Next, you’ll learn additional functions, organized into the sections below. These functions help you handle more complex tasks efficiently and flexibly.

  • Functions that operate on rows. You’ll learn distinct() to keep unique (non-duplicated) rows, and the slice_*() family of functions to select rows by position or by the smallest or largest values of a variable (complementary to filter()).

  • Functions that operate on columns. You’ll learn glimpse() to quickly view a dataset’s structure and content, pull() to extract the values of a single column, rename() and rename_with() to rename columns, and relocate() to change column order.

  • Column-wise and row-wise repeated operations. You’ll use across() to apply the same operation to multiple columns, and rowwise() with c_across() to compute results one row at a time (for example, summing values across several columns within each row).

  • Functions for paired datasets. This section covers a wide variety of techniques to merge (or subtract) two datasets, including mutating joins (inner_join(), left_join(), right_join(), and full_join()), filtering joins (semi_join() and anti_join()), nest_join(), and cross_join(). In addition, you’ll learn how to bind two or more datasets by columns or rows with bind_cols() and bind_rows(), and how to perform row-based set operations.

  • Finally, you’ll learn how data masking works, an advanced topic that lets you use these dplyr tools effectively inside your own functions.
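To preview the later sections, here is a compact sketch that touches one or two functions from each group above. It uses the built-in mtcars dataset and assumes dplyr 1.0.0 or later. The labels lookup table and the mean_by() function are hypothetical helpers made up for illustration. mean_by() shows one common way to pass column names into your own function: embracing the arguments with {{ }} so dplyr applies data masking to them.

```r
library(dplyr)

cars <- mtcars %>% select(mpg, cyl, gear, hp)

# Rows: unique cyl/gear combinations, and the three most powerful cars
combos <- cars %>% distinct(cyl, gear)
top_hp <- cars %>% slice_max(hp, n = 3, with_ties = FALSE)

# Columns: extract one column as a vector; rename, then move it to the front
hp_vec  <- cars %>% pull(hp)
renamed <- cars %>% rename(horsepower = hp) %>% relocate(horsepower)

# Column-wise: apply mean() to several columns at once
means <- cars %>% summarize(across(c(mpg, hp), mean))

# Row-wise: sum two columns within each row
row_totals <- cars %>%
  rowwise() %>%
  mutate(total = sum(c_across(c(mpg, hp)))) %>%
  ungroup()

# Paired datasets: add a label column with a mutating join on cyl
labels   <- tibble(cyl = c(4, 6, 8), size = c("small", "medium", "large"))
labelled <- cars %>% left_join(labels, by = "cyl")

# Data masking in a self-defined function (hypothetical helper):
# {{ }} passes the caller's bare column names through to dplyr
mean_by <- function(data, group_var, value_var) {
  data %>%
    group_by({{ group_var }}) %>%
    summarize(mean_value = mean({{ value_var }}), .groups = "drop")
}
by_gear <- mean_by(cars, gear, mpg)
print(by_gear)
```

The with_ties = FALSE argument matters here: two cars in mtcars share the third-highest horsepower, so the default with_ties = TRUE would return four rows.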