Mutate Columns of a Dataset

Use mutate() to create new columns as a function of existing columns, or modify (overwrite) existing columns. The functions inside mutate() operate on selected columns, and independently for each row of the dataset.

Basics of column mutation

library(dplyr)starwars2 <- starwars[, 1:5]

Create new columns.

starwars2 %>%  mutate(h = height/100, # divide height by 100         h.m = height + mass) # sum up height and mass

Output:
  # A tibble: 87 × 7
    name           height  mass hair_color skin_color      h   h.m
    <chr>           <int> <dbl> <chr>      <chr>       <dbl> <dbl>
  1 Luke Skywalker    172    77 blond      fair         1.72   249
  2 C-3PO             167    75 <NA>       gold         1.67   242
  3 R2-D2              96    32 <NA>       white, blue  0.96   128
  4 Darth Vader       202   136 none       white        2.02   338
  # ℹ 83 more rows

Modify (overwrite) existing columns.

starwars2 %>%   # capitalize values of the "skin_color" variable  mutate(skin_color = toupper(skin_color))

Output:
  # A tibble: 87 × 5
    name           height  mass hair_color skin_color
    <chr>           <int> <dbl> <chr>      <chr>
  1 Luke Skywalker    172    77 blond      FAIR
  2 C-3PO             167    75 <NA>       GOLD
  3 R2-D2              96    32 <NA>       WHITE, BLUE
  4 Darth Vader       202   136 none       WHITE
  # ℹ 83 more rows

Remove a column by setting it to NULL. Alternatively, use select() to remove columns.

starwars2 %>% mutate(hair_color = NULL)

Output:
  # A tibble: 87 × 4
    name           height  mass skin_color
    <chr>           <int> <dbl> <chr>
  1 Luke Skywalker    172    77 fair
  2 C-3PO             167    75 gold
  3 R2-D2              96    32 white, blue
  4 Darth Vader       202   136 white
  # ℹ 83 more rows

Specify the position of newly created columns

Newly added columns by default are added to the most right side of the data frame. Alternatively, you can use arguments .before and .after to change the position of newly generated columns.

starwars2 %>%   mutate(h2 = height^2, m2 = mass^2,          # add new columns starting from the 4th column         .before = 4)

Output:
  # A tibble: 87 × 7
    name           height  mass    h2    m2 hair_color skin_color
    <chr>           <int> <dbl> <dbl> <dbl> <chr>      <chr>
  1 Luke Skywalker    172    77 29584  5929 blond      fair
  2 C-3PO             167    75 27889  5625 <NA>       gold
  3 R2-D2              96    32  9216  1024 <NA>       white, blue
  4 Darth Vader       202   136 40804 18496 none       white
  # ℹ 83 more rows

starwars2 %>%   mutate(h2 = height^2, m2 = mass^2,          # add new columns after the "mass" column         .after = mass)

Output:
  # A tibble: 87 × 7
    name           height  mass    h2    m2 hair_color skin_color
    <chr>           <int> <dbl> <dbl> <dbl> <chr>      <chr>
  1 Luke Skywalker    172    77 29584  5929 blond      fair
  2 C-3PO             167    75 27889  5625 <NA>       gold
  3 R2-D2              96    32  9216  1024 <NA>       white, blue
  4 Darth Vader       202   136 40804 18496 none       white
  # ℹ 83 more rows

Retain selected columns in the output dataset

By default, all original and newly generated columns are kept in the output. Alternatively, use .keep argument (defaults to "all") to selectively reserve desired columns in the output. Newly generated variables and the grouping variables are always kept in the output.

"used": retain only columns used to create new columns.

starwars2 %>% mutate(  h2 = height^2,   # only "height" and "h2" in the output   .keep = "used")

Output:
  # A tibble: 87 × 2
    height    h2
     <int> <dbl>
  1    172 29584
  2    167 27889
  3     96  9216
  4    202 40804
  # ℹ 83 more rows

"unused": retain only columns not used in creating new columns.

starwars2 %>% mutate(  h2 = height^2,   # "height" is removed from the output  .keep = "unused")

Output:
  # A tibble: 87 × 5
    name            mass hair_color skin_color     h2
    <chr>          <dbl> <chr>      <chr>       <dbl>
  1 Luke Skywalker    77 blond      fair        29584
  2 C-3PO             75 <NA>       gold        27889
  3 R2-D2             32 <NA>       white, blue  9216
  4 Darth Vader      136 none       white       40804
  # ℹ 83 more rows

"none": remove all columns (except the newly generated columns and the grouping variables).

starwars %>%   # calculate the relative height of each being WITHIN its own species  group_by(species) %>%   mutate(    h = height / max(height, na.rm = T),     # only "species" and "h" are kept in the output    .keep = "none")

Output:
  # A tibble: 87 × 2
  # Groups:   species [38]
    species     h
    <chr>   <dbl>
  1 Human   0.851
  2 Droid   0.835
  3 Droid   0.48
  4 Human   1
  # ℹ 83 more rows

Basics of column mutation

Specify the position of newly created columns

Retain selected columns in the output dataset

Amazing eBook to learn ggplot2 FAST & EASY