Mutate Columns of a Dataset

Use mutate() to create new columns as a function of existing columns, or modify (overwrite) existing columns. The functions inside mutate() operate on selected columns, and independently for each row of the dataset.

Basics of column mutation

library(dplyr)starwars2 <- starwars[, 1:5]

Create new columns.

starwars2 %>%  mutate(h = height/100, # divide height by 100         h.m = height + mass) # sum up height and mass

Output:

# A tibble: 87 × 7
name height mass hair_color skin_color h h.m
<chr> <int> <dbl> <chr> <chr> <dbl> <dbl>
1 Luke Skywalker 172 77 blond fair 1.72 249
2 C-3PO 167 75 <NA> gold 1.67 242
3 R2-D2 96 32 <NA> white, blue 0.96 128
4 Darth Vader 202 136 none white 2.02 338
# ℹ 83 more rows

Modify (overwrite) existing columns.

starwars2 %>%   # capitalize values of the "skin_color" variable  mutate(skin_color = toupper(skin_color))

Output:

# A tibble: 87 × 5
name height mass hair_color skin_color
<chr> <int> <dbl> <chr> <chr>
1 Luke Skywalker 172 77 blond FAIR
2 C-3PO 167 75 <NA> GOLD
3 R2-D2 96 32 <NA> WHITE, BLUE
4 Darth Vader 202 136 none WHITE
# ℹ 83 more rows

Remove a column by setting it to NULL. Alternatively, use select() to remove columns.

starwars2 %>% mutate(hair_color = NULL)

Output:

# A tibble: 87 × 4
name height mass skin_color
<chr> <int> <dbl> <chr>
1 Luke Skywalker 172 77 fair
2 C-3PO 167 75 gold
3 R2-D2 96 32 white, blue
4 Darth Vader 202 136 white
# ℹ 83 more rows

Specify the position of newly created columns

Newly added columns by default are added to the most right side of the data frame. Alternatively, you can use arguments .before and .after to change the position of newly generated columns.

starwars2 %>%   mutate(h2 = height^2, m2 = mass^2,          # add new columns starting from the 4th column         .before = 4)

Output:

# A tibble: 87 × 7
name height mass h2 m2 hair_color skin_color
<chr> <int> <dbl> <dbl> <dbl> <chr> <chr>
1 Luke Skywalker 172 77 29584 5929 blond fair
2 C-3PO 167 75 27889 5625 <NA> gold
3 R2-D2 96 32 9216 1024 <NA> white, blue
4 Darth Vader 202 136 40804 18496 none white
# ℹ 83 more rows
starwars2 %>%   mutate(h2 = height^2, m2 = mass^2,          # add new columns after the "mass" column         .after = mass)

Output:

# A tibble: 87 × 7
name height mass h2 m2 hair_color skin_color
<chr> <int> <dbl> <dbl> <dbl> <chr> <chr>
1 Luke Skywalker 172 77 29584 5929 blond fair
2 C-3PO 167 75 27889 5625 <NA> gold
3 R2-D2 96 32 9216 1024 <NA> white, blue
4 Darth Vader 202 136 40804 18496 none white
# ℹ 83 more rows

Retain selected columns in the output dataset

By default, all original and newly generated columns are kept in the output. Alternatively, use .keep argument (defaults to "all") to selectively reserve desired columns in the output. Newly generated variables and the grouping variables are always kept in the output.

  • "used": retain only columns used to create new columns.
starwars2 %>% mutate(  h2 = height^2,   # only "height" and "h2" in the output   .keep = "used")

Output:

# A tibble: 87 × 2
height h2
<int> <dbl>
1 172 29584
2 167 27889
3 96 9216
4 202 40804
# ℹ 83 more rows
  • "unused": retain only columns not used in creating new columns.
starwars2 %>% mutate(  h2 = height^2,   # "height" is removed from the output  .keep = "unused")

Output:

# A tibble: 87 × 5
name mass hair_color skin_color h2
<chr> <dbl> <chr> <chr> <dbl>
1 Luke Skywalker 77 blond fair 29584
2 C-3PO 75 <NA> gold 27889
3 R2-D2 32 <NA> white, blue 9216
4 Darth Vader 136 none white 40804
# ℹ 83 more rows
  • "none": remove all columns (except the newly generated columns and the grouping variables).
starwars %>%   # calculate the relative height of each being WITHIN its own species  group_by(species) %>%   mutate(    h = height / max(height, na.rm = T),     # only "species" and "h" are kept in the output    .keep = "none")

Output:

# A tibble: 87 × 2
# Groups: species [38]
species h
<chr> <dbl>
1 Human 0.851
2 Droid 0.835
3 Droid 0.48
4 Human 1
# ℹ 83 more rows