library(dplyr)<- starwars[, 1:5] starwars2
Mutate Columns of a Dataset
Use mutate()
to create new columns as a function of existing columns, or modify (overwrite) existing columns. The functions inside mutate()
operate on selected columns, and independently for each row of the dataset.
Basics of column mutation
Create new columns.
%>% starwars2 mutate(h = height/100, # divide height by 100 h.m = height + mass) # sum up height and mass
Output:
# A tibble: 87 × 7
name height mass hair_color skin_color h h.m
<chr> <int> <dbl> <chr> <chr> <dbl> <dbl>
1 Luke Skywalker 172 77 blond fair 1.72 249
2 C-3PO 167 75 <NA> gold 1.67 242
3 R2-D2 96 32 <NA> white, blue 0.96 128
4 Darth Vader 202 136 none white 2.02 338
# ℹ 83 more rows
Modify (overwrite) existing columns.
%>% starwars2 # capitalize values of the "skin_color" variable mutate(skin_color = toupper(skin_color))
Output:
# A tibble: 87 × 5
name height mass hair_color skin_color
<chr> <int> <dbl> <chr> <chr>
1 Luke Skywalker 172 77 blond FAIR
2 C-3PO 167 75 <NA> GOLD
3 R2-D2 96 32 <NA> WHITE, BLUE
4 Darth Vader 202 136 none WHITE
# ℹ 83 more rows
Remove a column by setting it to NULL
. Alternatively, use select()
to remove columns.
%>% mutate(hair_color = NULL) starwars2
Output:
# A tibble: 87 × 4
name height mass skin_color
<chr> <int> <dbl> <chr>
1 Luke Skywalker 172 77 fair
2 C-3PO 167 75 gold
3 R2-D2 96 32 white, blue
4 Darth Vader 202 136 white
# ℹ 83 more rows
Specify the position of newly created columns
Newly added columns by default are added to the most right side of the data frame. Alternatively, you can use arguments .before
and .after
to change the position of newly generated columns.
%>% starwars2 mutate(h2 = height^2, m2 = mass^2, # add new columns starting from the 4th column .before = 4)
Output:
# A tibble: 87 × 7
name height mass h2 m2 hair_color skin_color
<chr> <int> <dbl> <dbl> <dbl> <chr> <chr>
1 Luke Skywalker 172 77 29584 5929 blond fair
2 C-3PO 167 75 27889 5625 <NA> gold
3 R2-D2 96 32 9216 1024 <NA> white, blue
4 Darth Vader 202 136 40804 18496 none white
# ℹ 83 more rows
%>% starwars2 mutate(h2 = height^2, m2 = mass^2, # add new columns after the "mass" column .after = mass)
Output:
# A tibble: 87 × 7
name height mass h2 m2 hair_color skin_color
<chr> <int> <dbl> <dbl> <dbl> <chr> <chr>
1 Luke Skywalker 172 77 29584 5929 blond fair
2 C-3PO 167 75 27889 5625 <NA> gold
3 R2-D2 96 32 9216 1024 <NA> white, blue
4 Darth Vader 202 136 40804 18496 none white
# ℹ 83 more rows
Retain selected columns in the output dataset
By default, all original and newly generated columns are kept in the output. Alternatively, use .keep
argument (defaults to "all"
) to selectively reserve desired columns in the output. Newly generated variables and the grouping variables are always kept in the output.
"used"
: retain only columns used to create new columns.
%>% mutate( starwars2 h2 = height^2, # only "height" and "h2" in the output .keep = "used")
Output:
# A tibble: 87 × 2
height h2
<int> <dbl>
1 172 29584
2 167 27889
3 96 9216
4 202 40804
# ℹ 83 more rows
"unused"
: retain only columns not used in creating new columns.
%>% mutate( starwars2 h2 = height^2, # "height" is removed from the output .keep = "unused")
Output:
# A tibble: 87 × 5
name mass hair_color skin_color h2
<chr> <dbl> <chr> <chr> <dbl>
1 Luke Skywalker 77 blond fair 29584
2 C-3PO 75 <NA> gold 27889
3 R2-D2 32 <NA> white, blue 9216
4 Darth Vader 136 none white 40804
# ℹ 83 more rows
"none"
: remove all columns (except the newly generated columns and the grouping variables).
%>% starwars # calculate the relative height of each being WITHIN its own species group_by(species) %>% mutate( h = height / max(height, na.rm = T), # only "species" and "h" are kept in the output .keep = "none")
Output:
# A tibble: 87 × 2
# Groups: species [38]
species h
<chr> <dbl>
1 Human 0.851
2 Droid 0.835
3 Droid 0.48
4 Human 1
# ℹ 83 more rows