Opposite to separate(), unite() combines multiple columns into one column by pasting strings together.
library(tidyr)library(dplyr) data <-tibble(first_name =c("John", "Jane", "Alice"),last_name =c("Doe", "Smith", "Johnson"),DOB =c( NA, "3/2/1993", "8/4/1997"),State =c("MA", "NJ", NA),employer =c("MIT", NA, NA)) data
Output:
# A tibble: 3 × 5 first_name last_name DOB State employer <chr> <chr> <chr> <chr> <chr> 1 John Doe <NA> MA MIT 2 Jane Smith 3/2/1993 NJ <NA> 3 Alice Johnson 8/4/1997 <NA> <NA>
The unite() function has four basic arguments, and can be conveniently summarized as unite(data, col, ..., sep = "_"):
data specifies the input data frame, and is the first argument. It can be conveniently passed into this function using the pipe operator %>%.
col specifies the name of the new column to be created.
... specifies the columns to unite. Selection helpers can be used for convenient pick of a range of columns.
sep specifies the separator used between united values. If not specified, the underscore _ is used as the separator by default.
# Create a new column combining the full and last name, connected by a hyphendata %>%unite(col = full_name, first_name, last_name, sep ="-")
Output:
# A tibble: 3 × 4 full_name DOB State employer <chr> <chr> <chr> <chr> 1 John-Doe <NA> MA MIT 2 Jane-Smith 3/2/1993 NJ <NA> 3 Alice-Johnson 8/4/1997 <NA> <NA>
Use remove = FALSE to keep the original columns from the output.
data %>%unite(col = full_name, first_name, last_name, sep ="-",remove = F) # do not remove original columns "first_name" and "last_name"
Output:
# A tibble: 3 × 6 full_name first_name last_name DOB State employer <chr> <chr> <chr> <chr> <chr> <chr> 1 John-Doe John Doe <NA> MA MIT 2 Jane-Smith Jane Smith 3/2/1993 NJ <NA> 3 Alice-Johnson Alice Johnson 8/4/1997 <NA> <NA>
Combine all the columns using everything(). Note that missing values are concatenated as “NA” strings.
data %>%unite(col = people, everything(), remove = F)
Output:
# A tibble: 3 × 6 people first_name last_name DOB State employer <chr> <chr> <chr> <chr> <chr> <chr> 1 John_Doe_NA_MA_MIT John Doe <NA> MA MIT 2 Jane_Smith_3/2/1993_NJ_NA Jane Smith 3/2/1993 NJ <NA> 3 Alice_Johnson_8/4/1997_NA_NA Alice Johnson 8/4/1997 <NA> <NA>
Use na.rm = TRUE to remove missing values prior to uniting the columns.
# A tibble: 3 × 6 people first_name last_name DOB State employer <chr> <chr> <chr> <chr> <chr> <chr> 1 John_Doe_MA_MIT John Doe <NA> MA MIT 2 Jane_Smith_3/2/1993_NJ Jane Smith 3/2/1993 NJ <NA> 3 Alice_Johnson_8/4/1997 Alice Johnson 8/4/1997 <NA> <NA>