Unite Multiple Columns into a Single Column

Opposite to separate(), unite() combines multiple columns into one column by pasting strings together.

library(tidyr)library(dplyr)
data <- tibble( first_name = c("John", "Jane", "Alice"), last_name = c("Doe", "Smith", "Johnson"), DOB = c( NA, "3/2/1993", "8/4/1997"), State = c("MA", "NJ", NA), employer = c("MIT", NA, NA))
data

Output:

# A tibble: 3 × 5
first_name last_name DOB State employer
<chr> <chr> <chr> <chr> <chr>
1 John Doe <NA> MA MIT
2 Jane Smith 3/2/1993 NJ <NA>
3 Alice Johnson 8/4/1997 <NA> <NA>

The unite() function has four basic arguments, and can be conveniently summarized as unite(data, col, ..., sep = "_"):

  • data specifies the input data frame, and is the first argument. It can be conveniently passed into this function using the pipe operator %>%.
  • col specifies the name of the new column to be created.
  • ... specifies the columns to unite. Selection helpers can be used for convenient pick of a range of columns.
  • sep specifies the separator used between united values. If not specified, the underscore _ is used as the separator by default.
# Create a new column combining the full and last name, connected by a hyphendata %>% unite(col = full_name, first_name, last_name, sep = "-")

Output:

# A tibble: 3 × 4
full_name DOB State employer
<chr> <chr> <chr> <chr>
1 John-Doe <NA> MA MIT
2 Jane-Smith 3/2/1993 NJ <NA>
3 Alice-Johnson 8/4/1997 <NA> <NA>

Use remove = FALSE to keep the original columns from the output.

data %>%   unite(    col = full_name, first_name, last_name, sep = "-",    remove = F) # do not remove original columns "first_name" and "last_name"

Output:

# A tibble: 3 × 6
full_name first_name last_name DOB State employer
<chr> <chr> <chr> <chr> <chr> <chr>
1 John-Doe John Doe <NA> MA MIT
2 Jane-Smith Jane Smith 3/2/1993 NJ <NA>
3 Alice-Johnson Alice Johnson 8/4/1997 <NA> <NA>

Combine all the columns using everything(). Note that missing values are concatenated as “NA” strings.

data %>% unite(col = people, everything(), remove = F)

Output:

# A tibble: 3 × 6
people first_name last_name DOB State employer
<chr> <chr> <chr> <chr> <chr> <chr>
1 John_Doe_NA_MA_MIT John Doe <NA> MA MIT
2 Jane_Smith_3/2/1993_NJ_NA Jane Smith 3/2/1993 NJ <NA>
3 Alice_Johnson_8/4/1997_NA_NA Alice Johnson 8/4/1997 <NA> <NA>

Use na.rm = TRUE to remove missing values prior to uniting the columns.

data %>% unite(col = people, everything(), remove = F, na.rm = T)

Output:

# A tibble: 3 × 6
people first_name last_name DOB State employer
<chr> <chr> <chr> <chr> <chr> <chr>
1 John_Doe_MA_MIT John Doe <NA> MA MIT
2 Jane_Smith_3/2/1993_NJ Jane Smith 3/2/1993 NJ <NA>
3 Alice_Johnson_8/4/1997 Alice Johnson 8/4/1997 <NA> <NA>