Split Strings into Multiple Pieces

str_split() splits each element of a character vector into pieces wherever the specified pattern (the separator) matches, and returns a list with one character vector per input element.

library(stringr)
fruits <- c("mango, peach, kiwi, and lemon", "apple, banana, and orange")
# split strings at the site of commas
str_split(fruits, pattern = ",")

Output:

[[1]]
[1] "mango" " peach" " kiwi" " and lemon"
[[2]]
[1] "apple" " banana" " and orange"
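
Note that splitting on a bare comma leaves the space after each comma attached to the following piece. As a minimal sketch, and assuming the default behaviour where pattern is read as a regular expression, the separator itself can absorb that whitespace:

```r
library(stringr)

fruits <- c("mango, peach, kiwi, and lemon", "apple, banana, and orange")

# match a comma plus any whitespace that follows it,
# so the pieces come back without leading spaces
pieces <- str_split(fruits, pattern = ",\\s*")
pieces[[1]]
# [1] "mango"     "peach"     "kiwi"      "and lemon"
```

Another option is to keep pattern = "," and clean each piece afterwards with str_trim().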

The argument n sets the maximum number of pieces each string element is split into. Once that limit is reached, the remainder of the string is kept as the last piece. In the example below, each string is split into at most 3 pieces.

str_split(fruits, pattern = ",", n = 3) 

Output:

[[1]]
[1] "mango" " peach" " kiwi, and lemon"
[[2]]
[1] "apple" " banana" " and orange"

Use simplify = TRUE to return the split strings as a character matrix, with one row per input string.

str_split(fruits, pattern = ",", n = 3, simplify = TRUE)

Output:

     [,1]    [,2]      [,3]
[1,] "mango" " peach"  " kiwi, and lemon"
[2,] "apple" " banana" " and orange"

When the result is returned as a matrix and the specified n (e.g., n = 5) exceeds the number of pieces a string can be split into, the remaining cells in that row are filled with empty strings, so the matrix always has n columns.

str_split(fruits, pattern = ",", n = 5, simplify = TRUE)

Output:

     [,1]    [,2]      [,3]          [,4]         [,5]
[1,] "mango" " peach"  " kiwi"       " and lemon" ""
[2,] "apple" " banana" " and orange" ""           ""
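
The padding applies only to the matrix form. As a small sketch of the contrast, reusing the fruits vector from above, the list form treats n purely as an upper bound and returns only the pieces that actually exist:

```r
library(stringr)

fruits <- c("mango, peach, kiwi, and lemon", "apple, banana, and orange")

# list output: n is only an upper bound, nothing is padded
as_list <- str_split(fruits, pattern = ",", n = 5)
lengths(as_list)
# [1] 4 3

# matrix output: shorter rows are padded with "" up to n columns
as_matrix <- str_split(fruits, pattern = ",", n = 5, simplify = TRUE)
dim(as_matrix)
# [1] 2 5
```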

🏆 One level up!

str_split() has three variant functions, which differ primarily in their input and output types:

  • str_split_1() takes a single string (a character vector of length 1) as input, and returns the split strings as a character vector.
str_split_1(fruits[1], pattern = ",")

Output:

[1] "mango" " peach" " kiwi" " and lemon"
  • str_split_fixed() returns a character matrix with a fixed number of columns, n. It behaves like str_split(..., n = n, simplify = TRUE).
x <- c("a piece of sizzling bacon", "two bread loafs")
str_split_fixed(x, pattern = " ", n = 5)

Output:

     [,1]  [,2]    [,3]    [,4]       [,5]
[1,] "a"   "piece" "of"    "sizzling" "bacon"
[2,] "two" "bread" "loafs" ""         ""
  • str_split_i() extracts the ith piece from each input element, and returns a character vector. A negative i counts from the end.
# extract the 2nd word from each string element
str_split_i(x, pattern = " ", i = 2)

Output:

[1] "piece" "bread"
# extract the last word from each string element
str_split_i(x, pattern = " ", i = -1)

Output:

[1] "bacon" "loafs"
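
To make the relationship between the variants and str_split() concrete, here is a short sketch that rebuilds each variant's result from plain str_split() for the example vector x. It illustrates the behaviour for these inputs and makes no claim about how stringr implements the variants internally; the names fixed, simple, by_hand and same_first are only illustrative.

```r
library(stringr)

x <- c("a piece of sizzling bacon", "two bread loafs")

# str_split_fixed() compared with str_split(..., simplify = TRUE)
fixed  <- str_split_fixed(x, pattern = " ", n = 5)
simple <- str_split(x, pattern = " ", n = 5, simplify = TRUE)
all(fixed == simple)

# str_split_i() compared with picking the 2nd piece out of each list element
by_hand <- sapply(str_split(x, pattern = " "), function(p) p[2])
by_hand
str_split_i(x, pattern = " ", i = 2)

# str_split_1() on a single string compared with the first list element of str_split()
same_first <- identical(
  str_split_1(x[1], pattern = " "),
  str_split(x[1], pattern = " ")[[1]]
)
same_first
```

Each comparison should agree for these inputs. The variants mainly save the extra indexing or simplify step when the desired output shape is known in advance.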