Select Strings Containing a Matched Pattern

str_subset() is very similar to str_detect(), but instead of returning a logical vector of TRUE and FALSE, it returns the string elements that contain the matched pattern.

library(stringr)
fruit <- c("apple", "banana", "pear", "kiwi")
# Return fruit names containing the letter "e"
str_subset(fruit, pattern = "e")

Output:

[1] "apple" "pear"

Regular expressions are often used to define a pattern. For instance, the caret ^ anchors a pattern to the start of a string, and the dollar sign $ anchors it to the end of a string.

# Return elements that start with the letter "a"
str_subset(fruit, pattern = "^a")

Output:

[1] "apple"

# Return elements that end with the letter "a"
str_subset(fruit, pattern = "a$")

Output:

[1] "banana"
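
As a small sketch, the two anchors can also be combined in one pattern. The vector `items` below is made up for illustration and is not part of the original example; `.*` stands for "any characters in between".

```r
library(stringr)

# Hypothetical vector, used only for this illustration
items <- c("alpha", "area", "banana", "a", "apple")

# Keep elements that both start and end with the letter "a"
both_ends <- str_subset(items, pattern = "^a.*a$")
both_ends
```

This should return "alpha" and "area". The single-letter string "a" is not selected, because the pattern asks for an "a" at the start and another "a" at the end.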

We can select elements that don’t match the specified pattern with negate = TRUE (which defaults to FALSE).

# Return elements that do not end with the letter "a"
str_subset(fruit, pattern = "a$", negate = TRUE)

Output:

[1] "apple" "pear" "kiwi"
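
For a vector without missing values, negate = TRUE should give the same result as inverting the logical vector from str_detect() by hand. The names `with_negate` and `by_hand` in this short check are illustrative only.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "kiwi")

# Let str_subset() do the negation
with_negate <- str_subset(fruit, pattern = "a$", negate = TRUE)

# Invert the logical vector from str_detect() manually
by_hand <- fruit[!str_detect(fruit, pattern = "a$")]

# Compare the two results
identical(with_negate, by_hand)
```

Both approaches keep "apple", "pear" and "kiwi", so the comparison returns TRUE.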

A missing value (NA) does not match any pattern, so it is never returned.

str_subset(c(NA, "apple", "bee"), "e")

Output:

[1] "apple" "bee"

🎁 Bonus knowledge!

str_subset() is basically a wrapper (a simplified function) around x[str_detect(x, pattern)]. However, it handles NA values slightly differently: plain logical indexing keeps an NA in the result, whereas str_subset() drops it. See the example below.

X <- c(NA, "abc", "xyz")
# NA is included in the output
X[str_detect(X, "a")]

Output:

[1] NA "abc"

# NA is removed from the output
str_subset(X, "a")

Output:

[1] "abc"
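
One way to reproduce the NA-dropping behaviour with str_detect() is to wrap the logical vector in which(), which keeps only the positions that are TRUE. This is a sketch of the idea, not a description of how stringr implements str_subset() internally; the name `via_which` is illustrative.

```r
library(stringr)

X <- c(NA, "abc", "xyz")

# str_detect() returns NA for the missing element
detected <- str_detect(X, "a")
detected

# which() returns only the positions that are TRUE, so the NA is dropped
via_which <- X[which(detected)]
via_which

# Same result as str_subset()
identical(via_which, str_subset(X, "a"))
```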

str_subset() is equivalent to the base R function grep(pattern, x, value = TRUE). A major difference between these two functions is the default order of arguments: the string vector is the first argument in all functions of the stringr package, but is the second argument in grep().

grep(pattern = "a$", x = fruit, value = TRUE)

Output:

[1] "banana"
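
The difference in argument order matters most when arguments are passed by position rather than by name. The comparison below is a minimal sketch using the same fruit vector; the names `from_stringr` and `from_base` are illustrative.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "kiwi")

# stringr: string vector first, pattern second
from_stringr <- str_subset(fruit, "a$")

# base R: pattern first, string vector second
from_base <- grep("a$", fruit, value = TRUE)

# Both calls select the same element here
identical(from_stringr, from_base)
```

Having the string vector first is also what lets stringr functions slot into a pipe, for example fruit |> str_subset("a$") in R 4.1 or later.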