str_subset() is very similar to str_detect(), but instead of returning a logical vector of TRUE and FALSE values, it returns the elements of the string vector that contain a match to the pattern.
library(stringr)
fruit <- c("apple", "banana", "pear", "kiwi")
# Return fruit names containing the letter "e"
str_subset(fruit, pattern = "e")
Output:
[1] "apple" "pear"
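The sketch below runs both functions on the same fruit vector so the contrast with str_detect() is easy to see. It assumes the stringr package is installed. The first call returns one TRUE or FALSE per element, and the second returns only the elements that matched.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "kiwi")

# str_detect(): one logical value per element
# returns TRUE FALSE TRUE FALSE
str_detect(fruit, pattern = "e")

# str_subset(): only the elements that matched
# returns "apple" "pear"
str_subset(fruit, pattern = "e")
```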
Regular expressions are often used to define a pattern. For instance, the caret ^ anchors a pattern to the start of a string, and the dollar sign $ anchors it to the end of a string.
# Return elements that start with the letter "a"
str_subset(fruit, pattern = "^a")
Output:
[1] "apple"
# Return elements that end with the letter "a"
str_subset(fruit, pattern = "a$")
Output:
[1] "banana"
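A pattern with no anchor can match anywhere in the string. You can also combine both anchors to constrain the start and the end at the same time. The sketch below uses the same fruit vector. In the regular expression "^p.*r$", the dot matches any single character and * repeats it zero or more times.

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "kiwi")

# Unanchored: "a" anywhere in the string
# returns "apple" "banana" "pear"
str_subset(fruit, pattern = "a")

# Anchored at both ends: starts with "p" and ends with "r"
# returns "pear"
str_subset(fruit, pattern = "^p.*r$")
```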
We can select the elements that don't match the pattern by setting negate = TRUE (the default is FALSE).
# Return elements that do not end with the letter "a"
str_subset(fruit, pattern = "a$", negate = TRUE)
Output:
[1] "apple" "pear" "kiwi"
A missing value (NA) never counts as a match, so str_subset() drops it from the result.
str_subset(c(NA, "apple", "bee"), "e")
Output:
[1] "apple" "bee"
🎁 Bonus knowledge!
str_subset() is essentially a wrapper around x[str_detect(x, pattern)]. However, it handles NA values slightly differently, as the example below shows.
X <- c(NA, "abc", "xyz")
# NA is included in the output
X[str_detect(X, "a")]
Output:
[1] NA "abc"
# NA is removed from the output
str_subset(X, "a")
Output:
[1] "abc"
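To reproduce str_subset()'s result from str_detect(), one option is to index with which(). It returns only the positions that are TRUE, so it skips the NA. This sketch shows that the two give the same result for this example. It does not describe how stringr implements str_subset() internally.

```r
library(stringr)

X <- c(NA, "abc", "xyz")

# str_detect() yields NA for the missing element
hits <- str_detect(X, "a")

# Logical indexing turns that NA into an NA element in the result
X[hits]

# which() keeps only the TRUE positions, so the NA is dropped
X[which(hits)]

# Same result as str_subset(): returns TRUE
identical(X[which(hits)], str_subset(X, "a"))
```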
str_subset() is equivalent to the base R function grep(pattern, x, value = TRUE). A major difference between the two is the order of arguments: in every stringr function the string vector is the first argument, whereas in grep() it is the second, after pattern.
grep(pattern = "a$", x = fruit, value = TRUE)
Output:
[1] "banana"
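The base R parallels go a little further. grep() has an invert argument that plays the role of negate, and grepl() returns a logical vector much like str_detect(). Here is a minimal side-by-side sketch using the same fruit vector:

```r
library(stringr)

fruit <- c("apple", "banana", "pear", "kiwi")

# Elements that do NOT end with "a": invert = TRUE mirrors negate = TRUE
# both return "apple" "pear" "kiwi"
grep(pattern = "a$", x = fruit, value = TRUE, invert = TRUE)
str_subset(fruit, pattern = "a$", negate = TRUE)

# One logical per element: grepl() mirrors str_detect()
# both return FALSE TRUE FALSE FALSE
grepl(pattern = "a$", x = fruit)
str_detect(fruit, pattern = "a$")
```

Because the string vector comes first, stringr calls also chain naturally in a pipe. For example, you can write fruit |> str_subset("a$") with R's native pipe (R 4.1 or later).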