str_extract() extracts the first match from each string element.
str_extract_all() extracts all matches from each string element.

str_extract() extracts the characters that match a specified pattern from each string element.
In the following example, the regular expression [a-z]{1,6} (a character class followed by a quantifier) matches any sequence of 1 to 6 consecutive lowercase letters.
library(stringr)
shop_list <- c("apples *40 Walmart", "flour *12 Target", "sugar *3 Costco")
str_extract(shop_list, pattern = "[a-z]{1,6}")
Output:
[1] "apples" "flour" "sugar"
Note that str_extract() extracts only the first match in each string. The lowercase portions of “Walmart”, “Target”, and “Costco” (for example, “almart”) also match the pattern, but they are not returned in the output.
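Two consequences of this pattern are easy to miss: the {1,6} quantifier caps every match at six letters, and the first-match rule hides anything later in the string. The sketch below illustrates both. The input "strawberries" and the pattern [A-Z][a-z]+ are illustrative assumptions, not part of the original example.

```r
library(stringr)

shop_list <- c("apples *40 Walmart", "flour *12 Target", "sugar *3 Costco")

# {1,6} caps each match at six letters, so a longer word is truncated.
# "strawberries" is a made-up input used only for illustration.
str_extract("strawberries", pattern = "[a-z]{1,6}")
# [1] "strawb"

# To pull out the capitalized store names, describe them directly:
# one uppercase letter followed by one or more lowercase letters.
stores <- str_extract(shop_list, pattern = "[A-Z][a-z]+")
stores
# [1] "Walmart" "Target"  "Costco"
```

Describing the target more precisely is usually simpler than trying to skip over earlier matches.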
In addition to regular expressions, the rebus package offers a more intuitive and memorable syntax for defining patterns. For example, one_or_more(WRD) matches one or more consecutive word characters (letters, digits, or underscores; WRD for short).
# install.packages("rebus")
library(rebus)
str_extract(shop_list, pattern = one_or_more(WRD))
Output:
[1] "apples" "flour" "sugar"
Extract consecutive digits (DGT).
str_extract(shop_list, pattern = one_or_more(DGT))
Output:
[1] "40" "12" "3"
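For readers who prefer not to depend on rebus, the same extractions can be sketched with plain regular expressions. This assumes that WRD and DGT correspond to the regex shorthands \w and \d, which is how the rebus documentation describes them. The variable names first_word and first_number are illustrative.

```r
library(stringr)

shop_list <- c("apples *40 Walmart", "flour *12 Target", "sugar *3 Costco")

# Assumed plain-regex counterpart of one_or_more(WRD): one or more word characters
first_word <- str_extract(shop_list, pattern = "\\w+")
first_word
# [1] "apples" "flour"  "sugar"

# Assumed plain-regex counterpart of one_or_more(DGT): one or more digits
first_number <- str_extract(shop_list, pattern = "\\d+")
first_number
# [1] "40" "12" "3"

# str_extract() returns character strings; convert when numbers are needed
as.integer(first_number)
# [1] 40 12  3
```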
str_extract_all() extracts all matches from each string element and returns a list.
# shop_list is the vector defined above
str_extract_all(shop_list, pattern = one_or_more(WRD))
Output:
[[1]]
[1] "apples" "40" "Walmart"
[[2]]
[1] "flour" "12" "Target"
[[3]]
[1] "sugar" "3" "Costco"
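Because the result is a list with one character vector per input string, it is usually processed with base R list helpers. The following is a minimal sketch, using "\\w+" as an assumed plain-regex stand-in for one_or_more(WRD). The names all_matches and last_match are illustrative.

```r
library(stringr)

shop_list <- c("apples *40 Walmart", "flour *12 Target", "sugar *3 Costco")

# "\\w+" is used here as a plain-regex stand-in for one_or_more(WRD)
all_matches <- str_extract_all(shop_list, pattern = "\\w+")

# Number of matches found in each string
lengths(all_matches)
# [1] 3 3 3

# Last match of each element (here, the store name)
last_match <- vapply(all_matches, function(m) m[length(m)], character(1))
last_match
# [1] "Walmart" "Target"  "Costco"
```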
Use simplify = TRUE to return a character matrix instead of a list.
str_extract_all(shop_list, pattern = one_or_more(WRD), simplify = TRUE)
Output:
     [,1]     [,2] [,3]
[1,] "apples" "40" "Walmart"
[2,] "flour"  "12" "Target"
[3,] "sugar"  "3"  "Costco"
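The matrix form makes it easy to pick out the n-th match of every string by column. The sketch below also shows what happens when strings have different numbers of matches: shorter rows are padded with empty strings. The second input vector, containing "milk *2", is a hypothetical addition for illustration, and "\\w+" is an assumed plain-regex stand-in for one_or_more(WRD).

```r
library(stringr)

shop_list <- c("apples *40 Walmart", "flour *12 Target", "sugar *3 Costco")

# "\\w+" is a plain-regex stand-in for one_or_more(WRD)
m <- str_extract_all(shop_list, pattern = "\\w+", simplify = TRUE)

# Each column holds the n-th match of every string
m[, 3]
# [1] "Walmart" "Target"  "Costco"

# Hypothetical input with fewer matches in the second string
uneven <- str_extract_all(c("apples *40 Walmart", "milk *2"),
                          pattern = "\\w+", simplify = TRUE)
dim(uneven)
# [1] 2 3
uneven[2, 3]
# [1] ""
```

When downstream code needs to distinguish "no match" from an empty string, the list form returned with the default simplify = FALSE may be the safer choice.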