Extract Matched Patterns from a String

str_extract() extracts the first match from each string element.
str_extract_all() extracts all matches from each string element.
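
The two functions also handle elements with no match differently. The sketch below uses a hypothetical two-element vector whose second element contains no digits. str_extract() puts NA in that position, while str_extract_all() returns an empty character vector (character(0)) for it.

```r
library(stringr)

# Hypothetical input: the second element contains no digits
x <- c("eggs *12 Aldi", "bread")

# str_extract(): one result per element, NA where nothing matches,
# so this returns c("12", NA)
first <- str_extract(x, "[0-9]+")
print(first)

# str_extract_all(): a list with one character vector per element;
# an element without a match gives character(0)
all_matches <- str_extract_all(x, "[0-9]+")
print(all_matches)
```

Checking for NA (or for zero-length list elements) is therefore how you detect strings where the pattern was not found.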

str_extract() returns the part of each string element that matches a specified pattern (interpreted as a regular expression by default).

In the following example, the regular expression [a-z]{1,6} matches a run of 1 to 6 consecutive lowercase letters: the character class [a-z] matches any single lowercase letter, and the quantifier {1,6} repeats it between 1 and 6 times.
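
Two properties of this pattern are easy to miss: the {1,6} quantifier caps a match at six letters, and [a-z] skips capital letters. The sketch below illustrates both with hypothetical shopping items (not from the original example) and contrasts them with the broader pattern [A-Za-z]+, which accepts letters of either case in runs of any length.

```r
library(stringr)

# Hypothetical items chosen to expose the two edge cases
items <- c("strawberries *2 Aldi", "Bananas *6 Kroger")

# {1,6} stops after six letters, and [a-z] skips the capital "B",
# so this returns c("strawb", "ananas")
capped <- str_extract(items, "[a-z]{1,6}")
print(capped)

# Allowing uppercase letters and an unbounded length ("+") returns whole words:
# c("strawberries", "Bananas")
whole <- str_extract(items, "[A-Za-z]+")
print(whole)
```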

library(stringr)
# Each element holds an item, a quantity after "*", and a store name
shop_list <- c("apples *40 Walmart", "flour *12 Target", "sugar *3 Costco")
str_extract(shop_list, pattern = "[a-z]{1,6}")

Output:
[1] "apples" "flour"  "sugar"

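Note that str_extract() returns only the first match in each element. The pattern also matches lowercase runs later in each string, such as “almart” in “Walmart”, “arget” in “Target”, and “ostco” in “Costco” (the capital first letter is not in [a-z]), but these later matches are not returned.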
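
Running the same pattern through str_extract_all() exposes the later matches that str_extract() drops. This is a small sketch reusing the shopping list from the example above.

```r
library(stringr)

shop_list <- c("apples *40 Walmart", "flour *12 Target", "sugar *3 Costco")

# str_extract_all() keeps every match; [a-z] still skips each capital letter,
# so the first element yields c("apples", "almart")
lower_runs <- str_extract_all(shop_list, "[a-z]{1,6}")
print(lower_runs)
```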

Besides writing regular expressions directly, you can use the rebus package, which offers a more readable and memorable syntax for building patterns. For example, one_or_more(WRD) matches one or more consecutive word characters (WRD stands for a single word character: a letter, a digit, or an underscore).

# install.packages("rebus")
library(rebus)
str_extract(shop_list, pattern = one_or_more(WRD))

Output:
[1] "apples" "flour"  "sugar"

Similarly, one_or_more(DGT) extracts a run of consecutive digits (DGT stands for a single digit character).

str_extract(shop_list, pattern = one_or_more(DGT))

Output:
[1] "40" "12" "3"
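
If you would rather not install rebus, the same results can be obtained with plain regular expressions. As far as I know, WRD corresponds to the regex token \w and DGT to \d, so the sketch below assumes "\\w+" and "\\d+" as stand-ins for one_or_more(WRD) and one_or_more(DGT). In an R string the backslash must itself be escaped.

```r
library(stringr)

shop_list <- c("apples *40 Walmart", "flour *12 Target", "sugar *3 Costco")

# "\\w+": one or more word characters, assumed equivalent to one_or_more(WRD)
words <- str_extract(shop_list, "\\w+")

# "\\d+": one or more digits, assumed equivalent to one_or_more(DGT)
digits <- str_extract(shop_list, "\\d+")

print(words)   # c("apples", "flour", "sugar")
print(digits)  # c("40", "12", "3")
```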

str_extract_all() extracts all matches from each string element and returns a list with one character vector per element.

str_extract_all(shop_list, pattern = one_or_more(WRD))

Output:
[[1]]
[1] "apples"  "40"      "Walmart"

[[2]]
[1] "flour"  "12"     "Target"

[[3]]
[1] "sugar"  "3"      "Costco"
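
Because the result is a list, base R helpers are handy for post-processing it. The sketch below (an illustration, not part of the original tutorial) counts the matches per element with lengths(), then flattens the list with unlist() and converts the digit runs to integers so the quantities can be totalled. It uses "\\d+" for digits, which I assume is equivalent to one_or_more(DGT).

```r
library(stringr)

shop_list <- c("apples *40 Walmart", "flour *12 Target", "sugar *3 Costco")

# One character vector of digit runs per shopping-list entry
qty_list <- str_extract_all(shop_list, "\\d+")

# Number of matches each element produced
n_matches <- lengths(qty_list)

# Flatten the list and convert to integers, e.g. to total the quantities
quantities <- as.integer(unlist(qty_list))
total <- sum(quantities)
print(total)  # 40 + 12 + 3 = 55
```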

Set simplify = TRUE to return a character matrix instead of a list, with one row per string element.

str_extract_all(shop_list, pattern = one_or_more(WRD), simplify = TRUE)

Output:
     [,1]     [,2] [,3]
[1,] "apples" "40" "Walmart"
[2,] "flour"  "12" "Target"
[3,] "sugar"  "3"  "Costco"
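
When elements produce different numbers of matches, the matrix gets as many columns as the longest row, and shorter rows are padded with empty strings (""). The sketch below uses a hypothetical vector whose second entry has no quantity, and "\\w+" as an assumed plain-regex equivalent of one_or_more(WRD).

```r
library(stringr)

# Hypothetical input: the second entry has only two words
uneven <- c("apples *40 Walmart", "flour Target")

# Rows are padded to the widest row; missing cells become ""
m <- str_extract_all(uneven, "\\w+", simplify = TRUE)
print(m)
dim(m)  # 2 rows, 3 columns
```

If you need to tell padding apart from a genuine match, test for "" in the matrix rather than for NA.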
