library(stringr)
a <- c("Life's colorful", "vivid colours")
Specify the Number of Matched Patterns
Quantifiers specify how many times the immediately preceding character (or character class) must occur for a match.
? (zero or one occurrence), e.g., colou?r matches “color” and “colour”.
str_view_all(a, "colou?r")
Output:
[1] │ Life's <color>ful
[2] │ vivid <colour>s
str_extract_all(a, "colou?r", simplify = TRUE)
Output:
[,1]
[1,] "color"
[2,] "colour"
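As a cross-check, ? is shorthand for the interval quantifier {0,1}, which is described later in this section. The minimal sketch below reuses the vector a from above and compares the two spellings; the names short_form and interval_form are only illustrative.

```r
library(stringr)

a <- c("Life's colorful", "vivid colours")

# "?" is shorthand for the interval quantifier {0,1}
short_form <- str_extract_all(a, "colou?r")
interval_form <- str_extract_all(a, "colou{0,1}r")

# Both spellings find the same matches in every element
identical(short_form, interval_form)
# returns TRUE
```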
+ (one or more), e.g., a+ matches one or more consecutive “a” characters.
library(stringr)
x <- c("bb", "ba++", "baaa-naaaa-nAAA")
str_view_all(x, "a+")
Output:
[1] │ bb
[2] │ b<a>++
[3] │ b<aaa>-n<aaaa>-nAAA
str_extract_all(x, "a+", simplify = TRUE)
Output:
[,1] [,2]
[1,] "" ""
[2,] "a" ""
[3,] "aaa" "aaaa"
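Note that the uppercase “AAA” at the end of the third string is not matched, because stringr matches regular expressions case-sensitively by default. The sketch below is an addition to the original example: it wraps the pattern in stringr's regex() modifier with ignore_case = TRUE to show the difference. The variable names are only illustrative.

```r
library(stringr)

x <- c("bb", "ba++", "baaa-naaaa-nAAA")

# Default: case-sensitive, so the uppercase run "AAA" is skipped
sensitive <- str_extract_all(x[3], "a+")[[1]]
sensitive
# returns c("aaa", "aaaa")

# regex(..., ignore_case = TRUE) treats "a" and "A" as the same letter
insensitive <- str_extract_all(x[3], regex("a+", ignore_case = TRUE))[[1]]
insensitive
# returns c("aaa", "aaaa", "AAA")
```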
* (zero or more), e.g., ba* matches “b” followed by zero or more “a” characters.
str_view_all(x, "ba*")
Output:
[1] │ <b><b>
[2] │ <ba>++
[3] │ <baaa>-naaaa-nAAA
str_extract_all(x, "ba*", simplify = TRUE)
Output:
[,1] [,2]
[1,] "b" "b"
[2,] "ba" ""
[3,] "baaa" ""
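The practical difference between * and + is whether the preceding character may be absent. A minimal sketch using str_detect() on the same vector x: ba* needs only a “b”, whereas ba+ also requires at least one “a”, so “bb” fails the second test.

```r
library(stringr)

x <- c("bb", "ba++", "baaa-naaaa-nAAA")

# "ba*" accepts a lone "b", because "*" allows zero "a"s
star_hits <- str_detect(x, "ba*")
star_hits
# returns c(TRUE, TRUE, TRUE)

# "ba+" requires at least one "a" right after the "b"
plus_hits <- str_detect(x, "ba+")
plus_hits
# returns c(FALSE, TRUE, TRUE)
```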
{n} (exactly n times), e.g., ba{3} matches “baaa”.
str_view_all(x, "ba{3}")
Output:
[1] │ bb
[2] │ ba++
[3] │ <baaa>-naaaa-nAAA
str_extract_all(x, "ba{3}", simplify = TRUE)
Output:
[,1]
[1,] ""
[2,] ""
[3,] "baaa"
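Keep in mind that {3} fixes the length of the match, not the length of the run in the input: a string with four “a” characters still contains “baaa”. The sketch below uses a hypothetical vector y (not from the original examples) to show this with str_detect() and str_extract().

```r
library(stringr)

# Hypothetical test strings with two, three, and four "a"s
y <- c("baa", "baaa", "baaaa")

# "baaaa" also contains "baaa", so it counts as a match
exact_hits <- str_detect(y, "ba{3}")
exact_hits
# returns c(FALSE, TRUE, TRUE)

# The extracted text always has exactly three "a"s; no match gives NA
exact_text <- str_extract(y, "ba{3}")
exact_text
# returns c(NA, "baaa", "baaa")
```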
{n,} (at least n times), e.g., ba{2,} matches “baa”, “baaa”, “baaaa”, and so on. Note that there must be no white space inside the curly braces.
str_view_all(x, "ba{2,}")
Output:
[1] │ bb
[2] │ ba++
[3] │ <baaa>-naaaa-nAAA
str_extract_all(x, "ba{2,}", simplify = TRUE)
Output:
[,1]
[1,] ""
[2,] ""
[3,] "baaa"
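The open-ended form {n,} also explains the earlier shorthands: {1,} behaves like + and {0,} behaves like *. A minimal sketch that checks these equivalences on the vector x, together with ba{2,} versus the spelled-out baa+:

```r
library(stringr)

x <- c("bb", "ba++", "baaa-naaaa-nAAA")

# {1,} is equivalent to "+", and {0,} is equivalent to "*"
plus_same <- identical(str_extract_all(x, "ba{1,}"), str_extract_all(x, "ba+"))
star_same <- identical(str_extract_all(x, "ba{0,}"), str_extract_all(x, "ba*"))

# {2,} is the same as one literal "a" followed by "a+"
two_same <- identical(str_extract_all(x, "ba{2,}"), str_extract_all(x, "baa+"))

c(plus_same, star_same, two_same)
# returns c(TRUE, TRUE, TRUE)
```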
{n,m} (between n and m times, inclusive), e.g., ba{1,2} matches “ba” or “baa”. Again, there must be no white space inside the curly braces.
str_view_all(x, "ba{1,2}")
Output:
[1] │ bb
[2] │ <ba>++
[3] │ <baa>a-naaaa-nAAA
str_extract_all(x, "ba{1,2}", simplify = TRUE)
Output:
[,1]
[1,] ""
[2,] "ba"
[3,] "baa"
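The third row shows that quantifiers are greedy: in “baaa”, ba{1,2} takes as many “a” characters as the upper bound allows. The sketch below, using a hypothetical input string, shows the same behaviour when a long run is matched repeatedly: five “a” characters are consumed in chunks of two, two, and one.

```r
library(stringr)

# Hypothetical input: "b" followed by five "a"s
chunks <- str_extract_all("baaaaa", "a{1,2}")[[1]]

# Each match greedily takes two "a"s until only one is left
chunks
# returns c("aa", "aa", "a")
```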
Combination of Character Class and Quantifier
[0-9]{3,4} or [:digit:]{3,4} matches 3 to 4 consecutive digits. The following example extracts the three digit groups of each telephone number: the area code, the exchange code, and the subscriber number.
s <- c("Alice: 137-807-6865", "Mike: 732-987-1986")
p <- "[:digit:]{3,4}"
str_view_all(s, p)
Output:
[1] │ Alice: <137>-<807>-<6865>
[2] │ Mike: <732>-<987>-<1986>
str_extract_all(s, p, simplify = TRUE)
Output:
[,1] [,2] [,3]
[1,] "137" "807" "6865"
[2,] "732" "987" "1986"
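Two follow-ups on this example, sketched below on the same vector s. First, the [0-9] and [:digit:] spellings give the same groups here. Second, chaining exact counts with literal hyphens, as in [0-9]{3}-[0-9]{3}-[0-9]{4}, extracts each full number in one match; this stricter pattern is an illustration added here, not part of the original example.

```r
library(stringr)

s <- c("Alice: 137-807-6865", "Mike: 732-987-1986")

# Both character-class spellings find the same digit groups
same_groups <- identical(str_extract_all(s, "[0-9]{3,4}"),
                         str_extract_all(s, "[:digit:]{3,4}"))
same_groups
# returns TRUE

# Exact counts separated by literal hyphens match the whole number
full_numbers <- str_extract(s, "[0-9]{3}-[0-9]{3}-[0-9]{4}")
full_numbers
# returns c("137-807-6865", "732-987-1986")
```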
For comparison, [:digit:] alone, without the {3,4} quantifier, treats each individual digit as a separate match.
str_extract_all(s, "[:digit:]", simplify = TRUE)
Output:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "1" "3" "7" "8" "0" "7" "6" "8" "6" "5"
[2,] "7" "3" "2" "9" "8" "7" "1" "9" "8" "6"
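The same contrast can be summarised by counting matches with str_count(): each phone number yields ten single-digit matches without the quantifier, but only three digit-group matches with {3,4}. A minimal sketch on the vector s:

```r
library(stringr)

s <- c("Alice: 137-807-6865", "Mike: 732-987-1986")

# Without a quantifier, every digit is its own match
single_digits <- str_count(s, "[:digit:]")
single_digits
# returns c(10L, 10L)

# With {3,4}, each run of 3 to 4 digits is one match
digit_groups <- str_count(s, "[:digit:]{3,4}")
digit_groups
# returns c(3L, 3L)
```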