library(stringr)
x <- c("... Hello", "Hello world")
Specify the Start or End of Strings with Anchors
Anchors specify where in a string a match must occur: ^ denotes the start of the string, and $ denotes the end of the string. Consider the following examples.
Hello$ matches “Hello” only when it sits at the very end of a string.
str_view_all(x, "Hello$")
Output:
[1] │ ... <Hello>
[2] │ Hello world
str_extract_all(x, "Hello$", simplify = TRUE)
Output:
     [,1]
[1,] "Hello"
[2,] ""
^[aA]pple matches “Apple” and “apple” when they occur at the start of a string.
a <- c("apples are red", "Apples are juicy", "We love apples")
str_view_all(a, "^[aA]pple")
Output:
[1] │ <apple>s are red
[2] │ <Apple>s are juicy
[3] │ We love apples
str_extract_all(a, "^[aA]pple", simplify = TRUE)
Output:
     [,1]
[1,] "apple"
[2,] "Apple"
[3,] ""
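The two anchors can also be combined: a pattern wrapped in ^ and $ has to account for the entire string. The following is a minimal sketch of that idea. The vector greetings is made up for illustration and is not part of the examples above.

```r
library(stringr)

# Hypothetical input, chosen only to illustrate the combined anchors
greetings <- c("Hello", "Hello world", "... Hello")

# ^Hello$ requires "Hello" to begin at the start of the string
# and to finish at its end, so nothing may precede or follow it
whole <- str_detect(greetings, "^Hello$")
whole
```

Only the first element consists of “Hello” and nothing else, so it should be the only one reported as TRUE.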
Note that the caret ^ has a different meaning inside square brackets []: as the first character of a character class, it negates the class and no longer functions as the starting anchor. For example, [^123] matches any character except “1”, “2”, or “3”.
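As a quick check of the negated class [^123], here is a minimal sketch. The input string s is an assumption introduced only for this illustration.

```r
library(stringr)

# Hypothetical input for illustration
s <- "a1b2c3d4"

# [^123] matches each single character that is not 1, 2, or 3;
# str_extract_all() returns a list, so take its first element
kept <- str_extract_all(s, "[^123]")[[1]]
kept
```

The digit 4 is kept along with the letters, since only 1, 2, and 3 are excluded by the class.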
Compare the following two examples.
x <- c("12a 53b", "b**89 *_...")
Match one or more consecutive digits from 1 to 5 at the start of a string.
str_view_all(x, "^[1-5]+")
Output:
[1] │ <12>a 53b
[2] │ b**89 *_...
str_extract_all(x, "^[1-5]+", simplify = TRUE)
Output:
     [,1]
[1,] "12"
[2,] ""
Match runs of one or more consecutive characters other than the digits 1 to 5, anywhere in the string.
str_view_all(x, "[^1-5]+")
Output:
[1] │ 12<a >53<b>
[2] │ <b**89 *_...>
str_extract_all(x, "[^1-5]+", simplify = TRUE)
Output:
     [,1]          [,2]
[1,] "a "          "b"
[2,] "b**89 *_..." ""
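Both uses of the caret can appear in a single pattern. The sketch below reuses x from the examples above and assumes str_extract(), which returns only the first match per string (NA when there is none). The pattern ^[^1-5]+ is an illustration, not one of the original examples.

```r
library(stringr)

x <- c("12a 53b", "b**89 *_...")

# The caret outside the brackets anchors the match to the start of the string;
# the caret inside the brackets negates the class 1-5
lead <- str_extract(x, "^[^1-5]+")
lead
```

The first string begins with “1”, so the anchored pattern finds nothing there and the result is NA. The second string contains no digit from 1 to 5, so it is matched in full.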