library(stringr)
x <- c("apples in a jar", "sweet applesource", "a pineapple in boxes")
Word Boundary
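A word boundary is a zero-width position between a word character (a letter, digit, or underscore) and a non-word character. The start and end of a string also count as word boundaries when the character next to them is a word character. The metacharacter \b matches a word boundary, and \B matches any position that is not a word boundary.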
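Because the underscore and digits count as word characters, a boundary is not always where a reader might expect one. The following sketch uses a small hypothetical vector `samples` (not part of the original example data) to show which following characters let apple\\b match:

```r
library(stringr)

# Hypothetical test strings: "apple" followed by "_", a digit, "-" and a space
samples <- c("apple_pie", "apple2", "apple-pie", "apple pie")

# "_" and "2" are word characters, so no boundary follows "apple" there;
# "-" and " " are non-word characters, so a boundary does follow "apple"
ends_word <- str_detect(samples, "apple\\b")
ends_word
#> [1] FALSE FALSE  TRUE  TRUE
```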
For example, \\bapple\\b will match “apple” as a whole word, but not inside other words such as “pineapple” or “applesauce”. (Recall that in an R string the backslash must itself be escaped, which is why it is doubled; see the section on escape characters.)
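None of the three strings in x contains “apple” as a standalone word, so a whole-word search finds nothing there. The sketch below checks this and then tries the same pattern on a second, hypothetical vector `y` chosen for illustration:

```r
library(stringr)

x <- c("apples in a jar", "sweet applesource", "a pineapple in boxes")
# Every "apple" in x is glued to other word characters on at least one side
in_x <- str_detect(x, "\\bapple\\b")
in_x
#> [1] FALSE FALSE FALSE

# Hypothetical strings: "apple" alone, inside words, and as the whole string
y <- c("an apple a day", "pineapple", "applesauce", "apple")
# Boundaries are required on both sides, so "apple" must stand alone
in_y <- str_detect(y, "\\bapple\\b")
in_y
#> [1]  TRUE FALSE FALSE  TRUE
```

The last element matches because the start and end of the string act as boundaries next to the word characters “a” and “e”.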
E.g. 1: apple\\b matches “apple” at the end of a word.
str_view_all(x, pattern = "apple\\b")
Output:
[1] │ apples in a jar
[2] │ sweet applesource
[3] │ a pine<apple> in boxes
str_subset(x, pattern = "apple\\b")
Output:
[1] "a pineapple in boxes"
E.g. 2: \\bapple matches “apple” at the start of a word.
str_view_all(x, pattern = "\\bapple")
Output:
[1] │ <apple>s in a jar
[2] │ sweet <apple>source
[3] │ a pineapple in boxes
str_subset(x, pattern = "\\bapple")
Output:
[1] "apples in a jar" "sweet applesource"
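The non-boundary \B inverts the two examples above. A minimal sketch on the same vector x: \\Bapple requires “apple” to be preceded by a word character (so it starts inside a word), and apple\\B requires it to be followed by one.

```r
library(stringr)

x <- c("apples in a jar", "sweet applesource", "a pineapple in boxes")

# "apple" preceded by a word character: only "pineapple" qualifies
inside_start <- str_subset(x, "\\Bapple")
inside_start
#> [1] "a pineapple in boxes"

# "apple" followed by a word character: "apples" and "applesource" qualify
inside_end <- str_subset(x, "apple\\B")
inside_end
#> [1] "apples in a jar"   "sweet applesource"
```

Each result is exactly the set of elements that the corresponding \b pattern in E.g. 1 and E.g. 2 rejected.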
In comparison, ^ and $ are anchors that match, respectively, the start and end of each element of a character vector, not the start or end of a word. E.g., ^apple matches “apple” only when it is at the start of a string element.
str_view_all(x, pattern = "^apple")
Output:
[1] │ <apple>s in a jar
[2] │ sweet applesource
[3] │ a pineapple in boxes
str_subset(x, pattern = "^apple")
Output:
[1] "apples in a jar"
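To see the difference side by side, the sketch below compares word-boundary patterns with string anchors on the same vector x using `str_detect()`, which returns one logical value per element. It also covers $, which the text above mentions but does not demonstrate.

```r
library(stringr)

x <- c("apples in a jar", "sweet applesource", "a pineapple in boxes")

# \b matches before any word; ^ only at the very start of the element
start_word <- str_detect(x, "\\bapple")
start_word
#> [1]  TRUE  TRUE FALSE
start_str <- str_detect(x, "^apple")
start_str
#> [1]  TRUE FALSE FALSE

# \b matches after any word; $ only at the very end of the element
end_word <- str_detect(x, "apple\\b")
end_word
#> [1] FALSE FALSE  TRUE
end_str <- str_detect(x, "apple$")
end_str
#> [1] FALSE FALSE FALSE
```

“pineapple” ends a word but not the string, so apple\\b finds it while apple$ does not.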