Word Boundary

A word boundary is a position between a word character (a letter, digit, or underscore) and a non-word character. It also occurs at the start or end of a string when the string begins or ends with a word character. The metacharacter \b matches a word boundary, and \B matches any position that is not a word boundary.

For example, \\bapple\\b matches “apple” as a whole word, but not inside other words such as “pineapple” or “applesauce”. (Recall that in an R string a second backslash is needed to escape the backslash itself; review escape characters.)
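The whole-word behaviour described above can be checked with a small sketch. The vector `fruit` is a hypothetical sample made up for illustration; it is not part of the data used in the rest of this section.

```r
library(stringr)

# Hypothetical sample strings, chosen only to illustrate whole-word matching
fruit <- c("an apple a day", "pineapple", "applesauce", "apple")

# \\bapple\\b requires a word boundary on both sides of "apple"
str_detect(fruit, "\\bapple\\b")
# [1]  TRUE FALSE FALSE  TRUE

# Keep only the elements that contain "apple" as a whole word
str_subset(fruit, "\\bapple\\b")
# [1] "an apple a day" "apple"
```

The last element matches because the start and end of a string count as word boundaries when the adjacent character is a word character.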

library(stringr)

x <- c("apples in a jar",
       "sweet applesource",
       "a pineapple in boxes")

eg. 1. apple\\b matches “apple” at the end of a word.

str_view_all(x, pattern = "apple\\b")

Output:

[1] │ apples in a jar
[2] │ sweet applesource
[3] │ a pine<apple> in boxes

str_subset(x, pattern = "apple\\b")

Output:

[1] "a pineapple in boxes"

eg. 2. \\bapple matches “apple” at the start of a word.

str_view_all(x, pattern = "\\bapple")

Output:

[1] │ <apple>s in a jar
[2] │ sweet <apple>source
[3] │ a pineapple in boxes

str_subset(x, pattern = "\\bapple")

Output:

[1] "apples in a jar" "sweet applesource"
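The complement \B has not been shown yet. As a minimal sketch using the same three strings (redefined here so the block runs on its own), \\Bapple finds “apple” that does not start a word, and apple\\B finds “apple” that does not end a word.

```r
library(stringr)

# Same strings as in the examples above, repeated so this block is self-contained
x <- c("apples in a jar",
       "sweet applesource",
       "a pineapple in boxes")

# \\Bapple: "apple" preceded by a word character (not at the start of a word)
str_subset(x, pattern = "\\Bapple")
# [1] "a pineapple in boxes"

# apple\\B: "apple" followed by a word character (not at the end of a word)
str_subset(x, pattern = "apple\\B")
# [1] "apples in a jar"   "sweet applesource"
```

These results are the mirror image of eg. 1 and eg. 2: every element matched by \\bapple is rejected by \\Bapple, and likewise for the trailing boundary.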

In comparison, ^ and $ are anchors that match the start and the end, respectively, of each element of a character vector, not the start or end of a word. E.g., ^apple matches “apple” only when it is at the start of a string element.

str_view_all(x, pattern = "^apple")

Output:

[1] │ <apple>s in a jar
[2] │ sweet applesource
[3] │ a pineapple in boxes

str_subset(x, pattern = "^apple")

Output:

[1] "apples in a jar"
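The end anchor $ can be contrasted with \b in the same way. The sketch below reuses the three strings (redefined so it runs on its own) and looks for the letter “s” at the end of a word versus at the end of the whole string; the choice of “s” is only for illustration.

```r
library(stringr)

# Same strings as above, repeated so this block is self-contained
x <- c("apples in a jar",
       "sweet applesource",
       "a pineapple in boxes")

# s\\b: an "s" that ends any word in the string ("apples", "boxes")
str_subset(x, pattern = "s\\b")
# [1] "apples in a jar"      "a pineapple in boxes"

# s$: an "s" that ends the whole string element
str_subset(x, pattern = "s$")
# [1] "a pineapple in boxes"
```

The first element is dropped by s$ because the string ends in “jar”, even though the word “apples” ends in “s”.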