Escape Characters

An escape character tells the regex engine to interpret the character(s) that follow it differently from their usual meaning. In regular expressions, the backslash \ is the escape character.

Escape a special character

To match a special character literally, escape it: precede it immediately with a backslash \. For example:

  • \( and \) match a literal left and right parenthesis, respectively.
  • \[ and \] match a literal left and right square bracket, respectively.
  • \. matches a literal dot instead of acting as a wildcard.
  • \^ and \$ match a literal caret and dollar sign, respectively, instead of acting as position anchors.
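Below is a minimal sketch of the escapes listed above. It assumes the stringr package is installed and uses hypothetical sample strings. Each pattern is written with the doubled backslash that R string literals require. The last two calls show what happens when the escape is omitted.

```r
library(stringr)

# Hypothetical sample strings, chosen only for illustration
txt <- c("f(x) = x^2", "arr[0] costs $5")

# \( and \): literal parentheses and whatever lies between them
str_extract(txt[1], "\\(.*\\)")   # "(x)"

# \[ and \]: literal square brackets around an index
str_extract(txt[2], "\\[.*\\]")   # "[0]"

# \^: a literal caret, not the start-of-string anchor
str_extract(txt[1], "x\\^2")      # "x^2"

# \$: a literal dollar sign, not the end-of-string anchor
str_extract(txt[2], "\\$5")       # "$5"

# Unescaped, ^ and $ act as anchors, so these patterns cannot match here
str_detect(txt[1], "x^2")         # FALSE
str_detect(txt[2], "$5")          # FALSE
```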

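In R, the backslash is also the escape character inside string literals. To pass a single backslash to the regex engine, you must therefore type two: the regex patterns \., \^, \$, and \( are written in R code as "\\.", "\\^", "\\$", and "\\(". Likewise, matching a literal backslash requires the regex \\, which is written in R as "\\\\".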
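The doubling happens at the level of the R string. The literal "\\." holds just two characters, a backslash and a dot, and that two-character sequence is the regex \. that the engine receives. The sketch below assumes stringr is installed and uses hypothetical sample strings. The raw-string line at the end assumes R 4.0.0 or later, where r"(...)" literals were introduced.

```r
library(stringr)

# The R literal "\\." contains two characters: a backslash and a dot
nchar("\\.")                         # 2
writeLines("\\.")                    # prints \.

# An unescaped dot is a wildcard; the escaped dot matches only a literal dot
str_detect("a-b", "a.b")             # TRUE  (the wildcard matched "-")
str_detect("a-b", "a\\.b")           # FALSE
str_detect("a.b", "a\\.b")           # TRUE

# A literal backslash needs the regex \\, i.e. the R string "\\\\"
path <- "C:\\temp"                   # the string C:\temp
str_detect(path, "\\\\")             # TRUE
str_replace_all(path, "\\\\", "/")   # "C:/temp"

# R >= 4.0.0: a raw string avoids the doubling; r"(\.)" equals "\\."
identical(r"(\.)", "\\.")            # TRUE
```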

eg.1. \\. matches a literal dot, and .* matches any sequence of characters, including an empty one (here the dot is a wildcard). Thus, \\..* matches a literal dot and everything after it, i.e., the file extension.

library(stringr)
x <- c("raw_data.xlsx", "data_analysis.RData")
str_view_all(x, "\\..*")

Output:

[1] │ raw_data<.xlsx>
[2] │ data_analysis<.RData>

str_extract(x, "\\..*")

Output:

[1] ".xlsx" ".RData"
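One caveat: .* is greedy and the match begins at the first literal dot, so a file name with several dots yields everything after the first one. The sketch below assumes stringr is loaded and uses a hypothetical second file name. It contrasts that pattern with one anchored at the end of the string. The dot inside [^.] needs no backslash because it sits in a character class.

```r
library(stringr)

# Hypothetical file names; the second contains two dots
files <- c("raw_data.xlsx", "archive.tar.gz")

# Starts at the FIRST literal dot and takes everything after it
str_extract(files, "\\..*")        # ".xlsx"   ".tar.gz"

# A literal dot, then only non-dot characters up to the end of the string,
# which isolates the last extension
str_extract(files, "\\.[^.]*$")    # ".xlsx"   ".gz"
```

Base R's tools::file_ext() offers a similar shortcut, returning the last extension without the leading dot.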

Special characters within a character class

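Inside a character class (a pair of square brackets), most special characters lose their special meaning and are interpreted literally, so they do not need a backslash. The exceptions are ^ as the first character (which negates the class), - between two characters (which denotes a range), ] (which closes the class unless escaped), and \ (which still escapes).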

eg.2. [$^*] matches “$”, “^”, and “*” as literal characters. The ^ is literal here because it is not the first character inside the brackets.

s <- c("an book $", "carot or carat ^", "stars ** in the sky")
str_view_all(s, "[$^*]")

Output:

[1] │ an book <$>
[2] │ carot or carat <^>
[3] │ stars <*><*> in the sky

str_extract(s, "[$^*]")

Output:

[1] "$" "^" "*"
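Not every character is literal inside brackets: the hyphen and the caret change meaning depending on their position. The short sketch below assumes stringr is installed and uses a hypothetical test string. It also shows that a backslash still escapes within a class.

```r
library(stringr)

s2 <- "a-z ^"   # hypothetical test string: a, hyphen, z, space, caret

# Between two characters, a hyphen forms a range: [a-z] matches lowercase letters
str_extract_all(s2, "[a-z]")[[1]]      # "a" "z"

# Escaped with a backslash, the hyphen is literal
str_extract_all(s2, "[az\\-]")[[1]]    # "a" "-" "z"

# As the first character, a caret negates the class:
# anything except a, z, hyphen, or space
str_extract_all(s2, "[^az\\- ]")[[1]]  # "^"

# Anywhere else in the class, the caret is literal
str_extract_all(s2, "[a^]")[[1]]       # "a" "^"
```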

Escape a regular letter

As shown above, escaping a special character with a backslash \ makes it match literally. Conversely, escaping certain ordinary letters gives them a special meaning:

  • \d matches a single digit
  • \D matches a single non-digit
  • \w matches a single word character (a letter, digit, or underscore)
  • \W matches a single non-word character
  • \s matches a single whitespace character (such as a space, tab, or newline)
  • \S matches a single non-whitespace character
  • \b matches a word boundary
  • \B matches a position that is not a word boundary
  • \t matches a tab character
  • \n matches a newline character

Again, in an R string the backslash must itself be escaped, so these are written as \\d, \\s, \\S, and so on.
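To show several of these escapes side by side, here is a sketch applied to a hypothetical string containing digits, words, spaces, and a tab character. It assumes stringr is installed.

```r
library(stringr)

# Hypothetical string; "\t" embeds a real tab between "shipped" and "to"
txt <- "Order 66 shipped\tto room 12B"

str_extract_all(txt, "\\d+")[[1]]   # "66" "12"   (runs of digits)
str_extract(txt, "\\D+")            # "Order "    (first run of non-digits)
str_count(txt, "\\w+")              # 6           (Order, 66, shipped, to, room, 12B)
str_extract(txt, "\\S+")            # "Order"     (first run of non-whitespace)
str_replace_all(txt, "\\s+", " ")   # "Order 66 shipped to room 12B"
str_detect(txt, "\\t")              # TRUE        (the string contains a tab)

# \b matches only at word edges: "room" is a whole word, "roo" is not
str_detect(txt, "\\broom\\b")       # TRUE
str_detect(txt, "\\broo\\b")        # FALSE
```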

Consider the following examples.

eg.3. \\$ matches a literal dollar sign, and \\d+ matches one or more digits. As such, \\$\\d+ matches a dollar sign followed by a whole number, such as $123.

d <- c("book of $123", "price at 20% off")
str_view_all(d, "\\$\\d+")

Output:

[1] │ book of <$123>
[2] │ price at 20% off

str_extract(d, "\\$\\d+")

Output:

[1] "$123" NA
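The pattern \\$\\d+ stops at the first non-digit, so any cents after a decimal point are dropped. The sketch below extends it with an optional group and uses hypothetical input prices. The group (\\.\\d{2})? means "optionally, a literal dot followed by exactly two digits", reusing the escaped dot from the earlier sections.

```r
library(stringr)

# Hypothetical inputs
prices <- c("book of $123", "pen at $4.50", "price at 20% off")

# Original pattern: digits only, so "$4.50" is cut off at the dot
str_extract(prices, "\\$\\d+")                # "$123" "$4"    NA

# Optional group: a literal dot and two digits, if present
str_extract(prices, "\\$\\d+(\\.\\d{2})?")    # "$123" "$4.50" NA
```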

eg.4. \\d{3}\\. matches three consecutive digits followed by a literal dot. As such, \\d{3}\\.\\d{3}\\.\\d{4} matches a phone number in the form xxx.xxx.xxxx.

a <- c("Bob: 787.902.1068", "Mike: 910.087.1483")
p <- "\\d{3}\\.\\d{3}\\.\\d{4}"
str_view_all(a, p)

Output:

[1] │ Bob: <787.902.1068>
[2] │ Mike: <910.087.1483>

str_extract(a, p)

Output:

[1] "787.902.1068" "910.087.1483"
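Escaping the dots is what restricts the separator to a literal dot. The sketch below assumes stringr is installed and uses hypothetical inputs. It compares the escaped pattern with an unescaped version, then adds ^ and $ anchors to check whether an entire string is a phone number.

```r
library(stringr)

p_escaped   <- "\\d{3}\\.\\d{3}\\.\\d{4}"
p_unescaped <- "\\d{3}.\\d{3}.\\d{4}"    # dots left as wildcards

b <- c("787.902.1068", "787-902-1068")   # hypothetical inputs

str_detect(b, p_escaped)     # TRUE FALSE
str_detect(b, p_unescaped)   # TRUE TRUE   (the wildcard also accepts hyphens)

# ^ and $ require the WHOLE string to be a phone number
anchored <- paste0("^", p_escaped, "$")
str_detect(c("787.902.1068", "Bob: 787.902.1068"), anchored)   # TRUE FALSE
```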