A Comprehensive Guide to Regular Expressions

A regular expression (RegEx) is a sequence of characters that defines a search pattern. Regular expressions are widely used in programming for pattern matching and text manipulation. The table below summarizes commonly used patterns.

Pattern       Description
.             Match any character except a line break; a wildcard
[abc]         Match any one of the characters a, b, or c
[^abc]        Match any character except a, b, or c
[a-z]         Match any lowercase letter from a to z
[:digit:]     Match any digit (0-9)
[:alpha:]     Match any alphabetic character (uppercase or lowercase)
[:lower:]     Match any lowercase letter
[:upper:]     Match any uppercase letter
[:alnum:]     Match any alphanumeric character (letter or digit)
[:space:]     Match a single whitespace character
[:punct:]     Match any punctuation character
[:print:]     Match any printable character (letters, digits, punctuation, and the space)
\d            Match any digit
\D            Match any non-digit
\w            Match any word character (letter, digit, or underscore)
\W            Match any non-word character
\s            Match any whitespace character
\S            Match any non-whitespace character
{n}           Match exactly n occurrences of the preceding item
*             Match zero or more occurrences of the preceding item
+             Match one or more occurrences of the preceding item
?             Match zero or one occurrence of the preceding item
^             Anchor for the start of a string (or of each line in multiline mode)
$             Anchor for the end of a string (or of each line in multiline mode)
(...)         Group a subpattern and capture what it matches
\b            Match a word boundary
A(?=B)        Positive lookahead: match A only if it is followed by B
(?<=B)A       Positive lookbehind: match A only if it is preceded by B
A(?!B)        Negative lookahead: match A only if it is not followed by B
(?<!B)A       Negative lookbehind: match A only if it is not preceded by B
^\bapple\b    Match the whole word ‘apple’ at the beginning of a line
\bapple\b$    Match the whole word ‘apple’ at the end of a line
\bapple\b     Match the whole word ‘apple’
\b\d\b        Match a standalone digit
\b(\w+)\b     Match and capture any whole word
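
The lookaround and anchor rows at the end of the table are easier to read with a concrete case. The following is a minimal sketch, assuming the stringr package is installed; the sample vectors (prices, words, fruit) are made up for illustration, and every backslash in a pattern is doubled because the pattern is written as an R string.

```r
library(stringr)  # assumption: stringr is installed

# Made-up sample data for illustration only.
prices <- c("USD 100", "EUR 250", "100 USD")
words  <- c("queue", "Iraq", "qat")
fruit  <- c("apple pie", "pineapple", "green apple")

# Positive lookbehind: digits only when preceded by "USD ".
after_usd <- str_extract(prices, "(?<=USD )\\d+")
print(after_usd)          # "100" NA NA

# Positive lookahead: digits only when followed by " USD".
before_usd <- str_extract(prices, "\\d+(?= USD)")
print(before_usd)         # NA NA "100"

# Negative lookahead: a "q" that is not followed by "u".
q_without_u <- str_detect(words, "q(?!u)")
print(q_without_u)        # FALSE TRUE TRUE

# Anchors combined with word boundaries.
starts_with_apple <- str_detect(fruit, "^\\bapple\\b")
ends_with_apple   <- str_detect(fruit, "\\bapple\\b$")
print(starts_with_apple)  # TRUE FALSE FALSE
print(ends_with_apple)    # FALSE FALSE TRUE
```

In every case the lookaround text ("USD " or "u") is only checked, never returned: the extracted value is just the digits.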

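Note: In R, a backslash inside a string literal must itself be escaped, so type a second backslash in practice to render the first one as a literal backslash. For example, the pattern \d is written as "\\d" and \b as "\\b". In base R functions such as grepl(), POSIX classes must also be placed inside a bracket expression, for example [[:digit:]].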
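
To make the double-backslash rule concrete, here is a small sketch, again assuming stringr is installed; the vector x is a hypothetical example. writeLines() prints a string as the regex engine receives it, which is a handy way to check what a pattern really contains.

```r
library(stringr)  # assumption: stringr is installed

x <- c("apple pie", "pineapple", "3 apples", "file.txt")  # made-up sample data

# "\\d" in R source code is the two-character pattern \d (any digit).
n_chars <- nchar("\\d")
print(n_chars)      # 2

has_digit <- str_detect(x, "\\d")
print(has_digit)    # FALSE FALSE TRUE FALSE

# "\\b" is a word boundary, so only the standalone word "apple" matches.
whole_apple <- str_detect(x, "\\bapple\\b")
print(whole_apple)  # TRUE FALSE FALSE FALSE

# An unescaped dot is a wildcard; "\\." matches a literal dot only.
has_dot <- str_detect(x, "\\.")
print(has_dot)      # FALSE FALSE FALSE TRUE

# Show the pattern as the regex engine sees it: \bapple\b
writeLines("\\bapple\\b")
```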


The following chapters cover each type of expression in detail, with plenty of worked examples. We’ll use str_view_all() to highlight the matched pattern itself, and then demonstrate extraction with str_extract() and str_extract_all() from stringr, a popular R package (part of the tidyverse) for string manipulation.
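
As a quick preview of those three functions, here is a minimal sketch, assuming stringr is installed; the vector x is made up for illustration. Recent stringr releases mark str_view_all() as deprecated in favour of str_view(), so the first call may emit a warning depending on the installed version.

```r
library(stringr)  # assumption: stringr is installed

x <- c("2 apples and 13 pears", "no digits here")  # made-up sample data

# Highlight every run of digits in each string (console or viewer output).
str_view_all(x, "\\d+")

# First match per string; NA when a string has no match.
first_match <- str_extract(x, "\\d+")
print(first_match)  # "2" NA

# All matches per string, returned as a list of character vectors.
all_matches <- str_extract_all(x, "\\d+")
print(all_matches)
```

str_extract() always returns one value per input string, while str_extract_all() returns a list because the number of matches can differ from string to string.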