Select Characters by Specified Positions

str_sub() extracts characters from the start position to the end position, which are specified with integer numbers.

library(stringr)
x <- c("sweet", "egg", "tarts")
str_sub(x, start = 1, end = 3) # select 1st to 3rd character

Output:

[1] "swe" "egg" "tar"

The start and end positions can each be positive or negative. A positive integer counts from the left end of a string, and a negative integer counts from the right end.

# select the 1st to 2nd-to-last character
str_sub(x, start = 1, end = -2)

Output:

[1] "swee" "eg" "tart"

# select the 2nd-to-last to the last character
str_sub(x, start = -2, end = -1)

Output:

[1] "et" "gg" "ts"
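
As a rough check on how negative positions work, the sketch below rewrites the same selection with positive positions computed from str_length(). The variable names n, from_right and from_left are only illustrative and are not part of stringr.

```r
library(stringr)

x <- c("sweet", "egg", "tarts")

# Negative positions, counted from the right end of each string
from_right <- str_sub(x, start = -2, end = -1)

# The same range written with positive positions derived from the string lengths
n <- str_length(x)
from_left <- str_sub(x, start = n - 1, end = n)

# Both approaches should select the same characters
identical(from_right, from_left)
```

Negative positions are usually the more convenient form, since they avoid computing the length of every string first.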

If the selected range extends beyond the end of a string, the positions that do not exist are silently dropped. In the following example, the 4th and 5th characters do not exist in “egg”, so only the 3rd character is selected.

# select the 3rd to the 5th character
str_sub(x, start = 3, end = 5)

Output:

[1] "eet" "g" "rts"
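
A small sketch of the extreme case, where the whole requested range lies beyond the end of a string: here “egg” has no 4th or 5th character, so an empty string is expected for it rather than an error or NA. The variable name out is only illustrative.

```r
library(stringr)

x <- c("sweet", "egg", "tarts")

# Positions 4 to 5 exist in "sweet" and "tarts" but not in "egg"
out <- str_sub(x, start = 4, end = 5)
out

# Comparing lengths shows which results came back shorter than requested
str_length(out)
```

Checking str_length() on the result is one simple way to spot strings that were shorter than the requested range.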

🏆 One level up !

We can use str_sub() to modify strings, e.g., replace ‘juicy’ with ‘yummy’ in the following example.

a <- c("juicy melon")
str_sub(a, 1, 5) <- "yummy"
a

Output:

[1] "yummy melon"

Here we assign an empty string to remove the white space at the 6th position.

str_sub(a, 6, 6) <- ""
a

Output:

[1] "yummymelon"
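
The replacement does not need to have the same number of characters as the range it replaces, and negative positions work here too. The following is a minimal sketch using a new, illustrative variable b, so that the value of a above is left untouched.

```r
library(stringr)

b <- "juicy melon"

# Replace the last five characters ("melon") with a shorter word
str_sub(b, start = -5, end = -1) <- "pear"
b
```

The string simply shrinks or grows to fit the replacement text.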

🏵️ Expert Skill !

Position vectors can be used to select different positions for different strings. In the following example, we select the 1st to 2nd characters of “mozzarella” and the 3rd-to-last to the last characters of “smearcase”.

x <- c("mozzarella", "smearcase")
str_sub(x, start = c(1, -3), end = c(2, -1))

Output:

[1] "mo" "ase"
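
The start and end arguments do not both have to be vectors. As a sketch of the usual R recycling behaviour, a single end value can be combined with a vector of start values, and it is then reused for every string. The variable name mixed is only illustrative.

```r
library(stringr)

x <- c("mozzarella", "smearcase")

# Different start positions per string, one shared end position
mixed <- str_sub(x, start = c(1, 4), end = 5)
mixed
```

Here “mozzarella” is cut from the 1st to the 5th character and “smearcase” from the 4th to the 5th.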

Slightly different from str_sub(), the function str_sub_all() outputs a list of the same length as the input vector.

str_sub_all(x, start = 1, end = 2)

Output:

[[1]]
[1] "mo"
[[2]]
[1] "sm"

When the start and end positions are vectors, all specified positions are applied to each of the string elements, and output as a list. In the following code, the 1st to 2nd, 1st to 3rd, and 2nd to 4th characters are selected from each of the two strings.

str_sub_all(x, start = c(1, 1, 2), end = c(2, 3, 4)) 

Output:

[[1]]
[1] "mo" "moz" "ozz"
[[2]]
[1] "sm" "sme" "mea"
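
Because the result is a list, individual substrings are reached with list indexing. The sketch below assumes a stringr version that provides str_sub_all() (1.5.0 or later, as far as we know). The variable name pieces is only illustrative.

```r
library(stringr)

x <- c("mozzarella", "smearcase")
pieces <- str_sub_all(x, start = c(1, 1, 2), end = c(2, 3, 4))

# Number of substrings extracted from each input string
lengths(pieces)

# Third substring of the second string
pieces[[2]][3]

# Flatten the list into one character vector if the grouping is not needed
unlist(pieces)
```

Keeping the list form preserves which substrings came from which input string, while unlist() is convenient when only the substrings themselves matter.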