library(stringr)
x <- c("sweet", "egg", "tarts")
str_sub(x, start = 1, end = 3) # select 1st to 3rd character
Output:
[1] "swe" "egg" "tar"
str_sub() extracts the characters from the start position to the end position, both of which are specified as integers.
The start and end indices can be either positive or negative. A positive integer counts from the left side of a string, and a negative integer counts from the right side, so -1 refers to the last character.
# select the 1st to 2nd-to-last character
str_sub(x, start = 1, end = -2)
Output:
[1] "swee" "eg" "tart"
# select the 2nd-to-last to the last character
str_sub(x, start = -2, end = -1)
Output:
[1] "et" "gg" "ts"
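As a small additional sketch of negative indexing: in current stringr releases the end argument defaults to -1, so leaving it out should select through the last character. The vector y below is made up for illustration and simply mirrors the strings used above.

```r
library(stringr)

# hypothetical example vector, mirroring the strings used in the text
y <- c("sweet", "egg", "tarts")

# last three characters of each string; end is left at its default of -1
last3 <- str_sub(y, start = -3)
last3

# everything from the 2nd character onward
rest <- str_sub(y, start = 2)
rest
```

Under these assumptions, last3 should hold "eet", "egg", "rts" and rest should hold "weet", "gg", "arts".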
If the selected range of characters goes beyond the end of a string, the extra positions are silently dropped. In the following example, the 4th and 5th characters do not exist in “egg”, so only the 3rd character is selected.
# select the 3rd to the 5th character
str_sub(x, start = 3, end = 5)
Output:
[1] "eet" "g" "rts"
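The same truncation can be checked on a single string. The sketch below contrasts a range that only partly overlaps the string with one that lies entirely past its end; the variable names are illustrative, and the results assume a recent stringr release.

```r
library(stringr)

s <- "egg"

# range only partly overlaps the string: it is truncated to what exists
partial <- str_sub(s, start = 2, end = 10)
partial

# range lies entirely past the end of the string: nothing is left to select
past_end <- str_sub(s, start = 5, end = 6)
past_end
```

Here partial should be "gg", and past_end should be an empty string rather than NA or an error.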
We can also use str_sub() to modify strings, e.g., to replace ‘juicy’ with ‘yummy’ in the following example.
a <- c("juicy melon")
str_sub(a, 1, 5) <- "yummy"
a
Output:
[1] "yummy melon"
Here we assign an empty string to remove the white space at the 6th position.
str_sub(a, 6, 6) <- ""
a
Output:
[1] "yummymelon"
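The replacement form also accepts negative positions, and the new text does not have to match the length of the range it replaces. The following is a minimal sketch with a made-up variable b, not part of the original example.

```r
library(stringr)

b <- "juicy melon"

# replace the last five characters; negative positions count from the right
str_sub(b, start = -5, end = -1) <- "mango"
after_first <- b
after_first

# the replacement may be shorter (or longer) than the range it replaces
str_sub(b, start = 1, end = 5) <- "ripe"
b
```

After the first assignment the string should read "juicy mango", and after the second "ripe mango".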
Vectors of positions can be used to select different ranges for different strings. In the following example, we select the 1st to 2nd characters of “mozzarella” and the 3rd-to-last to the last characters of “smearcase”.
x <- c("mozzarella", "smearcase")
str_sub(x, start = c(1, -3), end = c(2, -1))
Output:
[1] "mo" "ase"
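Because the arguments are paired element by element, a single string can also be combined with several positions; the string is then recycled to match. This is a sketch with illustrative names (cheese, parts), assuming stringr's usual recycling of a length-one input.

```r
library(stringr)

cheese <- "mozzarella"

# one string, two ranges: characters 1-4 and characters 5-10
parts <- str_sub(cheese, start = c(1, 5), end = c(4, 10))
parts
```

The result should be the character vector "mozz" "arella", one element per start/end pair.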
Slightly different from str_sub(), the function str_sub_all() outputs a list of the same length as the input vector.
str_sub_all(x, start = 1, end = 2)
Output:
[[1]]
[1] "mo"
[[2]]
[1] "sm"
When the start and end positions are vectors, every start/end pair is applied to each string, and the results are returned as a list with one element per string. In the following code, the 1st to 2nd, 1st to 3rd, and 2nd to 4th characters are selected from each of the two strings.
str_sub_all(x, start = c(1, 1, 2), end = c(2, 3, 4))
Output:
[[1]]
[1] "mo" "moz" "ozz"
[[2]]
[1] "sm" "sme" "mea"
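Since the result is an ordinary list, individual substrings can be pulled out with standard list indexing. The sketch below repeats the call above with illustrative names (cheeses, pieces) and assumes stringr 1.5.0 or later, where str_sub_all() is available.

```r
library(stringr)

cheeses <- c("mozzarella", "smearcase")
pieces <- str_sub_all(cheeses, start = c(1, 1, 2), end = c(2, 3, 4))

n_strings <- length(pieces)    # one list element per input string
n_pieces <- lengths(pieces)    # number of substrings per string
third_of_second <- pieces[[2]][3]  # 3rd substring of the 2nd string
third_of_second
```

With the inputs above, n_strings should be 2, each entry of n_pieces should be 3, and third_of_second should be "mea".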