Apply a Function to Each Element of a Vector or List

TL;DR map(.x, .f) applies the function .f to each element of .x (a vector or list) and returns the output as a list of the same length as .x. The variants map_lgl(), map_int(), map_dbl(), and map_chr() work the same way but return an atomic vector of the indicated type, which is possible only if .f returns a vector of length one at each iteration.


Basics of map() and map_*()

The map() function applies a function to each element of a vector or list, and returns a list. For instance, map(c(1, 2, 3), ~ .x + 1) adds one to each vector element. The basic argument structure follows map(.x, .f) (as in other functions in the map_* family):

  • .x: a vector or list
  • .f: the function to apply to each element of .x. It can be a named function given without quotes, e.g., mean; an anonymous function, e.g., \(x) x + 1 or function(x) x + 1; or a formula shorthand, e.g., ~ .x + 1 or ~ mean(.x, na.rm = TRUE).
  • The output is a list of the same length as .x.

If .f returns a vector of length one at each iteration, you can also use the map_*() functions to return an atomic vector of the indicated type:

  • map_dbl() returns a double (i.e., numeric) vector
  • map_lgl() returns a logical vector
  • map_int() returns an integer vector
  • map_chr() returns a character vector
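
As a quick illustration of the variants listed above, the sketch below applies each one to the same small vector. The vector x and the "id_" prefix are made up for this example only.

```r
library(purrr)

x <- c(1, 2, 3)

# map() always returns a list, here with three elements
as_list <- map(x, \(v) v + 1)

# The typed variants return atomic vectors instead
as_dbl <- map_dbl(x, \(v) v + 1)              # 2 3 4
as_int <- map_int(x, \(v) length(v))          # 1 1 1 (length() returns an integer)
as_lgl <- map_lgl(x, \(v) v > 1)              # FALSE TRUE TRUE
as_chr <- map_chr(x, \(v) paste0("id_", v))   # "id_1" "id_2" "id_3"
```

In each case the function returns a single value per element, which is what lets the result collapse into a vector.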

Repeat operation across elements of a vector

e.g.1. The following code passes 1, 5, and 10, respectively, into rnorm() as values of the mean argument.

library(purrr)
library(dplyr)
map(.x = c(1, 5, 10), .f = \(x) rnorm(n = 5, mean = x, sd = .2))

Output:

[[1]]
[1] 1.1652089 1.0194722 0.8136804 1.0904337 1.0735241
[[2]]
[1] 5.157856 4.901813 5.034387 4.962213 5.376604
[[3]]
[1] 9.628559 9.480926 10.082285 10.119448 10.162851

The .f argument can be equivalently written in the following way:

map(.x = c(1, 5, 10), .f = function(x) rnorm(n = 5, mean = x, sd = .2))
map(.x = c(1, 5, 10), .f = ~ rnorm(n = 5, mean = .x, sd = .2))
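
One way to convince yourself that the three notations are interchangeable is to reset the random seed before each call and compare the results. This is only a sketch: the seed value 42 and the variable names are arbitrary choices for the illustration.

```r
library(purrr)

means <- c(1, 5, 10)

set.seed(42)  # arbitrary seed, reset before each call so the draws match
from_backslash <- map(means, \(x) rnorm(n = 5, mean = x, sd = .2))

set.seed(42)
from_function <- map(means, function(x) rnorm(n = 5, mean = x, sd = .2))

set.seed(42)
from_formula <- map(means, ~ rnorm(n = 5, mean = .x, sd = .2))

# All three lists contain exactly the same numbers
same_result <- identical(from_backslash, from_function) &&
  identical(from_function, from_formula)
same_result
```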

Of course you can rewrite the code using the pipe operator %>%.

c(1, 5, 10) %>% 
  map(\(x) rnorm(n = 5, mean = x, sd = .2))

Repeat operation across elements of a list

e.g.2. Create a list holding the exam scores of each student.

exam_scores <- list(
  Alice = c(80, 75, 85, 90),
  Bob = c(90, 85, 88, 94),
  Charlie = c(70, 65, NA, NA)
)
exam_scores

Output:

$Alice
[1] 80 75 85 90
$Bob
[1] 90 85 88 94
$Charlie
[1] 70 65 NA NA

Calculate the average score of each student, and return the output as a list.

exam_scores %>% map(mean, na.rm = T)
# other equivalent forms
exam_scores %>% map(~ mean(.x, na.rm = T))
exam_scores %>% map(\(x) mean(x, na.rm = T))
exam_scores %>% map(function(x) mean(x, na.rm = T))

Output:

$Alice
[1] 82.5
$Bob
[1] 89.25
$Charlie
[1] 67.5

Calculate the mean score of each student, this time returning a named numeric vector instead of a list.

exam_scores %>% map_dbl(\(x) mean(x, na.rm = T))

Output:

Alice Bob Charlie
82.50 89.25 67.50

Check if the average score of each student is larger than 80.

exam_scores %>% map_lgl(\(x) mean(x, na.rm = T) > 80)

Output:

Alice Bob Charlie
TRUE TRUE FALSE

Note that the map_*() functions work only if the operation specified by .f returns a vector of length one for each element of the list (e.g., a single mean value for each student).
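
To see the length-one requirement in action, the sketch below uses range(), which returns two values per student. The tryCatch() wrapper and its message string are additions for this illustration; the exact wording of purrr's own error differs between versions, so it is not reproduced here.

```r
library(purrr)

exam_scores <- list(
  Alice = c(80, 75, 85, 90),
  Bob = c(90, 85, 88, 94),
  Charlie = c(70, 65, NA, NA)
)

# range() returns two values (min and max), which map() stores without complaint
score_ranges <- map(exam_scores, \(x) range(x, na.rm = TRUE))

# map_dbl() needs exactly one value per element, so the same call raises an error;
# tryCatch() turns that error into a message of our own
attempt <- tryCatch(
  map_dbl(exam_scores, \(x) range(x, na.rm = TRUE)),
  error = function(e) "map_dbl() failed: .f returned more than one value"
)
attempt
```

When .f returns more than one value per element, map() is the safe choice, since a list can hold results of any length.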

e.g.3. Here we fit a linear model of Sepal.Length on Sepal.Width for each iris species and extract its R² (the squared correlation between the two variables).

iris %>% 
  # split the `iris` dataset into a list of data frames
  # with each element (data frame) being a subset of a 'Species' type
  split(iris$Species) %>% 
  
  # For each list element (of a species type),
  # create a linear model between 'Sepal.Length' and 'Sepal.Width'; output as a list
  map(~ lm(Sepal.Length ~ Sepal.Width, data = .x)) %>% 
  
  # For each linear model, create a summary and extract the R2
  map(summary) %>% 
  map_dbl(.f = "r.squared")

Output:

setosa versicolor virginica
0.5513756 0.2765821 0.2090573
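
Passing the string "r.squared" as .f is purrr shorthand for extracting the element of that name from each list item. The sketch below spells the extraction out with an explicit function and, as a sanity check, compares the result with the squared Pearson correlation, which equals R² for a model with a single predictor. The variable names are chosen for this illustration.

```r
library(purrr)

by_species <- split(iris, iris$Species)
models <- map(by_species, \(d) lm(Sepal.Length ~ Sepal.Width, data = d))

# String shorthand: pull the "r.squared" element out of each summary
r2_by_name <- map_dbl(map(models, summary), "r.squared")

# The same extraction written as an explicit anonymous function
r2_by_function <- map_dbl(models, \(m) summary(m)$r.squared)

# Squared correlation per species, computed directly from the data
r_squared_from_cor <- map_dbl(
  by_species,
  \(d) cor(d$Sepal.Length, d$Sepal.Width)^2
)

all.equal(r2_by_name, r2_by_function)
all.equal(r2_by_name, r_squared_from_cor)
```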

An even more powerful approach is to nest the dataset by group with nest() from the tidyr package, create a list-column of model objects with mutate() and map(), and then extract and unnest the model-level statistics using the broom package. We’ll dive into more details of this approach in a later tutorial.

iris %>% 
  # create a 'data' column, which is a list of tibbles, one for each species
  tidyr::nest(data = -Species) %>% 
  # create a 'model' column, which is a list of model objects
  mutate(model = map(
    data, ~ lm(Sepal.Length ~ Sepal.Width, data = .x))) %>% 
  # create a 'glance' column, which is a list of model-level statistics
  # extracted using the broom package
  mutate(glance = map(model, broom::glance)) %>% 
  # display the model-level statistics
  tidyr::unnest(glance)

Output:

# A tibble: 3 × 15
Species data model r.squared adj.r.squared sigma statistic p.value df
<fct> <list> <lis> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa <tibble> <lm> 0.551 0.542 0.239 59.0 6.71e-10 1
2 versico… <tibble> <lm> 0.277 0.262 0.444 18.4 8.77e- 5 1
3 virgini… <tibble> <lm> 0.209 0.193 0.571 12.7 8.43e- 4 1
# ℹ 6 more variables: logLik <dbl>, AIC <dbl>, BIC <dbl>, deviance <dbl>,
# df.residual <int>, nobs <int>

Repeat operation across columns of a data frame

e.g.4. A data frame is a special kind of list in which each column is an “element”. Below we calculate the mean of each column and output the result as a named vector.

iris[, 1:4] %>% map_dbl(.f = mean)

Output:

Sepal.Length Sepal.Width Petal.Length Petal.Width
5.843333 3.057333 3.758000 1.199333

The map_dbl() call above is similar to the line below, which uses across() to repeat an operation across multiple columns, but returns the output as a one-row data frame.

iris[, 1:4] %>% summarise(across(everything(), mean))

Output:

Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.843333 3.057333 3.758 1.199333
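
Because every column counts as a list element, the other map_*() variants work column-wise too. The sketch below, using variable names picked for this illustration, first looks up the class of each column of iris and then averages only the numeric ones, which avoids hard-coding the column positions 1:4.

```r
library(purrr)

# Class of every column, returned as a named character vector
col_types <- map_chr(iris, class)

# Logical flag per column, used to keep only the numeric columns
is_num <- map_lgl(iris, is.numeric)
numeric_means <- map_dbl(iris[is_num], mean)

col_types
numeric_means
```

Here map_chr() is safe because each column of iris has a single class; a column with several classes would return more than one value and break the length-one requirement.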