The tidyverse functions come with two ways to refer to columns in data frames:
- tidyselect, which is used, for example, by select, across, and pivot_longer,
- tidyeval (a.k.a. data-masking), which is used, for example, by arrange, filter, mutate, and summarize.
The programming with dplyr vignette gives a useful overview of both methods, and more details can be found in the documentation of the rlang and tidyselect packages.
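To make the distinction concrete, here is a minimal sketch on the built-in mtcars data. The helper starts_with("c") and the threshold 25 are only illustrative choices, not part of the examples that follow.

```r
library(dplyr)

# tidyselect: the argument describes *which columns* to pick
selected <- mtcars %>% select(starts_with("c"))
names(selected)

# data-masking: the argument is an expression computed *with the column values*
filtered <- mtcars %>% filter(mpg > 25)
nrow(filtered)
```

In the first call, the expression is never evaluated against the column values; in the second, mpg is looked up inside the data frame as if it were a regular variable.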
Here, I want to give a condensed summary of how to select columns if your input variables are character vectors, quoted expressions (created with vars), or data-mask function arguments. Note that this post is written for dplyr version 1.1.2; these semantics have changed in the past, and the approaches shown here might not be ideal in future versions.
I will use a subset of the mtcars dataset as example data:
library(tidyverse)
df <- mtcars %>%
  rownames_to_column("name") %>%
  select(name, mpg, cyl) %>%
  slice(1:3)
df
## name mpg cyl
## 1 Mazda RX4 21.0 6
## 2 Mazda RX4 Wag 21.0 6
## 3 Datsun 710 22.8 4
Character vectors
tidyselect
char_vec <- c("mpg", "cyl")
df %>% select(all_of(char_vec))
df %>% pivot_longer(all_of(char_vec), names_to = "feature", values_to = "value")
tidyeval
df %>% mutate(cyl_plus_10 = .data$cyl + 10)
df %>% mutate(cyl_plus_10 = !!rlang::sym(char_vec[2]) + 10)
df %>% mutate("{char_vec[2]}_plus_10" := !!rlang::sym(char_vec[2]) + 10)
df %>% mutate(across(all_of(char_vec), \(x) x * 2))
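When the column name is stored in a string, the .data pronoun can also be subset with [[ ]] instead of $. The following is a small sketch of that variant; the variable name var is hypothetical and only used for illustration.

```r
library(dplyr)
library(tibble)

df <- mtcars %>%
  rownames_to_column("name") %>%
  select(name, mpg, cyl) %>%
  slice(1:3)

var <- "cyl"  # hypothetical variable holding a column name as a string

# .data[[var]] looks up the column whose name is stored in `var`;
# the glue-style name on the left-hand side requires `:=` instead of `=`
res <- df %>% mutate("{var}_plus_10" := .data[[var]] + 10)
res
```

This avoids the explicit conversion with rlang::sym() and the !! operator, at the cost of being limited to single column names.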
Quoted expressions
tidyselect
quoted_expr <- vars(mpg, cyl)
df %>% select(!!!quoted_expr)
df %>% pivot_longer(all_of(map_chr(quoted_expr, rlang::as_name)),
                    names_to = "feature", values_to = "value")
tidyeval
df %>% mutate(cyl_plus_10 = .data$cyl + 10)
df %>% mutate(cyl_plus_10 = !!quoted_expr[[2]] + 10)
df %>% mutate("{as_label(quoted_expr[[2]])}_plus_10" := !!quoted_expr[[2]] + 10)
df %>% mutate(across(all_of(map_chr(quoted_expr, rlang::as_name)), \(x) x * 2))
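The conversion step used in several of the calls above can be inspected on its own. This sketch only looks at what vars() returns and how rlang::as_name() turns it into strings; the name col_names is introduced here for illustration.

```r
library(dplyr)
library(purrr)

quoted_expr <- vars(mpg, cyl)

# vars() returns a list of quosures; each one stores an unevaluated column reference
class(quoted_expr[[1]])

# as_name() converts each simple symbol into a string, which yields a
# character vector that all_of() can consume
col_names <- map_chr(quoted_expr, rlang::as_name)
unname(col_names)
```

Note that as_name() is meant for simple symbols such as mpg; a quosure wrapping a more complex expression like mpg * 2 has no single column name to return.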
Data-mask function arguments
tidyselect
fnc1 <- function(arg) df %>% select({{arg}})
fnc1(mpg)
fnc2 <- function(...) df %>% select(...)
fnc2(mpg, cyl)
fnc3 <- function(args) df %>% select(!!! args)
fnc3(vars(mpg, cyl))
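Because the embraced argument is forwarded to select() unchanged, the caller can pass any tidyselect expression, not just a single column name. A short sketch, where pick_cols is a hypothetical name for a function with the same body as fnc1:

```r
library(dplyr)
library(tibble)

df <- mtcars %>%
  rownames_to_column("name") %>%
  select(name, mpg, cyl) %>%
  slice(1:3)

# Hypothetical helper with the same body as fnc1 above
pick_cols <- function(arg) df %>% select({{ arg }})

a <- pick_cols(c(mpg, cyl))       # several columns combined with c()
b <- pick_cols(starts_with("c"))  # a tidyselect helper
d <- pick_cols(!name)             # negation: everything except `name`
```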
tidyeval
fnc1 <- function(arg) df %>% mutate(arg_plus_10 = {{arg}} + 10)
fnc1(mpg)
fnc2 <- function(arg) df %>% mutate("{{arg}}_plus_10" := {{arg}} + 10)
fnc2(mpg)
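The embrace operator can be read as a shorthand for capturing the argument with enquo() and injecting it with !!. The sketch below spells out these two steps; add_ten is a hypothetical name and the function is meant to behave like fnc1 above.

```r
library(dplyr)
library(tibble)

df <- mtcars %>%
  rownames_to_column("name") %>%
  select(name, mpg, cyl) %>%
  slice(1:3)

# Hypothetical helper: same idea as fnc1, with capture and injection made explicit
add_ten <- function(arg) {
  col <- enquo(arg)                        # capture the expression supplied by the caller
  df %>% mutate(arg_plus_10 = !!col + 10)  # inject it into the data-masking context
}

res <- add_ten(mpg)
res
```

The explicit form is mainly useful when the captured expression has to be inspected or modified before it is used.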
For more advanced cases, such as forwarding data-mask arguments to a function that expects a tidyselect input, we have to use the transmute-as-bridge pattern:
datamask_to_names <- function(data, ...){
  inputs <- transmute(data, ...)
  names(inputs)
}
fnc3 <- function(...) df %>% mutate(across(all_of(datamask_to_names(df, ...)), \(x) x * 2))
fnc3(mpg, cyl)
fnc4 <- function(args) df %>% mutate(across(all_of(datamask_to_names(df, !!!args)), \(x) x * 2))
fnc4(vars(mpg, cyl))
fnc5 <- function(args) df %>% pivot_longer(all_of(datamask_to_names(df, !!!args)),
                                           names_to = "feature", values_to = "value")
fnc5(vars(mpg, cyl))
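The helper above only returns names, so it works as long as the inputs refer to columns that already exist. If the caller passes a computed expression, the result of transmute() also has to be added to the data before selecting by name. The following is a sketch of that extension under this assumption; pivot_inputs and cyl_sq are hypothetical names.

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- mtcars %>%
  rownames_to_column("name") %>%
  select(name, mpg, cyl) %>%
  slice(1:3)

# Hypothetical helper: evaluate `...` with data-masking, add the results
# to the data, then select them by name in the tidyselect step
pivot_inputs <- function(data, ...) {
  inputs <- transmute(data, ...)    # data-masking step
  data <- mutate(data, !!!inputs)   # make computed columns available by name
  pivot_longer(data, all_of(names(inputs)),
               names_to = "feature", values_to = "value")
}

long <- pivot_inputs(df, mpg, cyl_sq = cyl^2)
long
```

Here mpg is an existing column and cyl_sq is computed on the fly; both end up as values of the feature column.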
Session Info
sessionInfo()
## R version 4.3.0 (2023-04-21)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Big Sur 11.7.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/Berlin
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2
## [5] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
## [9] ggplot2_3.4.4 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.3 jsonlite_1.8.7 compiler_4.3.0 tidyselect_1.2.0
## [5] jquerylib_0.1.4 scales_1.2.1 yaml_2.3.7 fastmap_1.1.1
## [9] R6_2.5.1 generics_0.1.3 knitr_1.43 bookdown_0.34
## [13] munsell_0.5.0 tzdb_0.4.0 bslib_0.4.2 pillar_1.9.0
## [17] rlang_1.1.1 utf8_1.2.3 stringi_1.7.12 cachem_1.0.8
## [21] xfun_0.39 sass_0.4.6 timechange_0.2.0 cli_3.6.1
## [25] withr_2.5.0 magrittr_2.0.3 digest_0.6.31 grid_4.3.0
## [29] rstudioapi_0.14 hms_1.1.3 lifecycle_1.0.3 vctrs_0.6.2
## [33] evaluate_0.21 glue_1.6.2 blogdown_1.17 fansi_1.0.4
## [37] colorspace_2.1-0 rmarkdown_2.22 tools_4.3.0 pkgconfig_2.0.3
## [41] htmltools_0.5.5