How to use dplyr group_by in a function, conditional on a flag

Posted on July 7

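I want to define a custom function that uses dplyr to group and summarise some data and, depending on a Boolean flag, optionally groups by an additional level. I can do this with a full if...else control block, as in this example: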

library(tidyverse)
data(Titanic)

Titanic <- as_tibble(Titanic)

foo <- function(by_age = FALSE) {
  if (by_age) {
    bar <- Titanic %>%
      group_by(Survived, Age)
  } else {
    bar <- Titanic %>%
      group_by(Survived)
  }
  
  bar %>%
    summarise(n = sum(n))
}

foo()
foo(by_age = TRUE)

But this seems like a very clunky approach. Is there a way I can achieve this with a single block of dplyr code, conditionally calling Age as a second grouping variable? I tried ifelse(by_age, Age, NA) in my group_by statement, along with some of the tricks listed in this SO post, but to no avail.

Edit

Sorry, I didn't read your post carefully enough; if for some reason you want to avoid the ... approach, here is a potential solution:

library(tidyverse)
data(Titanic)

Titanic <- as_tibble(Titanic)

foo <- function(by_age = FALSE) {
  Titanic %>%
    group_by(Survived, if(by_age) Age) %>%
    summarise(n = sum(n))
}

foo()
#> # A tibble: 2 × 2
#>   Survived     n
#>   <chr>    <dbl>
#> 1 No        1490
#> 2 Yes        711
foo(by_age = TRUE)
#> `summarise()` has grouped output by 'Survived'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups:   Survived [2]
#>   Survived `if (by_age) Age`     n
#>   <chr>    <chr>             <dbl>
#> 1 No       Adult              1438
#> 2 No       Child                52
#> 3 Yes      Adult               654
#> 4 Yes      Child                57

Created on 2022-07-07 by the reprex package (v2.0.1)
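The reason the single-block version works can be checked in isolation. This is a minimal sketch, not part of the original reprex: an `if` without an `else` evaluates to NULL when its condition is FALSE, and, as the foo() output above shows, group_by() then groups by Survived only. The toy tibble and its column names (g, h, n) are made up for illustration.

```r
library(dplyr)

# An `if` with a FALSE condition and no `else` evaluates to NULL
by_age <- FALSE
null_result <- if (by_age) "Age"
is.null(null_result)

# Hypothetical toy data, only to show which grouping variables survive
toy <- tibble(g = c("a", "a", "b"), h = c("x", "y", "y"), n = 1:3)

# With the flag off, the NULL grouping expression is dropped
toy_groups <- group_vars(group_by(toy, g, if (by_age) h))
toy_groups
```

With by_age set to TRUE the same call groups by both g and the computed column, which is why that column ends up with the name of the whole expression.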

To avoid the Age column being named "if (by_age) Age", you can use:

library(tidyverse)
data(Titanic)

Titanic <- as_tibble(Titanic)

foo <- function(by_age = FALSE) {
  Titanic %>%
    group_by(Survived, !!sym(ifelse(by_age, "Age", ""))) %>%
    summarise(n = sum(n))
}

foo()
#> # A tibble: 2 × 2
#>   Survived     n
#>   <chr>    <dbl>
#> 1 No        1490
#> 2 Yes        711
foo(by_age = TRUE)
#> `summarise()` has grouped output by 'Survived'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups:   Survived [2]
#>   Survived Age       n
#>   <chr>    <chr> <dbl>
#> 1 No       Adult  1438
#> 2 No       Child    52
#> 3 Yes      Adult   654
#> 4 Yes      Child    57
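An alternative that avoids building a symbol from an empty string is to assemble the grouping columns as a character vector and select them with across(all_of()). This is a sketch under the assumption that dplyr 1.0.0 or later is installed; the names titanic_tbl, foo_cols and group_cols are introduced here for illustration and are not from the original answer.

```r
library(dplyr)

titanic_tbl <- as_tibble(Titanic)

foo_cols <- function(by_age = FALSE) {
  # c() silently drops the NULL that `if (by_age) "Age"` yields
  # when by_age is FALSE, leaving just "Survived"
  group_cols <- c("Survived", if (by_age) "Age")

  titanic_tbl %>%
    group_by(across(all_of(group_cols))) %>%
    summarise(n = sum(n))
}

foo_cols()
foo_cols(by_age = TRUE)
```

Because the columns are chosen by name, the result keeps the plain column name Age, and a misspelled name in group_cols makes all_of() raise an error rather than being silently ignored.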


Original answer

One solution is to use ... (dot-dot-dot) to pass in extra grouping variables when needed, e.g.

library(tidyverse)
data(Titanic)

Titanic <- as_tibble(Titanic)

foo <- function(...) {
  Titanic %>%
    group_by(Survived, ...) %>%
    summarise(n = sum(n))
}

foo()
#> # A tibble: 2 × 2
#>   Survived     n
#>   <chr>    <dbl>
#> 1 No        1490
#> 2 Yes        711
foo(Age)
#> `summarise()` has grouped output by 'Survived'. You can override using the
#> `.groups` argument.
#> # A tibble: 4 × 3
#> # Groups:   Survived [2]
#>   Survived Age       n
#>   <chr>    <chr> <dbl>
#> 1 No       Adult  1438
#> 2 No       Child    52
#> 3 Yes      Adult   654
#> 4 Yes      Child    57

# You can also pass in multiple 'extra' arguments
foo(Age, Sex)
#> `summarise()` has grouped output by 'Survived', 'Age'. You can override using
#> the `.groups` argument.
#> # A tibble: 8 × 4
#> # Groups:   Survived, Age [4]
#>   Survived Age   Sex        n
#>   <chr>    <chr> <chr>  <dbl>
#> 1 No       Adult Female   109
#> 2 No       Adult Male    1329
#> 3 No       Child Female    17
#> 4 No       Child Male      35
#> 5 Yes      Adult Female   316
#> 6 Yes      Adult Male     338
#> 7 Yes      Child Female    28
#> 8 Yes      Child Male      29
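The "`summarise()` has grouped output by ..." message in the output above points at the `.groups` argument. As a minimal sketch, assuming you want an ungrouped result, passing .groups = "drop" both silences the message and removes the leftover grouping. The names titanic_tbl and foo_dots are hypothetical, chosen so as not to clash with foo() above.

```r
library(dplyr)

titanic_tbl <- as_tibble(Titanic)

foo_dots <- function(...) {
  titanic_tbl %>%
    group_by(Survived, ...) %>%
    # Return an ungrouped tibble instead of one still grouped by
    # all but the last grouping variable
    summarise(n = sum(n), .groups = "drop")
}

res_dots <- foo_dots(Age, Sex)
res_dots
group_vars(res_dots)
```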


Note: using ... has two drawbacks:

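When you use it to pass arguments to another function, you have to carefully explain to the user where those arguments go. This makes it hard to understand what you can do with functions like lapply() and plot().
A misspelled argument will not raise an error. This makes it easy for typos to go unnoticed. (From Advanced R; https://adv-r.hadley.nz/functions.html?q=...#fun-dot-dot-dot)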
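The second drawback is easy to reproduce with base R's sum(), which collects its inputs through .... In this small illustration the misspelled na_rm is not matched to the na.rm parameter; it is swallowed by ... and treated as one more value to add, so no error is raised and the NA is not removed.

```r
# Typo: na_rm is absorbed by `...` and summed as a value (TRUE),
# so the NA is still present and the result is NA
typo_result <- sum(1, 2, NA, na_rm = TRUE)
typo_result
#> [1] NA

# Correct spelling: the NA is dropped before summing
ok_result <- sum(1, 2, NA, na.rm = TRUE)
ok_result
#> [1] 3
```

The same applies to foo(...) above in the sense that whatever is passed goes straight to group_by(), so it is worth documenting that ... is expected to contain column names of Titanic.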