I am working in R.
I have some data on the staff of a school:
data <- data.frame(person_id = c(1, 2, 3, 4, 5, 6, 7, 8),
                   disability_status = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
                   age_group = c("20-30", "30-40", "20-30", "30-40", "20-30", "30-40", "20-30", "30-40"),
                   teacher = c("yes", "no", "no", "yes", "no", "yes", "no", "yes"))
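I have written a function that counts the staff within each level of the variable passed in. The "group_tag" argument is there to help with debugging later in my code.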
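library(dplyr)  # provides %>%, mutate(), group_by(), summarise(), bind_rows()

group_the_data <- function(data,
                           variable,
                           group_tag) {
  grouped_output <- data %>%
    mutate(flag = 1) %>%                                  # one flag per staff member
    group_by({{ variable }}) %>%                          # variable is passed unquoted
    summarise(number_staff = sum(flag, na.rm = TRUE)) %>% # staff count per level
    mutate(grouping_tag = group_tag)                      # plain string label for debugging
  return(grouped_output)
}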
I then use the function to group by disability status, age group, and teacher in turn:
disability_grouped <- group_the_data(data = data,
                                     variable = disability_status,
                                     group_tag = "disability status")
age_group_grouped <- group_the_data(data = data,
                                    variable = age_group,
                                    group_tag = "age group")
role_grouped <- group_the_data(data = data,
                               variable = teacher,
                               group_tag = "role")
Once I have the data frames I need, I bind them together:
all_data_grouped <- bind_rows(disability_grouped, age_group_grouped, role_grouped)
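Is there a way to iterate over the variables so that I don't need to write out the function call three times?
Or would one of the apply functions be a better idea?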
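
To make the question concrete, here is a minimal sketch of the kind of loop I have in mind. It is only one possible approach, not a definitive solution. It assumes dplyr 1.0 or later, and the names group_the_data_chr and tags are hypothetical: the helper is a variant of my function that takes the column name as a string, so it can be driven from a character vector, and it counts rows with n(), which should match sum(flag) when every flag is 1.

```r
library(dplyr)

data <- data.frame(person_id = c(1, 2, 3, 4, 5, 6, 7, 8),
                   disability_status = c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
                   age_group = c("20-30", "30-40", "20-30", "30-40", "20-30", "30-40", "20-30", "30-40"),
                   teacher = c("yes", "no", "no", "yes", "no", "yes", "no", "yes"))

# Hypothetical variant of group_the_data(): `variable` is a column name
# given as a string rather than an unquoted column.
group_the_data_chr <- function(data, variable, group_tag) {
  data %>%
    group_by(across(all_of(variable))) %>%             # group by the named column
    summarise(number_staff = n(), .groups = "drop") %>% # row count per level
    mutate(grouping_tag = group_tag)                    # label for debugging
}

# Hypothetical lookup: names are the columns to group by, values are the tags.
tags <- c(disability_status = "disability status",
          age_group = "age group",
          teacher = "role")

# Map() calls the helper once per (column name, tag) pair and returns a list
# of data frames, which bind_rows() then stacks.
grouped_list <- Map(function(variable, group_tag) {
  group_the_data_chr(data, variable, group_tag)
}, names(tags), tags)

all_data_grouped <- bind_rows(grouped_list)
```

With this shape, adding another grouping variable would only mean adding one entry to tags. I am not sure whether Map() is the idiomatic choice here or whether something like purrr would be preferable.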