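I have a longitudinal dataset of individuals with different socioeconomic status (SES), split into 4 categories: High, Mid, Mid Low, and Low. For some analyses, I only want to show the sample size of the Mid Low group if both the Mid and Low groups have at least 5 people in that month's observations. Otherwise, I want it to show as NA.
I thought this code would work, but it doesn't. It should give NA in the "adjusted_total" column for the Mid Low group in January, but keep the current value (40) in February. It fails at the former but manages the latter.
Here is my sample dataset and my attempt to get what I want with dplyr's case_when():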
library(dplyr)

# Sample dataset: one row per month and SES category
test_data <- tibble(
  month = c(rep(c("Jan"), 4), rep(c("Feb"), 4)),
  ses = c(rep(c("High", "Mid", "Mid Low", "Low"), 2)),
  total = c(10, 20, 4, 30, 9, 11, 40, 60),
  total_selected = c(9, 10, 8, 3, 8, 6, 8, 6)
)

# Failed attempt
wrong <- test_data %>%
  group_by(month) %>%
  mutate(adjusted_total = case_when(
    ses == "Mid Low" & total[ses == "Mid"] < 5 | total[ses == "Low"] < 5 ~ NA_real_,
    TRUE ~ total
  ))
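One way to see what the condition is actually comparing is to pull out the Mid and Low totals for each month and test them against the cutoff directly. This is only a minimal diagnostic sketch on the sample data above; the names `group_check`, `mid_total`, `low_total`, `below_5`, and `below_15` are made up for illustration and are not part of the original code.

```r
library(dplyr)

# Same sample values as in the question (total_selected omitted, it is not used here)
test_data <- tibble(
  month = rep(c("Jan", "Feb"), each = 4),
  ses   = rep(c("High", "Mid", "Mid Low", "Low"), 2),
  total = c(10, 20, 4, 30, 9, 11, 40, 60)
)

# For each month, extract the two totals the condition looks at
# and check them against two candidate cutoffs (hypothetical column names)
group_check <- test_data %>%
  group_by(month) %>%
  summarise(
    mid_total = total[ses == "Mid"],
    low_total = total[ses == "Low"],
    below_5   = mid_total < 5 | low_total < 5,
    below_15  = mid_total < 15 | low_total < 15
  )

print(group_check)
```

In this sample no month has a Mid or Low total under 5, so a cutoff of 5 can never produce NA here, whatever the rest of the condition looks like.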
EDIT WITH SOLUTION
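I realized my code had a mistake. First, I meant an OR statement, not AND. Second, the threshold was too low for my data. After switching to an OR statement and a cutoff of 15, it works: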
correct <- tibble(
  month = c(rep(c("Jan"), 4), rep(c("Feb"), 4)),
  ses = c(rep(c("High", "Mid", "Mid Low", "Low"), 2)),
  total = c(10, 20, 4, 30, 9, 11, 40, 60),
  total_selected = c(9, 10, 8, 3, 8, 6, 8, 6)
) %>%
  group_by(month) %>%
  mutate(adjusted_total = case_when(
    # Parentheses keep the OR inside the Mid Low test; & binds tighter than | in R
    ses == "Mid Low" & (total[ses == "Mid"] < 15 | total[ses == "Low"] < 15) ~ NA_real_,
    TRUE ~ total
  ))
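In R, `&` binds more tightly than `|`, so a condition of the form `a & b | c` is read as `(a & b) | c`. One way to sidestep precedence altogether is to compute a per-month flag first and then combine it with the `ses` test. The following is a sketch under that assumption, not the original solution; `cutoff`, `small_group`, and `alt` are hypothetical names introduced here.

```r
library(dplyr)

# Same sample values as in the question (total_selected omitted, it is not used here)
test_data <- tibble(
  month = rep(c("Jan", "Feb"), each = 4),
  ses   = rep(c("High", "Mid", "Mid Low", "Low"), 2),
  total = c(10, 20, 4, 30, 9, 11, 40, 60)
)

cutoff <- 15  # cutoff taken from the edit above

alt <- test_data %>%
  group_by(month) %>%
  mutate(
    # TRUE for every row of a month when the Mid or Low total is under the cutoff
    small_group = any(total[ses %in% c("Mid", "Low")] < cutoff),
    # Only the Mid Low row is blanked; all other rows keep their total
    adjusted_total = if_else(ses == "Mid Low" & small_group, NA_real_, total)
  ) %>%
  ungroup()

print(alt)
```

Keeping the group-level check in its own column also makes it easy to inspect which months tripped the cutoff before any values are replaced.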