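I have a data frame where, for each row, I want to randomly sample three columns (which three can differ from row to row) and take the mean of those three sampled values. As an added complication, many of my rows are entirely NA (I can't remove them for other reasons), or contain only 1 or 2 non-NA values. Based on this question and answer, I tried the following: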
df_new <- df %>%
  rowwise %>%
  mutate(inflo_mean = mean(sample(na.omit(c_across(everything())), 3)))
This doesn't work. I get an error from sample():
Error in `mutate()`:
ℹ In argument: `inflo_mean = mean(sample(na.omit(c_across(everything())), 3))`.
ℹ In row 1.
Caused by error in `sample.int()`:
! invalid first argument
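A minimal base-R sketch of what is probably happening, using values from the first and third rows of the sample data. On the all-NA row, na.omit() leaves a zero-length vector, and sample() then asks sample.int() for 3 draws from 0 items, which stops with exactly this error. The single-value row does not error at all, which is arguably worse: as documented in ?sample, a length-one numeric x >= 1 is treated as "sample from 1:x".

```r
# Row 1 of the sample data is all NA, so na.omit() leaves nothing behind.
empty <- na.omit(c(NA_integer_, NA_integer_, NA_integer_))
print(length(empty))

# sample() on a zero-length vector calls sample.int(0, 3), which stops with
# "invalid first argument" (the error reported above).
res <- tryCatch(sample(empty, 3), error = function(e) conditionMessage(e))
print(res)

# Row 3 has a single non-NA value, 82. For a length-one numeric x >= 1,
# sample() draws from 1:x, so this silently returns 3 numbers from 1:82.
one <- na.omit(c(82L, NA_integer_, NA_integer_))
draws <- sample(one, 3)
print(draws)
```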
I then tried breaking it into smaller steps, handling the different NA cases separately, and came up with the following:
df_new2 <- df %>%
  rowwise() %>%
  mutate(num_NAs = sum(!is.na(across(starts_with("Col_")))),
         v_inflo = list(na.omit(c_across((starts_with("Col_"))))),
         inflo_mean = case_when(num_NAs > 2 ~ mean(sample(v_inflo, 3)),
                                num_NAs == 2 ~ mean(v_inflo),
                                num_NAs == 1 ~ as.numeric(v_inflo),
                                num_NAs == 0 ~ NA_real_,
                                TRUE ~ NA_real_))
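Again, this doesn't work, and I get the same error. I checked the data types of the columns, and they are all integers. What could the problem be? Or is there another way to do this?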
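A likely reason the case_when() version fails the same way: dplyr's case_when() evaluates every right-hand side before picking the one whose condition holds, so mean(sample(v_inflo, 3)) still runs for the all-NA row even though num_NAs > 2 is FALSE there. A small sketch, assuming dplyr is installed; v is a hypothetical stand-in for v_inflo on the first row.

```r
library(dplyr)

# Stand-in for v_inflo on the all-NA first row: nothing left after na.omit().
v <- integer(0)

# case_when() evaluates every RHS, so sample(v, 3) runs even though the
# first condition is FALSE, and the whole call errors out.
res_case_when <- tryCatch(
  case_when(length(v) > 2 ~ mean(sample(v, 3)),
            TRUE ~ NA_real_),
  error = function(e) "error"
)

# A plain if/else only evaluates the branch that is actually taken.
res_if <- if (length(v) > 2) mean(sample(v, 3)) else NA_real_

print(res_case_when)
print(res_if)
```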
Sample data:
> dput(df)
structure(list(Col_1 = c(NA, 77L, 82L, 172L), Col_2 = c(NA, 79L,
NA, 135L), Col_3 = c(NA, 81L, NA, 131L), Col_4 = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_), Col_5 = c(NA, NA, NA,
33L), Col_6 = c(NA, NA, NA, 104L), Col_7 = c(NA, NA, NA, 106L
), Col_8 = c(NA, NA, NA, 93L), Col_9 = c(NA, NA, NA, 50L), Col_10 = c(NA,
NA, NA, 48L), Col_11 = c(NA, NA, NA, 96L), Col_12 = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_), Col_13 = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_), Col_14 = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_), Col_15 = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L))
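One possible workaround, sketched in base R under the assumption that rows with only 1 or 2 values should be averaged over whatever is available (which appears to match the intent of the case_when() branches in the second attempt). safe_sample_mean() is a hypothetical helper, not from any package: it drops NAs, returns NA for an empty row, and samples positions with sample.int() so that a single remaining value is never reinterpreted as 1:x. The four rows are copied from the dput() output.

```r
# Hypothetical helper: mean of up to k randomly chosen non-NA values.
safe_sample_mean <- function(x, k = 3) {
  x <- as.numeric(x[!is.na(x)])   # drop NAs, keep plain numeric values
  n <- length(x)
  if (n == 0) return(NA_real_)    # all-NA row: nothing to average
  # Sample positions rather than values, avoiding sample()'s length-one case.
  idx <- sample.int(n, size = min(k, n))
  mean(x[idx])
}

# The four rows of the sample data (Col_1 to Col_15).
rows <- rbind(
  rep(NA_integer_, 15),
  c(77L, 79L, 81L, rep(NA_integer_, 12)),
  c(82L, rep(NA_integer_, 14)),
  c(172L, 135L, 131L, NA, 33L, 104L, 106L, 93L, 50L, 48L, 96L, NA, NA, NA, NA)
)

set.seed(1)  # only to make the random draw for row 4 reproducible
means <- apply(rows, 1, safe_sample_mean)
print(means)

# With the question's df, the same helper could be applied as, e.g.:
# df$inflo_mean <- apply(df[grepl("^Col_", names(df))], 1, safe_sample_mean)
# or, with dplyr:
# df %>% rowwise() %>%
#   mutate(inflo_mean = safe_sample_mean(c_across(starts_with("Col_")))) %>%
#   ungroup()
```

Rows 2 and 3 are deterministic (79 and 82), row 1 is NA, and row 4 is the mean of three randomly chosen values out of its ten non-NA entries.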