Sample data:
df = structure(list(country = c("USA", "USA", "Japan", NA), dimension = c("economic",
"cultural", "economic", "economic"), score = c(NA, "high", "high",
"low")), class = "data.frame", row.names = c(NA, -4L))
I wrote this function to summarise the frequencies of each variable and then export each summary as a CSV file named after that variable:
library(dplyr)

export <- function(df) {
  for (col in colnames(df)) {
    table <- df %>%
      group_by(df[col]) %>%
      summarise(Count = n()) %>%
      mutate(Percent = Count / sum(Count) * 100,
             N = sum(Count))
    write.csv(table, paste0(col, ".csv"), row.names = FALSE)
    print(table)
  }
}
export(df)  # this works
But I want to drop the NAs before grouping the data and calculating the frequencies. I tried this:
export <- function(df) {
  for (col in colnames(df)) {
    table <- df %>%
      filter(!is.na(df[col])) %>%  # Attempt to filter out NAs
      group_by(df[col]) %>%
      summarise(Count = n()) %>%
      mutate(Percent = Count / sum(Count) * 100,
             N = sum(Count))
    write.csv(table, paste0(col, ".csv"), row.names = FALSE)
    print(table)
  }
}
export(df)  # this does not work, and I get this error message:
Error in `group_by()`:
ℹ In argument: `df[col]`.
Caused by error:
! `df[col]` must be size 3 or 1, not 4.
How can I drop these NAs? I must be making a silly mistake.
Current output (showing only the first of the three variables being iterated over):
country Count Percent N
Japan 1 25 4
USA 2 50 4
NA 1 25 4
Desired output (showing only the first of the three variables being iterated over):
country Count Percent N
Japan 1 33.33333 3
USA 2 66.66667 3
Note that the NA is dropped and is not included in the frequencies.
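For reference, here is a minimal sketch of one way the desired output might be produced. It assumes the error comes from `df[col]` always referring to the original 4-row data frame rather than to the already filtered data in the pipe. `freq_table` is a hypothetical helper name, not part of the code above, and `.data[[col]]` and `across(all_of(col))` are used so that the column is looked up in whatever data the pipe currently holds.

```r
library(dplyr)

df <- data.frame(
  country   = c("USA", "USA", "Japan", NA),
  dimension = c("economic", "cultural", "economic", "economic"),
  score     = c(NA, "high", "high", "low")
)

# Hypothetical helper (not in the original code): frequency table for a
# single column, with NAs dropped before counting.
freq_table <- function(data, col) {
  data %>%
    # .data[[col]] refers to the data flowing through the pipe,
    # not to the outer data frame
    filter(!is.na(.data[[col]])) %>%
    # group by the column whose name is stored in `col`
    group_by(across(all_of(col))) %>%
    summarise(Count = n()) %>%
    mutate(Percent = Count / sum(Count) * 100,
           N = sum(Count))
}

print(freq_table(df, "country"))
```

Inside the loop, the result of freq_table(df, col) could then be passed to write.csv(..., paste0(col, ".csv"), row.names = FALSE) as before.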