Sample data:

df = structure(list(country = c("USA", "USA", "Japan", NA), dimension = c("economic", 
"cultural", "economic", "economic"), score = c(NA, "high", "high", 
"low")), class = "data.frame", row.names = c(NA, -4L))

I wrote this function to summarise the frequencies of each variable and then export them as a CSV file named after the variable:

export <- function(df){   
  for (col in colnames(df)) {  
    table <- df %>%
      group_by(df[col]) %>%
      summarise(Count = n()) %>% 
      mutate(Percent = Count / sum(Count)*100,
             N = sum(Count))
    write.csv(table, paste0(col, ".csv"), row.names = F)
    print(table)
  }                            
}             

export(df) --> this works

But I want to drop the NAs before grouping the data and computing the frequencies. This is what I tried:

export <- function(df){   
  for (col in colnames(df)) {  
    table <- df %>%
      filter(!is.na(df[col])) %>%  # Attempt to filter out NAs
      group_by(df[col]) %>%
      summarise(Count = n()) %>% 
      mutate(Percent = Count / sum(Count)*100,
             N = sum(Count))
    write.csv(table, paste0(col, ".csv"), row.names = F)
    print(table)
  }                            
}  

export(df) --> this does not work, and I get this error message:

Error in `group_by()`:
ℹ In argument: `df[col]`.
Caused by error:
! `df[col]` must be size 3 or 1, not 4.

How can I drop these NAs? I must be making a silly mistake.

Current output (showing only the first of the three variables being iterated over):

country Count   Percent N
Japan   1       25      4
USA     2       50      4
NA      1       25      4

Desired output (showing only the first of the three variables being iterated over):

country Count   Percent     N
Japan   1       33.33333    3
USA     2       66.66667    3

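Note that the NA is dropped and is not included in the frequencies.

Recommended answer

Here are a few options, all of which give the same result (the `.by` argument used in the first two needs dplyr 1.1.0 or later):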

export2 <- function(df){   
  for (col in colnames(df)) {  
    table <- df %>%
      filter(if_any(all_of(col), \(x) !is.na(x))) |>
      summarise(Count = n(), .by = all_of(col)) %>% 
      mutate(Percent = Count / sum(Count)*100,
             N = sum(Count))
    #write.csv(table, paste0(col, ".csv"), row.names = F)
    print(table)
  }                            
}             

export2(df)
#   country Count  Percent N
# 1     USA     2 66.66667 3
# 2   Japan     1 33.33333 3
#   dimension Count Percent N
# 1  economic     3      75 4
# 2  cultural     1      25 4
#   score Count  Percent N
# 1  high     2 66.66667 3
# 2   low     1 33.33333 3


export3 <- function(df){   
  for (col in colnames(df)) {  
    table <- df %>%
      select(all_of(col)) |>
      na.omit() |>
      summarise(Count = n(), .by = all_of(col)) %>% 
      mutate(Percent = Count / sum(Count)*100,
             N = sum(Count))
    #write.csv(table, paste0(col, ".csv"), row.names = F)
    print(table)
  }                            
}            

export3(df)
# same as above 

export4 <- function(df){   
  for (col in colnames(df)) {  
    table <- df %>%
      select(all_of(col)) |>
      na.omit() |> 
      count(.data[[col]], name = "Count") |>
      mutate(Percent = Count / sum(Count)*100,
             N = sum(Count))
    #write.csv(table, paste0(col, ".csv"), row.names = F)
    print(table)
  }                            
}    

export4(df)
# same as above
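
As for why the original attempt fails: inside the pipeline, `df[col]` seems to refer to the function argument `df` (all 4 rows), not to the data flowing through the pipe. Once `filter()` has dropped a row, `group_by()` is handed a 4-row grouping column for 3-row data, which matches the "must be size 3 or 1, not 4" message. A minimal sketch that stays closer to the original code is below; it refers to the column through the `.data` pronoun and `across(all_of(col))` so that it always comes from the current (filtered) data. The helper name `freq_table` is hypothetical and not part of the question.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  country   = c("USA", "USA", "Japan", NA),
  dimension = c("economic", "cultural", "economic", "economic"),
  score     = c(NA, "high", "high", "low")
)

# Hypothetical helper: frequency table for one column, NAs dropped first.
# `col` is a column name given as a string.
freq_table <- function(data, col) {
  data %>%
    # .data[[col]] looks the column up in the data being piped,
    # so it always has the same number of rows as that data
    filter(!is.na(.data[[col]])) %>%
    # across(all_of(col)) groups by the column named in `col`
    group_by(across(all_of(col))) %>%
    summarise(Count = n(), .groups = "drop") %>%
    mutate(Percent = Count / sum(Count) * 100,
           N = sum(Count))
}

for (col in colnames(df)) {
  print(freq_table(df, col))
}
```

The `write.csv(table, paste0(col, ".csv"), row.names = FALSE)` call from the question can go back inside the loop unchanged.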
