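I'm trying to pass vectors to map(), each with a different number of NA values, but it returns an error.

I have a tibble made up of N numeric columns and 1 categorical column. I want to compare the distribution of each numeric column across the groups defined by the values of the categorical column. I use overlapping::overlap() to compute the overlap of the distributions, and I feed the numeric column names to map_dfr() to iterate over them. For example: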

require(overlapping)
require(dplyr)
require(purrr)

set.seed( 1 )
n <- 100
G1 <- sample( 0:30, size = n, replace = TRUE )
G2 <- sample( 0:30, size = n, replace = TRUE, prob = dbinom( 0:30, 31, .55 ))
G3 <- sample( 0:30, size = n, replace = TRUE, prob = dbinom( 0:30, 41, .65 ))
Data <- data.frame(y = G1, x = G2, z = G3, group = rep(c("G1","G2", "G3"), each = n), class = rep(c("C1","C2", "C3"), each = 1)) %>% as_tibble()
Data 

overlap_fcn <- function(.x) {
        ## construct list of vectors
    dist_list <- list(
                "C1" = Data %>% 
                        filter(class == 'C1', !is.na(.x)) %>% 
                        pull(.x), 
                "C2" = Data %>% 
                        filter(class == 'C2', !is.na(.x)) %>% 
                        pull(.x),
                "C3" = Data %>% 
                        filter(class == 'C3', !is.na(.x)) %>% 
                        pull(.x)
                )
## calculate distribution overlaps
    return(
        enframe(
                overlapping::overlap(dist_list)$OV*100
        ) %>% 
        mutate(value = paste0(round(value, 2), "%"),
                class = .x) %>%
        rename(comparison = name, overlap = value) %>%
        relocate(class)
    )

}

overlap_table <- purrr::map_dfr(
  .x = c('y', 'x', "z"),
  .f = ~overlap_fcn(.x))

overlap_table

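The above works as expected. In practice, however, x, y, and z each have a different amount of missing data. I tried to account for this with the filter on !is.na(.x), but it doesn't work. For example: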

Data$x[1:3] <- NA
Data$y[10:20] <- NA
Data$z[100:150] <- NA

overlap_table <- purrr::map_dfr(
  .x = c('x', 'y', "z"),
  .f = ~overlap_fcn(.x))

This returns the following error:

Error in density.default(x[[j]], n = nbins, ...): 'x' contains missing values
Traceback:
1. purrr::map_dfr(.x = c("x", "y", "z"), .f = ~overlap_fcn(.x))
2. map(.x, .f, ...)
3. .f(.x[[i]], ...)
4. overlap_fcn(.x)
5. enframe(overlapping::overlap(dist_list)$OV * 100) %>% mutate(value = paste0(round(value, 
 .     2), "%"), class = .x) %>% rename(comparison = name, overlap = value) %>% 
 .     relocate(class)   # at line 25-33 of file <text>
6. relocate(., class)
7. rename(., comparison = name, overlap = value)
8. mutate(., value = paste0(round(value, 2), "%"), class = .x)
9. enframe(overlapping::overlap(dist_list)$OV * 100)
10. overlapping::overlap(dist_list)
11. density(x[[j]], n = nbins, ...)
12. density.default(x[[j]], n = nbins, ...)
13. stop("'x' contains missing values")

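Can anyone help? I'm sure I'm missing something really obvious; I just can't see what!

Recommended answer

Here, .x is a character string, so is.na(.x) tests the string itself (which is never NA) instead of the column it names, and the filter drops no rows. pull() accepts a column name given as a string, which is why it still returns the column, NAs included. Inside filter(), convert the string to a symbol with rlang::sym() and unquote it with !!: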

overlap_fcn <- function(.x) {
        ## construct list of vectors
    dist_list <- list(
                "C1" = Data %>% 
                        filter(class == 'C1', !is.na(!! rlang::sym(.x)))  %>% 
                        pull(.x), 
                "C2" = Data %>% 
                         filter(class == 'C2', !is.na(!! rlang::sym(.x))) %>% 
                        pull(.x),
                "C3" = Data %>% 
                        filter(class == 'C3', !is.na(!! rlang::sym(.x)))  %>% 
                        pull(.x)
                )
## calculate distribution overlaps
    return(
        enframe(
                overlapping::overlap(dist_list)$OV*100
        ) %>% 
        mutate(value = paste0(round(value, 2), "%"),
                class = .x) %>%
        rename(comparison = name, overlap = value) %>%
        relocate(class)
    )

}
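
To see why the original filter was a no-op, here is a minimal sketch on a made-up three-row tibble (`toy` and `col_name` are hypothetical names, not part of the question's data). It compares the string version of the filter with the `!! rlang::sym()` version, and also with the `.data` pronoun, which is another way dplyr offers to look up a column by name.

```r
library(dplyr)

# Toy data (an assumption for illustration): 'x' contains one NA
toy <- tibble(x = c(1, NA, 3), class = c("C1", "C1", "C2"))
col_name <- "x"  # column name held as a string, like .x in overlap_fcn()

# is.na() sees the string "x", which is not NA, so no row is dropped
kept_string <- toy %>% filter(!is.na(col_name))

# sym() turns the string into a column reference, !! injects it
kept_symbol <- toy %>% filter(!is.na(!!rlang::sym(col_name)))

# The .data pronoun looks the column up by name and gives the same rows
kept_pronoun <- toy %>% filter(!is.na(.data[[col_name]]))

nrow(kept_string)   # 3
nrow(kept_symbol)   # 2
nrow(kept_pronoun)  # 2
```

The string version keeps all three rows, so the NA reaches density() and triggers the error seen in the question.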

Testing after creating the NAs in the data:

> purrr::map_dfr(
+   .x = c('x', 'y', "z"),
+   .f = ~overlap_fcn(.x))
# A tibble: 9 × 3
  class comparison overlap
  <chr> <chr>      <chr>  
1 x     C1-C2      98.61% 
2 x     C1-C3      97.46% 
3 x     C2-C3      97.5%  
4 y     C1-C2      95.47% 
5 y     C1-C3      96.22% 
6 y     C2-C3      97.14% 
7 z     C1-C2      90.17% 
8 z     C1-C3      94.9%  
9 z     C2-C3      89.24% 
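
As a possible variation (not part of the answer above), the three near-identical filter/pull pipelines could be collapsed into a single filter followed by base R's split(). The sketch below assumes a hypothetical helper named `class_vectors()` and a made-up `toy` tibble; it only builds the list of NA-free vectors, one per level of `class`.

```r
library(dplyr)

# Hypothetical helper (an assumption, not from the original answer):
# returns one NA-free vector per level of `class` for the named column.
class_vectors <- function(data, col_name) {
  # keep only rows where the named column is observed
  complete <- data %>% filter(!is.na(.data[[col_name]]))
  # split the column into a list of vectors named by class
  split(complete[[col_name]], complete$class)
}

# Toy data for illustration only
toy <- tibble(
  x = c(1, NA, 3, 4, 5, NA),
  class = rep(c("C1", "C2", "C3"), times = 2)
)

dist_list <- class_vectors(toy, "x")
names(dist_list)          # "C1" "C2" "C3"
anyNA(unlist(dist_list))  # FALSE
```

The resulting named list has the same shape as the hand-written dist_list, so it should be usable as the argument to overlapping::overlap() inside overlap_fcn(). Note that a class whose values are all NA would disappear from the list when `class` is a character column.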
