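I have 5 versions of the same dataset: all versions have the same columns and rows, with the same names. However, each version has values in different cells, so I only get the complete data if I combine the cells from all of them.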

Here is an example:

dataset1 <- as.data.frame(matrix(c("1", "1", "NA", "NA", "NA", "NA", "A", "NA", "B", "B", "A", "B"), ncol = 2))
colnames(dataset1) = c("Patient", "Treatment")
dataset2 <- as.data.frame(matrix(c("1", "1", "2", "4", "3", "NA", "A", "NA", "B", "B", "A", "B"), ncol = 2))
colnames(dataset2) = c("Patient", "Treatment")
dataset3 <- as.data.frame(matrix(c("1", "1", "NA", "NA", "NA", "NA", "A", "NA", "B", "B", "A", "B"), ncol = 2))
colnames(dataset3) = c("Patient", "Treatment")
dataset4 <- as.data.frame(matrix(c("1", "1", "NA", "2", "NA", "NA", "A", "NA", "B", "B", "A", "B"), ncol = 2))
colnames(dataset4) = c("Patient", "Treatment")
dataset5 <- as.data.frame(matrix(c("1", "1", "NA", "2", "NA", "2", "A", "C", "B", "B", "A", "B"), ncol = 2))
colnames(dataset5) = c("Patient", "Treatment")

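I want to combine these 5 datasets so that any missing cell in dataset1 is replaced by the value from dataset2 if that cell is valid there, otherwise by the value from dataset3, and so on. For the example data, the result should look like this: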

dataset_complete <- as.data.frame(matrix(c("1", "1", "2", "4", "3", "2", "A", "C", "B", "B", "A", "B"), ncol = 2))
colnames(dataset_complete) = c("Patient", "Treatment")

Is there an automatic way to do this? I tried reading about join transformations (https://r4ds.hadley.nz/joins.html), but did not find a solution.

Kind regards

Recommended answer

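This can be done with a reduction (Reduce). Since you mentioned r4ds, I'll use dplyr::coalesce to fill in the NA values. Note that "NA" in your data is a string literal, not R's missing value NA. I assume you meant the latter, so the datasets below use real NA: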

dataset1 <- as.data.frame(matrix(c("1", "1", NA, NA, NA, NA, "A", NA, "B", "B", "A", "B"), ncol = 2))
colnames(dataset1) = c("Patient", "Treatment")
dataset2 <- as.data.frame(matrix(c("1", "1", "2", "4", "3", NA, "A", NA, "B", "B", "A", "B"), ncol = 2))
colnames(dataset2) = c("Patient", "Treatment")
dataset3 <- as.data.frame(matrix(c("1", "1", NA, NA, NA, NA, "A", NA, "B", "B", "A", "B"), ncol = 2))
colnames(dataset3) = c("Patient", "Treatment")
dataset4 <- as.data.frame(matrix(c("1", "1", NA, "2", NA, NA, "A", NA, "B", "B", "A", "B"), ncol = 2))
colnames(dataset4) = c("Patient", "Treatment")
dataset5 <- as.data.frame(matrix(c("1", "1", NA, "2", NA, "2", "A", "C", "B", "B", "A", "B"), ncol = 2))
colnames(dataset5) = c("Patient", "Treatment")
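If your real data do arrive with the literal string "NA" (as in the question), you can convert it to a true missing value before reducing. The helper below, `na_strings_to_na`, is a hypothetical name, not part of the original answer; it is a minimal base R sketch assuming every "NA" string should become a missing value.

```r
# Hypothetical helper (not from the answer): replace the literal string "NA"
# with R's missing value in every column of a data frame
na_strings_to_na <- function(df) {
  df[] <- lapply(df, function(col) replace(col, col %in% "NA", NA))
  df
}

# dataset1 exactly as written in the question, with "NA" stored as text
dataset1 <- as.data.frame(matrix(c("1", "1", "NA", "NA", "NA", "NA",
                                   "A", "NA", "B", "B", "A", "B"), ncol = 2))
colnames(dataset1) <- c("Patient", "Treatment")

dataset1 <- na_strings_to_na(dataset1)
sum(is.na(dataset1))  # 5 real NA values: 4 in Patient, 1 in Treatment
```

`%in%` is used instead of `==` because it returns FALSE (not NA) for cells that are already missing, so the replacement never sees an NA index. To clean all five datasets at once, you could use `lapply(list(dataset1, dataset2, dataset3, dataset4, dataset5), na_strings_to_na)`.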

Because you referenced Hadley Wickham's R4DS, I use dplyr::coalesce and purrr::map2_dfc here. Neither is required, and a base R version is also easy.

Now for the solution:

Reduce(function(prev, this) purrr::map2_dfc(prev, this, .f = dplyr::coalesce), 
       list(dataset1, dataset2, dataset3, dataset4, dataset5))
# # A tibble: 6 × 2
#   Patient Treatment
#   <chr>   <chr>    
# 1 1       A        
# 2 1       C        
# 3 2       B        
# 4 4       B        
# 5 3       A        
# 6 2       B        
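A base R version of the same reduction might look like the sketch below. It is not the answer's own code: `fill_missing` is a hypothetical name, `ifelse()` stands in for `dplyr::coalesce`, and `Map()` stands in for `purrr::map2_dfc`. It assumes R >= 4.0, where `data.frame()` keeps strings as character rather than converting them to factors.

```r
# The five datasets from the answer, in priority order (first one wins)
datasets <- list(
  data.frame(Patient = c("1", "1", NA, NA, NA, NA),    Treatment = c("A", NA, "B", "B", "A", "B")),
  data.frame(Patient = c("1", "1", "2", "4", "3", NA), Treatment = c("A", NA, "B", "B", "A", "B")),
  data.frame(Patient = c("1", "1", NA, NA, NA, NA),    Treatment = c("A", NA, "B", "B", "A", "B")),
  data.frame(Patient = c("1", "1", NA, "2", NA, NA),   Treatment = c("A", NA, "B", "B", "A", "B")),
  data.frame(Patient = c("1", "1", NA, "2", NA, "2"),  Treatment = c("A", "C", "B", "B", "A", "B"))
)

# Hypothetical helper: fill the NA cells of `prev` with the matching cells
# of `this`, column by column; `prev[] <-` keeps the data.frame structure
fill_missing <- function(prev, this) {
  prev[] <- Map(function(a, b) ifelse(is.na(a), b, a), prev, this)
  prev
}

result <- Reduce(fill_missing, datasets)
result
```

The result is a plain data.frame rather than a tibble, but its values match the dplyr/purrr output above.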

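The order of the frames matters. Once a "cell" is non-NA, anything in later frames is ignored (I assume that is the intent). For example, if we reverse the order of the frames, we get a slightly different result (in Patient, row 4):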

Reduce(function(prev, this) purrr::map2_dfc(prev, this, .f = dplyr::coalesce), 
       list(dataset5, dataset4, dataset3, dataset2, dataset1))
# # A tibble: 6 × 2
#   Patient Treatment
#   <chr>   <chr>    
# 1 1       A        
# 2 1       C        
# 3 2       B        
# 4 2       B        
# 5 3       A        
# 6 2       B        
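The order dependence comes from the "first non-missing value wins" rule, which you can see directly on the Patient columns alone. `dplyr::coalesce` accepts several vectors at once and applies this rule in argument order; the base R helper below, `first_non_na` (a hypothetical name, not from the answer), is a minimal sketch of the same rule so that it runs without dplyr.

```r
# Hypothetical helper: for each position, return the first non-missing value
# across the vectors, taken in the order they are passed (earlier wins)
first_non_na <- function(...) {
  vals <- cbind(..., deparse.level = 0)  # one column per vector, one row per cell
  apply(vals, 1, function(r) r[!is.na(r)][1])
}

# Patient columns of dataset1 ... dataset5 from the answer
p1 <- c("1", "1", NA, NA, NA, NA)
p2 <- c("1", "1", "2", "4", "3", NA)
p3 <- c("1", "1", NA, NA, NA, NA)
p4 <- c("1", "1", NA, "2", NA, NA)
p5 <- c("1", "1", NA, "2", NA, "2")

forward  <- first_non_na(p1, p2, p3, p4, p5)
reversed <- first_non_na(p5, p4, p3, p2, p1)
forward   # "1" "1" "2" "4" "3" "2"
reversed  # "1" "1" "2" "2" "3" "2"
```

Only row 4 differs: dataset2 and dataset5 both have a value there ("4" and "2"), so whichever comes first in the list wins.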
