We can create a row-names column in each dataset and use it as the key for the join
library(dplyr)
library(tibble)
library(purrr)
library(tidyr)
list(df1, df2, df3, df4) %>%
map(~ .x %>% rownames_to_column('rn')) %>%
reduce(full_join, by = "rn") %>%
mutate(across(-rn, ~ replace_na(.x, 0))) %>%
column_to_rownames('rn')
-output
value_1 value_2 value_3 value_4
A 1 1 1 5
B 2 2 0 0
C 3 3 0 6
D 4 4 2 7
E 0 5 3 8
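The input data frames are not shown here. As a hedged sketch, inputs consistent with the output above could look like the following; these definitions are an assumption inferred from the printed values, not taken from the original question.

```r
# Hypothetical inputs, reconstructed from the printed output.
# Each data frame has one value column and keeps its keys in the row names.
df1 <- data.frame(value_1 = 1:4, row.names = c("A", "B", "C", "D"))
df2 <- data.frame(value_2 = 1:5, row.names = c("A", "B", "C", "D", "E"))
df3 <- data.frame(value_3 = 1:3, row.names = c("A", "D", "E"))
df4 <- data.frame(value_4 = 5:8, row.names = c("A", "C", "D", "E"))
```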
by = 0 or by = "row.names" works for the first join, but after that first merge the row names become an ordinary column (Row.names)
> merge(df1, df2, by = "row.names", all = TRUE)
Row.names value_1 value_2
1 A 1 1
2 B 2 2
3 C 3 3
4 D 4 4
5 E NA 5
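To see what changes after the first merge, it can help to inspect the result directly. This is a minimal sketch; the definitions of df1 and df2 are hypothetical and only chosen to match the output above.

```r
# Hypothetical inputs matching the printed output
df1 <- data.frame(value_1 = 1:4, row.names = c("A", "B", "C", "D"))
df2 <- data.frame(value_2 = 1:5, row.names = c("A", "B", "C", "D", "E"))

m <- merge(df1, df2, by = "row.names", all = TRUE)

names(m)      # the keys now live in a regular column called "Row.names"
rownames(m)   # the row names themselves are reset to "1", "2", ...
```

Because the keys are no longer in the row names of the merged result, a later merge on the row names would compare "1", "2", ... against "A", "B", ... instead of the original keys.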
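So repeating the same merge across the whole list will not work. Instead, we can create the key column first and then merge on it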
Reduce(\(x, y) merge(x, y, by = 'rn', all = TRUE),
lapply(list(df1, df2, df3, df4), \(x) transform(x,
rn = row.names(x))))
rn value_1 value_2 value_3 value_4
1 A 1 1 1 5
2 B 2 2 NA NA
3 C 3 3 NA 6
4 D 4 4 2 7
5 E NA 5 3 8
Or with the base R pipe |> (the _ placeholder requires R 4.2 or later)
list(df1, df2, df3, df4) |>
lapply(\(x) transform(x, rn = row.names(x))) |>
Reduce(\(x, y) merge(x, y, all = TRUE), x = _)
rn value_1 value_2 value_3 value_4
1 A 1 1 1 5
2 B 2 2 NA NA
3 C 3 3 NA 6
4 D 4 4 2 7
5 E NA 5 3 8
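The base R results keep NA for missing keys and carry rn as a column, whereas the dplyr version fills with 0 and restores the row names. A possible base R post-processing step is sketched below; the input definitions are hypothetical (inferred from the output), and the name res is introduced only for illustration.

```r
# Hypothetical inputs matching the printed output
df1 <- data.frame(value_1 = 1:4, row.names = c("A", "B", "C", "D"))
df2 <- data.frame(value_2 = 1:5, row.names = c("A", "B", "C", "D", "E"))
df3 <- data.frame(value_3 = 1:3, row.names = c("A", "D", "E"))
df4 <- data.frame(value_4 = 5:8, row.names = c("A", "C", "D", "E"))

# Add the key column to every data frame, then merge them pairwise on it
res <- Reduce(function(x, y) merge(x, y, by = "rn", all = TRUE),
              lapply(list(df1, df2, df3, df4),
                     function(x) transform(x, rn = row.names(x))))

res[is.na(res)] <- 0      # replace the NAs introduced by the full join with 0
rownames(res) <- res$rn   # move the keys back into the row names
res$rn <- NULL            # drop the helper column
res
```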
Or, another option is to join the first two datasets first, keep the result in the list, and then use by.x and by.y for the remaining merges
list(merge(df1, df2, by = "row.names", all = TRUE), df3, df4) |>
Reduce(\(x, y) merge(x, y, by.x = "Row.names",
by.y = "row.names", all = TRUE), x = _)
Row.names value_1 value_2 value_3 value_4
1 A 1 1 1 5
2 B 2 2 NA NA
3 C 3 3 NA 6
4 D 4 4 2 7
5 E NA 5 3 8
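If we do not want to join the first two datasets separately, we can create a function that checks dynamically whether a "Row.names" column exists and sets by.x and by.y accordingly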
f1 <- function(x, y) {
  # use the "Row.names" column if an earlier merge created it,
  # otherwise fall back to the row names of the data frame
  nm1 <- if ("Row.names" %in% names(x)) "Row.names" else "row.names"
  nm2 <- if ("Row.names" %in% names(y)) "Row.names" else "row.names"
  merge(x, y, by.x = nm1, by.y = nm2, all = TRUE)
}
list(df1, df2, df3, df4) |>
Reduce(f1, x= _)
Row.names value_1 value_2 value_3 value_4
1 A 1 1 1 5
2 B 2 2 NA NA
3 C 3 3 NA 6
4 D 4 4 2 7
5 E NA 5 3 8