I have three data frames. Using the dummy data below, they are set up as follows:
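- `df` has an ID variable and several additional variables.
- `df1` has an ID variable to match `df`, plus information stored in `varX_J` columns, where X runs from 00 to 19 (stored as characters) and J is a description of the variable. The first three letters (`var`) are the same for all of these variables.
- `df2` has the same structure as `df1`, but different information.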
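Because every column to be combined follows the `varXX_description` pattern, those columns can be picked out by name rather than listed by hand. A minimal base-R sketch, using made-up miniature frames rather than the data in this question; the helper `is_var_col` is a hypothetical name, and the regex assumes X is always exactly two digits:

```r
# Made-up miniature versions of df1 and df2 (not the question's data)
df1 <- data.frame(id = 1:3,
                  var09_married = c(1, NA, 2),
                  var10_happiness = c(NA, 5, 2))
df2 <- data.frame(id = 1:3,
                  var09_married = c(NA, 1, NA),
                  var10_happiness = c(2, NA, 4))

# Hypothetical helper: "var", exactly two digits, underscore, then the description
is_var_col <- function(nms) grepl("^var[0-9]{2}_", nms)

var_cols1 <- names(df1)[is_var_col(names(df1))]
var_cols2 <- names(df2)[is_var_col(names(df2))]

# Columns present in both frames; a plain join would duplicate these as .x/.y
shared <- intersect(var_cols1, var_cols2)

# Split each name into its X (two digits) and J (description) parts
years <- sub("^var([0-9]{2})_.*$", "\\1", shared)
descs <- sub("^var[0-9]{2}_", "", shared)
print(shared)
```

Here `years` comes out as `"09" "10"` and `descs` as `"married" "happiness"`.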
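I need to merge `df1` and `df2` into `df`, combining the data in their shared columns. `df1` and `df2` contain the same observations (the same ids). They *should* hold non-overlapping information: for example, if ID 1 has a value in `var09_married` in `df1`, then the same cell in `df2` should be empty. However, the data are messy, and there may be places where this does not hold.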
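Before merging, it may help to measure how often the "should not overlap" assumption is actually violated. The sketch below uses base R and small made-up frames (with `df2` deliberately stored in a different row order); it aligns `df2` to `df1` by `id` and marks every cell where both frames hold a non-missing value:

```r
# Made-up toy frames; df2 rows are in a different order on purpose
df1 <- data.frame(id = c(1, 2, 3),
                  var09_married = c(1, NA, 2),
                  var10_married = c(NA, 1, 2))
df2 <- data.frame(id = c(3, 2, 1),
                  var09_married = c(1, NA, NA),
                  var10_married = c(NA, 2, NA))

# varXX_* columns present in both frames
cols <- intersect(grep("^var[0-9]{2}_", names(df1), value = TRUE),
                  grep("^var[0-9]{2}_", names(df2), value = TRUE))

# Reorder df2 rows to match df1 by id, then mark cells filled in both frames
df2_aligned <- df2[match(df1$id, df2$id), , drop = FALSE]
both <- !is.na(df1[cols]) & !is.na(df2_aligned[cols])

# ids with at least one conflicting cell, and the total number of conflicts
conflict_ids <- df1$id[rowSums(both) > 0]
print(conflict_ids)
print(sum(both))
```

For these toy frames, ids 2 and 3 each have one cell filled in both frames, so `sum(both)` is 2.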
To create this dummy data, I use the following script:
```r
library(dplyr)

# df: id plus random filler variables (no seed is set, so values change per run)
df <- data.frame(id = 1:20,
                 og_var1 = sample(1:50, 20, replace = TRUE),
                 state = sample(1:52, 20, replace = TRUE),
                 race = sample(1:5, 20, replace = TRUE)
)
# df1: all 20 ids; only ids 3, 6, 9 and 12 carry var* information
df1 <- left_join(data.frame(id = 1:20), data.frame(
  id = c(3,6,9,12),
  var09_married = c(1,NA,2,1),
  var09_happiness = c(1,NA,3,2),
  var10_married = c(NA,1,2,2),
  var10_happiness = c(NA,5,2,5)), by = c("id"))
# df2: all 20 ids; only ids 3, 6, 11 and 15 carry var* information
df2 <- left_join(data.frame(id = 1:20), data.frame(
  id = c(3,6,11,15),
  var09_married = c(NA,1,1,1),
  var09_happiness = c(NA,3,3,2),
  var10_married = c(1,NA,2,1),
  var10_happiness = c(2,NA,4,4)), by = c("id"))
# Naive joins: the second join duplicates every var* column as .x (df1) and .y (df2)
df <- left_join(df, df1, by = c("id"))
df <- left_join(df, df2, by = c("id"))
```
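What I want is to combine this information without duplicating columns. If `df1` and `df2` both have information in the same cell (for example, if id 3 had var10 values in both `df1` and `df2`), I want the final data frame to keep the value from `df1`. When a `df2` value is dropped this way, I also want to set a flag. So the final data frame should look like this: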
```r
dput(df)
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20), og_var1 = c(6L, 4L, 33L, 7L,
37L, 16L, 34L, 42L, 37L, 37L, 39L, 41L, 24L, 33L, 30L, 2L, 20L,
29L, 33L, 47L), state = c(2L, 35L, 11L, 14L, 16L, 16L, 40L, 39L,
28L, 13L, 5L, 26L, 28L, 15L, 13L, 31L, 43L, 25L, 16L, 28L), race = c(5L,
4L, 2L, 1L, 1L, 2L, 3L, 2L, 2L, 4L, 2L, 3L, 5L, 2L, 3L, 2L, 5L,
1L, 5L, 5L), var09_married = c(NA, NA, 1, NA, NA, 1, NA, NA,
2, NA, 1, 1, NA, NA, 1, NA, NA, NA, NA, NA), var09_happiness = c(NA,
NA, 1, NA, NA, 3, NA, NA, 3, NA, 3, 2, NA, NA, 2, NA, NA, NA,
NA, NA), var10_married = c(NA, NA, 1, NA, NA, 1, NA, NA, 2, NA,
2, 2, NA, NA, 1, NA, NA, NA, NA, NA), var10_happiness = c(NA,
NA, 2, NA, NA, 5, NA, NA, 2, NA, 4, 5, NA, NA, 4, NA, NA, NA,
NA, NA), flag = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0)), row.names = c(NA, -20L), class = "data.frame")
```
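One possible approach is to loop over the shared `var` columns, keep the `df1` value where it exists, fall back to `df2` otherwise, and flag any row where a non-missing `df2` value was discarded. This is a base-R sketch, not a tested answer for the real data: `merge_with_flag` is a hypothetical helper, the toy frames are made up, and it assumes `df1` and `df2` have the same `var` columns (only columns present in both are combined; `df2`-only columns are ignored).

```r
# Hypothetical helper (not from the question): combine the shared varXX_* columns
# of df1 and df2, preferring df1, and flag rows where a df2 value was discarded.
merge_with_flag <- function(df1, df2, key = "id") {
  is_var <- function(nms) grepl("^var[0-9]{2}_", nms)
  # only columns present in both frames are combined
  cols <- intersect(names(df1)[is_var(names(df1))],
                    names(df2)[is_var(names(df2))])
  # put df2 rows in the same id order as df1
  df2 <- df2[match(df1[[key]], df2[[key]]), , drop = FALSE]
  out <- df1
  flag <- rep(0, nrow(df1))
  for (col in cols) {
    a <- df1[[col]]
    b <- df2[[col]]
    dropped <- !is.na(a) & !is.na(b)      # both filled: df2 value is discarded
    flag[dropped] <- 1
    out[[col]] <- ifelse(is.na(a), b, a)  # df1 first, df2 as fallback
  }
  out$flag <- flag
  out
}

# Made-up toy data in the same shape as the question's df, df1 and df2
df  <- data.frame(id = 1:4, state = c(10, 20, 30, 40))
df1 <- data.frame(id = 1:4,
                  var09_married   = c(1, NA, 2, NA),
                  var09_happiness = c(1, NA, 3, NA))
df2 <- data.frame(id = 1:4,
                  var09_married   = c(NA, 1, 1, NA),
                  var09_happiness = c(NA, 3, NA, NA))

combined <- merge_with_flag(df1, df2)
# Left join onto df: one column per variable, no .x/.y duplicates
final <- merge(df, combined, by = "id", all.x = TRUE)
print(final)
```

For the toy data, only id 3 is flagged: both frames filled `var09_married` (2 in `df1`, 1 in `df2`), and the `df1` value 2 is kept. With dplyr, `coalesce(a, b)` does the same job as the `ifelse(is.na(a), b, a)` line. As written, the flag is also set when both frames hold the same value in a cell; adding `& a != b` to `dropped` would restrict it to genuine disagreements.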