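I have two data frames that sample the same group at different time points, with some attrition between waves and some variables added or omitted. (Code to generate sample data is at the bottom.)
I want to calculate the attrition rate between waves based on whether each ID in df1 (a character variable) also appears in df2, and to create a binary column in df1 indicating whether the ID appears in df2.
df1 and df2 are both formatted as follows: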
id var_1 var_2 ...
5ea1954758634f04542a50bf 5 1
3fjkfgho55efy467grtu523r 7 4
df79756gh5485trhfsdkig3d 3 8
My desired output is:
df1
id in_df2 var_1 var_2 ...
5ea1954758634f04542a50bf Yes 5 1
3fjkfgho55efy467grtu523r Yes 4 2
df79756gh5485trhfsdkig3d No 8 3
I tried the following code, but it doesn't seem to work (no column is created in df1):
df1 %>%
  mutate(in_df2 = c("no", "yes")[1 + (rowSums(
    outer(
      strsplit(id, "\\s+"),
      strsplit(df2$id, "\\s+"),
      Vectorize(function(x, y) all(x %in% y) | all(y %in% x))
    )
  ) > 0)])
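One likely reason no column appears is that a dplyr pipeline returns a new data frame instead of modifying df1 in place, so the result has to be assigned back to df1. Below is a minimal sketch of that, assuming the IDs are plain character strings that should match exactly, in which case `%in%` can stand in for the `outer()`/`strsplit()` construction. The toy IDs and values are made up for illustration only.

```r
library(dplyr)

# Toy data (hypothetical IDs, not the real ones)
df1 <- data.frame(id = c("a1", "b2", "c3"), var_1 = c(5, 7, 3))
df2 <- data.frame(id = c("a1", "b2", "z9"), var_1 = c(6, 8, 1))

# Assign the result back to df1; without `df1 <-` the new column is
# only printed and df1 itself stays unchanged.
df1 <- df1 %>%
  mutate(in_df2 = if_else(id %in% df2$id, "Yes", "No"))

df1
```

If the IDs can contain several whitespace-separated tokens, exact matching with `%in%` would not be equivalent to the original attempt, so this only applies under the single-token assumption.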
Code to generate the sample data frames:
# df1: first wave, five respondents
df1 <- structure(list(id = c("5ea1954758634f04542a50bf", "3fjkfgho55efy467grtu523r", "df79756gh5485trhfsdkig3d",
"d6rg756ghuej4678dfdkig3d", "546dt547546hgdvc842a50bf"), var1 = 1:5, var2 = 3:7), row.names = c(NA,
-5L), class = "data.frame")
# df2: second wave, five respondents (every column must have one value per id)
df2 <- structure(list(id = c("73egdv4758634f04542a50bf", "3fjkfgho55efy467grtu523r", "tr54756gh5485trhfsdkig3d",
"d6rg756ghuej4678dfdkig3d", "357dt547546hgdvc842a50bf"), var1 = 2:6, var4 = 3:7), row.names = c(NA,
-5L), class = "data.frame")
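For the attrition rate itself, one possible sketch in base R is shown here. It assumes attrition is defined as the share of first-wave IDs that do not reappear in the second wave; the two ID vectors are copied from the sample data, and the names `ids_wave1`, `ids_wave2`, `retained`, and `attrition_rate` are introduced only for this illustration.

```r
# IDs copied from the sample data (wave 1 = df1$id, wave 2 = df2$id)
ids_wave1 <- c("5ea1954758634f04542a50bf", "3fjkfgho55efy467grtu523r",
               "df79756gh5485trhfsdkig3d", "d6rg756ghuej4678dfdkig3d",
               "546dt547546hgdvc842a50bf")
ids_wave2 <- c("73egdv4758634f04542a50bf", "3fjkfgho55efy467grtu523r",
               "tr54756gh5485trhfsdkig3d", "d6rg756ghuej4678dfdkig3d",
               "357dt547546hgdvc842a50bf")

# TRUE where a wave-1 ID is also present in wave 2
retained <- ids_wave1 %in% ids_wave2

# The mean of a logical vector is the proportion of TRUE values,
# so this is the proportion of wave-1 IDs missing from wave 2.
attrition_rate <- mean(!retained)
attrition_rate  # 0.6 for these IDs (3 of 5 do not reappear)
```

With the data frames in place, the same idea would read `mean(!df1$id %in% df2$id)`.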