I have a data frame data:
vp | v1 | v2 | v3 | v4
0 | a | b | c | ...
0 | d | e | f | ...
0 | g | h | i | ...
1 | a | b | c | ...
1 | d | e | f | ...
1 | g | h | i | ...
...
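I want to verify that, for each vp, I have the required combinations of var1, var2 and var3.
To do this, I created a data frame prototype that contains the required combinations: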
var1 | var2 | var3
a | b | c
d | e | f
g | h | i
and tried to check whether it is identical to the corresponding part of each vp group:
data %>% group_by(vp) %>% summarise(identical = identical(. %>%
# as.data.frame() %>%
select(var1, var2, var3) %>%
arrange(var1, var2, var3),
prototype %>% arrange(var1, var2, var3)))
I expected that I could select the part of the data frame belonging to each group (maybe converting it into a new data.frame, in case the data format matters) and then check that it is identical to the prototype.
However, the result is always FALSE.
If I do not use group_by and summarise, but instead use filter(vp == ...) and do the check manually for each group, it works as expected.
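To make the manual variant concrete, here is a minimal sketch of what I mean by checking one group at a time. It uses the column names v1, v2, v3 from the minimal example further down; the helper name check_group is only illustrative, and I am assuming R >= 4.0 so that data.frame() keeps strings as character vectors.

```r
library(dplyr)

# Toy data mirroring the minimal example in this question
data <- data.frame(vp = c(rep(0, 3), rep(1, 3), rep(2, 3)),
                   v1 = rep(c("a", "d", "g"), 3),
                   v2 = rep(c("b", "e", "h"), 3),
                   v3 = c(rep(c("c", "f", "i"), 2), c("c", "f", "x")))
prototype <- data.frame(v1 = c("a", "d", "g"),
                        v2 = c("b", "e", "h"),
                        v3 = c("c", "f", "i"))

# Hypothetical helper: compare the rows of one vp group with the prototype
check_group <- function(group_id) {
  part <- data %>%
    filter(vp == group_id) %>%   # keep only this group's rows
    select(v1, v2, v3) %>%       # drop the grouping column
    arrange(v1, v2, v3)          # sort so row order does not matter
  identical(part, prototype %>% arrange(v1, v2, v3))
}

check_group(0)  # a group that contains exactly the prototype rows
check_group(2)  # a group where one v3 value differs
```

This is the behaviour I would like to reproduce for all groups at once, without calling the helper once per vp value.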
What am I doing wrong? How can I achieve what I am trying to do (preferably in dplyr/tidyr style)?
Minimal reproducible example:
library(dplyr)
data <- data.frame(vp = c(rep(0,3), rep(1,3), rep(2,3)),
v1 = rep(c("a", "d", "g"), 3),
v2 = rep(c("b", "e", "h"), 3),
v3 = c(rep(c("c", "f", "i"), 2), c("c", "f", "x")))
prototype <- data.frame(v1 = c("a", "d", "g"),
v2 = c("b", "e", "h"),
v3 = c("c", "f", "i"))
expected_result <- data.frame(vp=c(0,1,2), identical=c(TRUE, TRUE, FALSE))
data %>% group_by(vp) %>% summarise(identical = identical(. %>%
select(v1, v2, v3) %>%
arrange(v1, v2, v3),
prototype %>%
arrange(v1, v2, v3)))
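For comparison, a minimal sketch of one grouped variant that avoids `.` altogether, since magrittr binds `.` to the whole piped-in data frame rather than to the current group. Rebuilding each group's columns with data.frame() inside summarise() is my own assumption about a possible direction, not a confirmed solution, and the names sorted_prototype and result are only illustrative. It again assumes R >= 4.0 (character columns, not factors).

```r
library(dplyr)

# Same toy data as in the minimal example above
data <- data.frame(vp = c(rep(0, 3), rep(1, 3), rep(2, 3)),
                   v1 = rep(c("a", "d", "g"), 3),
                   v2 = rep(c("b", "e", "h"), 3),
                   v3 = c(rep(c("c", "f", "i"), 2), c("c", "f", "x")))
prototype <- data.frame(v1 = c("a", "d", "g"),
                        v2 = c("b", "e", "h"),
                        v3 = c("c", "f", "i"))

# Sort the prototype once so the comparison does not depend on row order
sorted_prototype <- prototype %>% arrange(v1, v2, v3)

result <- data %>%
  group_by(vp) %>%
  summarise(identical = identical(
    # Inside summarise(), v1, v2, v3 are the current group's column vectors,
    # so this builds a plain data.frame holding only that group's rows
    data.frame(v1, v2, v3) %>% arrange(v1, v2, v3),
    sorted_prototype
  ))

result
```

Note that identical() is strict about column types and attributes, so this only has a chance of matching when both sides are plain data.frames with the same column classes.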