I have two datasets:
d1=structure(list(et = c("s", "s"), gg = c("d", "d"), hj = c("f",
"f"), ggh = c("h", "h"), wer = c(23L, 45L)), class = "data.frame", row.names = c(NA,
-2L))
and
d2=structure(list(et = c("s", "s"), gg = c("d", "d"), hj = c("f",
"f"), ggh = c("h", "f"), wer = c(3L, 7L)), class = "data.frame", row.names = c(NA,
-2L))
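I change the values according to the following rule: if, for a category that also occurs in d1, the value of wer in d2 is less than or greater than the median of wer for that category in d1, then the median is written into d2 for that category.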
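To make it clearer what I want, take d1:
et gg hj ggh (these are the categorical variables)
s d f h
The median of wer is 34.
d2 contains the same category s d f h with wer = 3; since 3 < 34, I have to change this value to 34.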
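As a quick check of the worked example above, here is a minimal sketch of how the per-category median in d1 could be computed. The name `med` is only an illustrative choice; `d1` is rebuilt with `data.frame()` so the snippet runs on its own.

```r
library(dplyr)

# Same data as d1 above, rebuilt so the snippet is self-contained
d1 <- data.frame(et = c("s", "s"), gg = c("d", "d"), hj = c("f", "f"),
                 ggh = c("h", "h"), wer = c(23L, 45L))

# One row per combination of the categorical columns,
# with the median of wer for that combination
med <- d1 %>%
  group_by(across(-wer)) %>%
  summarise(wer = median(wer), .groups = "drop")

med
```

For the single category s d f h this gives a median of 34, the value used in the example.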
Now I use this code:
library(dplyr)
d1 %>%
  group_by(across(-wer)) %>%
  summarise(wer = median(wer), .groups = "drop") %>%
  right_join(d2, by = c("et", "gg", "hj", "ggh"), suffix = c("", ".y")) %>%
  mutate(wer = ifelse(wer >= wer.y, wer, wer.y), .keep = "unused")
It does what I need, but for categories that do not occur in d1 it returns NA:
et gg hj ggh wer
<chr> <chr> <chr> <chr> <dbl>
1 s d f h 34
2 s d f f NA
but for such categories it should keep the actual value from d2:
et gg hj ggh wer
<chr> <chr> <chr> <chr> <dbl>
1 s d f h 34
2 s d f f 7
How can I fix this?
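One possible direction, offered as a sketch rather than a definitive answer: the NA appears because the `right_join` leaves the median missing for categories absent from d1, and `ifelse()` propagates that NA. Assuming the intended rule is "take the larger of the d1 median and the d2 value, and fall back to the d2 value when there is no median", base R's `pmax()` with `na.rm = TRUE` could replace the `ifelse()`. The name `res` is only illustrative.

```r
library(dplyr)

# Same data as above, rebuilt so the snippet is self-contained
d1 <- data.frame(et = c("s", "s"), gg = c("d", "d"), hj = c("f", "f"),
                 ggh = c("h", "h"), wer = c(23L, 45L))
d2 <- data.frame(et = c("s", "s"), gg = c("d", "d"), hj = c("f", "f"),
                 ggh = c("h", "f"), wer = c(3L, 7L))

res <- d1 %>%
  group_by(across(-wer)) %>%
  summarise(wer = median(wer), .groups = "drop") %>%
  right_join(d2, by = c("et", "gg", "hj", "ggh"), suffix = c("", ".y")) %>%
  # pmax() takes the element-wise maximum; with na.rm = TRUE a missing
  # median (category not present in d1) is ignored and wer.y is kept
  mutate(wer = pmax(wer, wer.y, na.rm = TRUE), .keep = "unused")

res
```

On the sample data this keeps 34 for s d f h and returns 7 for s d f f. If the rule should instead always overwrite with the median whenever one exists, `coalesce(wer, wer.y)` from dplyr would be the analogous choice, after converting both columns to the same type.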