Sample data:
data <- data.frame(
year = c(2018, 2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020),
patient_id = c(101, 102, 103, 104, 102, 102, 106, 105, 105, 107, 108, 109, 105, 105, 201, 202, 203, 204, 205, 205, 205, 209, 208),
discharge_date = as.Date(c("1/1/2018", "1/5/2018", "1/8/2018", "2/5/2018", "2/10/2018", "2/11/2018", "3/1/2018", "1/2/2019", "1/10/2019", "3/1/2019", "3/5/2019", "3/25/2019", "5/5/2019", "5/6/2019", "1/1/2020", "2/1/2020", "2/10/2020", "3/3/2020", "4/1/2020", "4/2/2020", "4/3/2020", "6/17/2020", "8/8/2020"), format = "%m/%d/%Y"),
contagious_admission = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0)
)
I am trying to create a column, latest_discharge_date, that takes its value from discharge_date according to the following logic:
If the current row's patient_id differs from the previous row's patient_id, it takes discharge_date from the current row.
If the current row's patient_id is the same as the previous row's patient_id, and contagious_admission == 0 in both rows, it also takes discharge_date from the current row.
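However, when a run of consecutive rows with the same patient_id starts with a row where contagious_admission == 0 and is followed by one or more rows where contagious_admission == 1, the first row's latest_discharge_date should take the latest (maximum) discharge_date from those consecutive contagious_admission == 1 rows.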
First attempt:
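library(dplyr)

data |>
  mutate(
    latest_discharge_date = case_when(
      # Next row is non-contagious: keep this row's own discharge date.
      contagious_admission == 0 & lead(contagious_admission) == 0 ~ discharge_date,
      # Next row is contagious: take the next row's discharge date.
      contagious_admission == 0 & lead(contagious_admission) == 1 ~ lead(discharge_date),
      TRUE ~ NA
    )
  )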
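This works for most rows, but for patient_id = 205 the index row (patient_id == 205 & contagious_admission == 0) gets a latest_discharge_date of "2020-04-02". I need it to pick up the last date among the following rows that belong to the same patient_id and contagious_admission == 1 "group", which is "2020-04-03".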
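The first attempt stops at "2020-04-02" because lead() with its default n = 1 only looks one row ahead, so it cannot see a second contagious_admission == 1 row. A small illustration using just the three patient_id 205 rows from the sample data (p205 and next_date are names made up for this example):

```r
library(dplyr)

# The three rows for patient 205, copied from the sample data.
p205 <- data.frame(
  patient_id = c(205, 205, 205),
  discharge_date = as.Date(c("4/1/2020", "4/2/2020", "4/3/2020"), format = "%m/%d/%Y"),
  contagious_admission = c(0, 1, 1)
)

# lead() shifts the column up by exactly one row, so the index row
# only ever sees the date of the row immediately after it.
p205 <- p205 |>
  mutate(next_date = lead(discharge_date))

p205$next_date[1]  # the 2020-04-02 date, not 2020-04-03
```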
Second attempt:
data |>
  mutate(
    latest_discharge_date = case_when(
      contagious_admission == 0 & lead(contagious_admission) == 0 ~ discharge_date,
      contagious_admission == 0 & lead(contagious_admission) == 1 ~ lead(pmax(lead(discharge_date))),
      TRUE ~ NA
    )
  )
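This one gets the index rows that are followed by multiple contagious_admission == 1 rows right, but it overshoots for index rows that are followed by only a single one.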
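A likely explanation, offered as a sketch rather than a confirmed diagnosis: pmax() is given only one vector here, so it has nothing to compare against and should pass its input through unchanged, which makes the expression behave like lead(discharge_date, n = 2). That always looks exactly two rows ahead, regardless of how many contagious_admission == 1 rows actually follow. The excerpt below uses the two later patient_id 102 rows and the row after them from the sample data; rows, attempt, and two_ahead are illustrative names.

```r
library(dplyr)

# Rows 5 to 7 of the sample data: an index row, one contagious follow-up,
# then an unrelated patient.
rows <- data.frame(
  patient_id = c(102, 102, 106),
  discharge_date = as.Date(c("2/10/2018", "2/11/2018", "3/1/2018"), format = "%m/%d/%Y"),
  contagious_admission = c(0, 1, 0)
)

rows <- rows |>
  mutate(
    # The expression used in the second attempt.
    attempt = lead(pmax(lead(discharge_date))),
    # A plain two-row shift, for comparison.
    two_ahead = lead(discharge_date, n = 2)
  )

# For the index row of patient 102, both columns point at the row
# belonging to patient 106 instead of the 2018-02-11 follow-up.
rows[1, c("attempt", "two_ahead")]
```

This is why the approach happens to work for patient_id 205, where the last contagious row sits exactly two rows below the index row, and fails wherever the distance is different.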