R: Get the latest date from multiple lead dates

Posted on April 21

Sample data:

data <- data.frame(
  year = c(2018, 2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020),
  patient_id = c(101, 102, 103, 104, 102, 102, 106, 105, 105, 107, 108, 109, 105, 105, 201, 202, 203, 204, 205, 205, 205, 209, 208),
  discharge_date = as.Date(c("1/1/2018", "1/5/2018", "1/8/2018", "2/5/2018", "2/10/2018", "2/11/2018", "3/1/2018", "1/2/2019", "1/10/2019", "3/1/2019", "3/5/2019", "3/25/2019", "5/5/2019", "5/6/2019", "1/1/2020", "2/1/2020", "2/10/2020", "3/3/2020", "4/1/2020", "4/2/2020", "4/3/2020", "6/17/2020", "8/8/2020"), format = "%m/%d/%Y"),
  contagious_admission = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0)
)

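I am trying to create a column latest_discharge_date that takes a discharge_date value according to the following logic. If the current row's patient_id differs from the preceding patient_id, it takes the discharge_date from the current row. If the current row's patient_id is the same as the preceding patient_id AND contagious_admission == 0 in both rows, it also takes the discharge_date from the current row. However, when a row with contagious_admission == 0 is followed by one or more consecutive rows with the same patient_id and contagious_admission == 1, that first row's latest_discharge_date should take the latest (maximum) discharge_date from those consecutive contagious_admission == 1 rows.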

First attempt:

library(dplyr)

data |>
  mutate(
    latest_discharge_date = case_when(
      contagious_admission == 0 & lead(contagious_admission) == 0 ~ discharge_date,
      contagious_admission == 0 & lead(contagious_admission) == 1 ~ lead(discharge_date),
      TRUE ~ NA
    )
  )

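Everything runs fine, but if you look at patient_id = 205, the index row (patient_id == 205 & contagious_admission == 0) picks up "2020-04-02" as its latest_discharge_date. I need it to take the latest lead date belonging to the same patient_id and contagious_admission == 1 "group", i.e. "2020-04-03".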

Second attempt:

data |>
  mutate(
    latest_discharge_date = case_when(
      contagious_admission == 0 & lead(contagious_admission) == 0 ~ discharge_date,
      contagious_admission == 0 & lead(contagious_admission) == 1 ~ lead(pmax(lead(discharge_date))),
      TRUE ~ NA
    )
  )

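This one nails the rows that are followed by multiple contagious_admission == 1 rows, but it overshoots for the rows that are followed by only a single one.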
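A likely reason for the overshoot, shown as a small sketch: pmax() compares several vectors element by element, so with a single argument it returns that vector unchanged. lead(pmax(lead(x))) is therefore just a fixed shift of two rows, which happens to be right when exactly two contagious rows follow and wrong otherwise. The vector d below is made up for illustration.

```r
library(dplyr)

# Hypothetical dates, used only to illustrate the shift
d <- as.Date(c("2020-04-01", "2020-04-02", "2020-04-03", "2020-06-17"))

# With one argument, pmax() has nothing to compare against
one_arg <- pmax(lead(d))

# So the expression from the second attempt is a plain two-row lead
two_ahead <- lead(pmax(lead(d)))

# Both comparisons are expected to be TRUE
same_as_lead  <- identical(as.numeric(one_arg), as.numeric(lead(d)))
same_as_lead2 <- identical(as.numeric(two_ahead), as.numeric(lead(d, 2)))
```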

Desired output:

    year patient_id discharge_date contagious_admission latest_discharge_date
   <dbl>      <dbl> <date>                        <dbl> <date>
 1  2018        101 2018-01-01                        0 2018-01-01
 2  2018        102 2018-01-05                        0 2018-01-05
 3  2018        103 2018-01-08                        0 2018-01-08
 4  2018        104 2018-02-05                        0 2018-02-05
 5  2018        102 2018-02-10                        0 2018-02-11
 6  2018        102 2018-02-11                        1 NA
 7  2018        106 2018-03-01                        0 2018-03-01
 8  2019        105 2019-01-02                        0 2019-01-02
 9  2019        105 2019-01-10                        0 2019-01-10
10  2019        107 2019-03-01                        0 2019-03-01
11  2019        108 2019-03-05                        0 2019-03-05
12  2019        109 2019-03-25                        0 2019-03-25
13  2019        105 2019-05-05                        0 2019-05-06
14  2019        105 2019-05-06                        1 NA
15  2020        201 2020-01-01                        0 2020-01-01
16  2020        202 2020-02-01                        0 2020-02-01
17  2020        203 2020-02-10                        0 2020-02-10
18  2020        204 2020-03-03                        0 2020-03-03
19  2020        205 2020-04-01                        0 2020-04-03
20  2020        205 2020-04-02                        1 NA
21  2020        205 2020-04-03                        1 NA
22  2020        209 2020-06-17                        0 2020-06-17
23  2020        208 2020-08-08                        0 2020-08-08

Recommended answer
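One possible approach, offered as a minimal sketch rather than a definitive solution: instead of looking a fixed number of rows ahead with lead(), label each "episode" and take the maximum inside it. An episode starts at every contagious_admission == 0 row and whenever patient_id changes, so it consists of one index row plus any contagious rows that directly follow it for the same patient. The helper column episode and the object name result are illustrative names, not part of the question. The sketch also assumes that contagious rows should get NA, as in the desired output.

```r
library(dplyr)

data <- data.frame(
  year = rep(c(2018, 2019, 2020), times = c(7, 7, 9)),
  patient_id = c(101, 102, 103, 104, 102, 102, 106, 105, 105, 107, 108, 109,
                 105, 105, 201, 202, 203, 204, 205, 205, 205, 209, 208),
  discharge_date = as.Date(
    c("1/1/2018", "1/5/2018", "1/8/2018", "2/5/2018", "2/10/2018", "2/11/2018",
      "3/1/2018", "1/2/2019", "1/10/2019", "3/1/2019", "3/5/2019", "3/25/2019",
      "5/5/2019", "5/6/2019", "1/1/2020", "2/1/2020", "2/10/2020", "3/3/2020",
      "4/1/2020", "4/2/2020", "4/3/2020", "6/17/2020", "8/8/2020"),
    format = "%m/%d/%Y"
  ),
  contagious_admission = c(0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
                           0, 0, 0, 0, 0, 1, 1, 0, 0)
)

result <- data |>
  mutate(
    # Start a new episode at each non-contagious row or when the patient changes
    episode = cumsum(
      contagious_admission == 0 |
        patient_id != lag(patient_id, default = first(patient_id))
    )
  ) |>
  group_by(episode) |>
  mutate(
    latest_discharge_date = if_else(
      contagious_admission == 0,
      # Multi-row episode: max over its contagious rows.
      # Single-row episode: the row's own date (n() == 1 keeps the subset non-empty).
      max(discharge_date[contagious_admission == 1 | n() == 1]),
      as.Date(NA)
    )
  ) |>
  ungroup() |>
  select(-episode)
```

Because the grouping is by run rather than by patient_id alone, the two separate stays of patient 105 (rows 8-9 and rows 13-14) are handled independently, and the number of contagious rows that follow an index row no longer matters.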
