Given:

library(tidyverse)
library(data.table)
set.seed(1)
df <- data.frame(id = rep(letters[1:3], each = 3),
                 result = c("negative", "positive", "positive", "negative", "negative", "negative",
                            "positive", "negative", "positive"),
                 test_date = seq.Date(from = as.Date("01/01/1998", "%d/%m/%Y"), 
                                 to = as.Date("09/01/1998", "%d/%m/%Y"), by = "day"),
                 type = c("car", "truck", "bike", "wheel", "tyre", "lorry", "car", "bike", "wheel"),
                 colour = c("gre", "blu", "re", "dblu", "yel", "re", "ora", "ti", "bla"),
                 x1 = sample(letters, 9),
                 x2 = sample(letters, 9))
df
#   id   result  test_date  type colour x1 x2
# 1  a negative 1998-01-01   car    gre  y  u
# 2  a positive 1998-01-02 truck    blu  d  z
# 3  a positive 1998-01-03  bike     re  g  j
# 4  b negative 1998-01-04 wheel   dblu  a  v
# 5  b negative 1998-01-05  tyre    yel  b  n
# 6  b negative 1998-01-06 lorry     re  k  x
# 7  c positive 1998-01-07   car    ora  n  g
# 8  c negative 1998-01-08  bike     ti  r  i
# 9  c positive 1998-01-09 wheel    bla  w  o

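I want to extract, by group, the first row of some columns and the last row of others, along with the test_date of the first row where result == "positive" (falling back to the last negative test_date when a group has no positive result). Using dplyr, the following works: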

out_dplyr <- df %>% 
  group_by(id) %>% 
  summarise(across(c(type, colour), dplyr::first),
            across(c(x1, x2), dplyr::last),
            test_date = if (is.na(dplyr::first(test_date[result == "positive"])))
              dplyr::last(test_date[result == "negative"])
            else dplyr::first(test_date[result == "positive"])) %>% 
  ungroup()
out_dplyr
#   id    type  colour x1    x2    test_date 
#   <chr> <chr> <chr>  <chr> <chr> <date>    
# 1 a     car   gre    g     j     1998-01-02
# 2 b     wheel dblu   k     x     1998-01-06
# 3 c     car   ora    w     o     1998-01-07
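The per-group date rule in the summarise() call above (first positive date, otherwise last negative date) can be isolated as a plain function. This is a minimal base-R sketch; pick_test_date() is a hypothetical name that is not part of the original code, and it assumes test_date contains no NA values:

```r
# Hypothetical helper (not from the original post): the per-group date rule
pick_test_date <- function(test_date, result) {
  pos <- test_date[result == "positive"]
  if (length(pos) > 0) {
    pos[1]                    # first positive date
  } else {
    neg <- test_date[result == "negative"]
    neg[length(neg)]          # last negative date (empty if there are none)
  }
}

dates <- as.Date("1998-01-04") + 0:2
pick_test_date(dates, c("negative", "negative", "negative"))
# [1] "1998-01-06"
pick_test_date(dates, c("negative", "positive", "positive"))
# [1] "1998-01-05"
```

Checking length(pos) > 0 always gives a single TRUE/FALSE, so the if() condition never depends on what first() returns for an empty vector.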

I have been trying to do the same with data.table. The following works if I leave out the third variable (test_date):

df_dt <- data.table(df)
dt_names_first <- c("type", "colour")
dt_names_last <- c("x1", "x2")
out_dt <- df_dt[, c(lapply(.SD[, dt_names_first, with = FALSE], data.table::first),
                    lapply(.SD[, dt_names_last, with = FALSE], data.table::last)), .(id)]
out_dt
#    id  type colour x1 x2
# 1:  a   car    gre  g  j
# 2:  b wheel   dblu  k  x
# 3:  c   car    ora  w  o

But when I include it, it fails, I think because I am not including it correctly:

out_dt <- df_dt[, c(lapply(.SD[, dt_names_first, with = FALSE], data.table::first),
                    lapply(.SD[, dt_names_last, with = FALSE], data.table::last),
                    test_date = if (is.na(data.table::first(test_date[result == "positive"])))
                      data.table::last(test_date[result == "negative"])
                    else data.table::first(test_date[result == "positive"])), .(id)]
out_dt
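A small sketch of what goes wrong for group b, which has no positive results. It assumes, as in the data.table releases I am aware of, that data.table::first() returns a zero-length input unchanged rather than returning NA the way dplyr::first() does by default:

```r
library(data.table)

# Group "b" has no positive rows, so test_date[result == "positive"] is an empty Date vector
empty_pos <- as.Date(character(0))

first_pos <- data.table::first(empty_pos)
length(first_pos)      # 0: the empty input comes back unchanged, not NA
is.na(first_pos)       # logical(0)

# if() needs a single TRUE/FALSE, so a zero-length condition is an error
msg <- tryCatch(
  if (is.na(first_pos)) "use last negative" else "use first positive",
  error = function(e) conditionMessage(e)
)
msg                    # "argument is of length zero"

# all() of a zero-length logical is TRUE, so wrapping the test in all() picks the negative branch
all(is.na(first_pos))  # TRUE
```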

Any suggestions?

Thanks

Recommended answer

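We can wrap the logical vector in all()/any(), because if/else is not vectorised, i.e. it needs a single TRUE/FALSE. In group b there are no positive results, so data.table::first(test_date[result == "positive"]) returns a zero-length vector, is.na() of that is logical(0), and if() stops with "argument is of length zero"; all(logical(0)) is TRUE, so the negative branch is used instead. In addition, wrap test_date = .. in a list, so that c() adds it as one named list element instead of splicing the Date into the list and dropping its class.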
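The list() point can be checked in base R on its own, before looking at the full data.table call. A minimal sketch using one date from the example data:

```r
d <- as.Date("1998-01-02")

# Without list(): c() splices the Date into the list element by element and drops its class
no_wrap <- c(list(type = "car"), test_date = d)
class(no_wrap$test_date)    # "numeric"

# With list(): the Date is kept intact as one named list element
with_wrap <- c(list(type = "car"), list(test_date = d))
class(with_wrap$test_date)  # "Date"
```

So even once the if() problem is fixed, leaving out list() would be expected to give a test_date column of day counts rather than dates.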

df_dt[, c(lapply(.SD[, dt_names_first, with = FALSE], data.table::first),
          lapply(.SD[, dt_names_last, with = FALSE], data.table::last),
          list(test_date = if (all(is.na(data.table::first(test_date[result == "positive"]))))
                             data.table::last(test_date[result == "negative"])
                           else data.table::first(test_date[result == "positive"]))),
      .(id)]

-output

   id  type colour x1 x2  test_date
1:  a   car    gre  g  j 1998-01-02
2:  b wheel   dblu  k  x 1998-01-06
3:  c   car    ora  w  o 1998-01-07
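A possible variation, not from the original answer: testing any(result == "positive") directly always yields a single TRUE/FALSE, so the all(is.na(...)) guard is no longer needed. The sketch below rebuilds the question's data with x1/x2 copied from the printed df (so it does not depend on set.seed()); first_cols and last_cols are just renamed stand-ins for dt_names_first and dt_names_last:

```r
library(data.table)

# Same data as in the question; x1/x2 copied from the printed df
dt <- data.table(
  id = rep(letters[1:3], each = 3),
  result = c("negative", "positive", "positive", "negative", "negative",
             "negative", "positive", "negative", "positive"),
  test_date = seq(as.Date("1998-01-01"), as.Date("1998-01-09"), by = "day"),
  type = c("car", "truck", "bike", "wheel", "tyre", "lorry", "car", "bike", "wheel"),
  colour = c("gre", "blu", "re", "dblu", "yel", "re", "ora", "ti", "bla"),
  x1 = c("y", "d", "g", "a", "b", "k", "n", "r", "w"),
  x2 = c("u", "z", "j", "v", "n", "x", "g", "i", "o")
)

first_cols <- c("type", "colour")
last_cols <- c("x1", "x2")

out <- dt[, c(lapply(.SD[, first_cols, with = FALSE], data.table::first),
              lapply(.SD[, last_cols, with = FALSE], data.table::last),
              # any() is always length one, so if() never sees logical(0)
              list(test_date = if (any(result == "positive"))
                                 test_date[result == "positive"][1L]
                               else data.table::last(test_date[result == "negative"]))),
          by = .(id)]
out
```

This should reproduce the output above. Like the original code, it assumes test_date has no NA values, which holds for the example data.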
