R: Creating multiple variables in one data.table call

Posted on March 24

Given:

library(tidyverse)
library(data.table)
set.seed(1)
df <- data.frame(id = rep(letters[1:3], each = 3),
                 result = c("negative", "positive", "positive", "negative", "negative", "negative",
                            "positive", "negative", "positive"),
                 test_date = seq.Date(from = as.Date("01/01/1998", "%d/%m/%Y"), 
                                 to = as.Date("09/01/1998", "%d/%m/%Y"), by = "day"),
                 type = c("car", "truck", "bike", "wheel", "tyre", "lorry", "car", "bike", "wheel"),
                 colour = c("gre", "blu", "re", "dblu", "yel", "re", "ora", "ti", "bla"),
                 x1 = sample(letters, 9),
                 x2 = sample(letters, 9))
df
#   id   result  test_date  type colour x1 x2
# 1  a negative 1998-01-01   car    gre  y  u
# 2  a positive 1998-01-02 truck    blu  d  z
# 3  a positive 1998-01-03  bike     re  g  j
# 4  b negative 1998-01-04 wheel   dblu  a  v
# 5  b negative 1998-01-05  tyre    yel  b  n
# 6  b negative 1998-01-06 lorry     re  k  x
# 7  c positive 1998-01-07   car    ora  n  g
# 8  c negative 1998-01-08  bike     ti  r  i
# 9  c positive 1998-01-09 wheel    bla  w  o

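I want to extract, by group, the first row of some columns and the last row of others, along with the test_date of the first occurrence of result == "positive" (falling back to the last "negative" test_date when a group has no positive result). With dplyr, the following works: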

out_dplyr <- df %>% 
  group_by(id) %>% 
  summarise(across(c(type, colour), dplyr::first),
            across(c(x1, x2), dplyr::last),
            test_date = if (is.na(dplyr::first(test_date[result == "positive"])))
              dplyr::last(test_date[result == "negative"])
            else dplyr::first(test_date[result == "positive"])) %>% 
  ungroup()
out_dplyr
#   id    type  colour x1    x2    test_date 
#   <chr> <chr> <chr>  <chr> <chr> <date>    
# 1 a     car   gre    g     j     1998-01-02
# 2 b     wheel dblu   k     x     1998-01-06
# 3 c     car   ora    w     o     1998-01-07

I have been trying to do the same with data.table. If I omit the third variable, the following also works:

df_dt <- data.table(df)
dt_names_first <- c("type", "colour")
dt_names_last <- c("x1", "x2")
out_dt <- df_dt[, c(lapply(.SD[, dt_names_first, with = FALSE], data.table::first),
                    lapply(.SD[, dt_names_last, with = FALSE], data.table::last)), .(id)]
out_dt
#    id  type colour x1 x2
# 1:  a   car    gre  g  j
# 2:  b wheel   dblu  k  x
# 3:  c   car    ora  w  o

But when I include it, the call fails, presumably because I am not including it correctly:

out_dt <- df_dt[, c(lapply(.SD[, dt_names_first, with = FALSE], data.table::first),
                    lapply(.SD[, dt_names_last, with = FALSE], data.table::last),
                    test_date = if (is.na(data.table::first(test_date[result == "positive"])))
                      data.table::last(test_date[result == "negative"])
                    else data.table::first(test_date[result == "positive"])), .(id)]
out_dt

Any suggestions?

Thanks

Recommended answer

df_dt[, c(lapply(.SD[, dt_names_first, with = FALSE], data.table::first),
          lapply(.SD[, dt_names_last, with = FALSE], data.table::last),
          # wrap test_date in list() so that c() combines three lists;
          # all() covers groups with no "positive" row, where is.na() has length zero
          list(test_date = if (all(is.na(data.table::first(test_date[result == "positive"]))))
            data.table::last(test_date[result == "negative"])
          else data.table::first(test_date[result == "positive"]))), .(id)]
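As a rough illustration of why the empty-group case matters, here is a minimal sketch on a smaller toy table (not the question's df). The helper name pick_date is hypothetical and only used for this example; it tests the length of the "positive" subset directly instead of calling is.na() on a possibly empty vector. Treat it as one possible way to write the fallback, not as the only one.

```r
library(data.table)

# Toy data (hypothetical): group "b" has no "positive" result.
dt <- data.table(
  id        = c("a", "a", "b", "b"),
  result    = c("negative", "positive", "negative", "negative"),
  test_date = as.Date("1998-01-01") + 0:3
)

# For group "b" the "positive" subset is empty, so is.na() on it has
# length zero and a bare if () on that condition would raise an error.
empty <- dt[id == "b", test_date[result == "positive"]]
cond_length <- length(is.na(data.table::first(empty)))
cond_all    <- all(is.na(data.table::first(empty)))

# pick_date() is a hypothetical helper: first positive date if there is one,
# otherwise the last negative date.
pick_date <- function(dates, results) {
  pos <- dates[results == "positive"]
  if (length(pos) > 0L) pos[1L] else data.table::last(dates[results == "negative"])
}

# Wrapping the result in .() (an alias for list()) keeps test_date a Date column.
res <- dt[, .(test_date = pick_date(test_date, result)), by = id]
res
```

The same helper could be dropped into the list(test_date = ...) element of the call above in place of the if/else expression.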
