Given:
library(tidyverse)
library(data.table)
set.seed(1)
df <- data.frame(id = rep(letters[1:3], each = 3),
                 result = c("negative", "positive", "positive", "negative", "negative", "negative",
                            "positive", "negative", "positive"),
                 test_date = seq.Date(from = as.Date("01/01/1998", "%d/%m/%Y"),
                                      to = as.Date("09/01/1998", "%d/%m/%Y"), by = "day"),
                 type = c("car", "truck", "bike", "wheel", "tyre", "lorry", "car", "bike", "wheel"),
                 colour = c("gre", "blu", "re", "dblu", "yel", "re", "ora", "ti", "bla"),
                 x1 = sample(letters, 9),
                 x2 = sample(letters, 9))
df
#   id   result  test_date  type colour x1 x2
# 1  a negative 1998-01-01   car    gre  y  u
# 2  a positive 1998-01-02 truck    blu  d  z
# 3  a positive 1998-01-03  bike     re  g  j
# 4  b negative 1998-01-04 wheel   dblu  a  v
# 5  b negative 1998-01-05  tyre    yel  b  n
# 6  b negative 1998-01-06 lorry     re  k  x
# 7  c positive 1998-01-07   car    ora  n  g
# 8  c negative 1998-01-08  bike     ti  r  i
# 9  c positive 1998-01-09 wheel    bla  w  o
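I want to extract, by group, the first row of some columns and the last row of others, plus the test_date of the first row where result == "positive" (falling back to the last "negative" test_date when a group has no positive result). Using dplyr, the following works: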
out_dplyr <- df %>%
  group_by(id) %>%
  summarise(across(c(type, colour), dplyr::first),
            across(c(x1, x2), dplyr::last),
            test_date = if (is.na(dplyr::first(test_date[result == "positive"])))
              dplyr::last(test_date[result == "negative"])
            else dplyr::first(test_date[result == "positive"])) %>%
  ungroup()
out_dplyr
#   id    type  colour x1    x2    test_date
#   <chr> <chr> <chr>  <chr> <chr> <date>
# 1 a     car   gre    g     j     1998-01-02
# 2 b     wheel dblu   k     x     1998-01-06
# 3 c     car   ora    w     o     1998-01-07
I have been trying to do the same with data.table. If I leave out the third variable (test_date), the following also works:
df_dt <- data.table(df)
dt_names_first <- c("type", "colour")
dt_names_last <- c("x1", "x2")
out_dt <- df_dt[, c(lapply(.SD[, dt_names_first, with = FALSE], data.table::first),
                    lapply(.SD[, dt_names_last, with = FALSE], data.table::last)), .(id)]
out_dt
#    id  type colour x1 x2
# 1:  a   car    gre  g  j
# 2:  b wheel   dblu  k  x
# 3:  c   car    ora  w  o
But when I include it, it fails, I think because I am not including it correctly:
out_dt <- df_dt[, c(lapply(.SD[, dt_names_first, with = FALSE], data.table::first),
                    lapply(.SD[, dt_names_last, with = FALSE], data.table::last),
                    test_date = if (is.na(data.table::first(test_date[result == "positive"])))
                      data.table::last(test_date[result == "negative"])
                    else data.table::first(test_date[result == "positive"])), .(id)]
out_dt
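A possible explanation, offered as a sketch rather than a confirmed diagnosis: as far as I can tell, data.table::first() returns a zero-length vector when its input is empty, whereas dplyr::first() returns NA. For group b there are no positive rows, so is.na() would receive an empty vector and if () would have nothing to test. Separately, a bare Date passed to c() alongside lists appears to lose its Date class, so the value is wrapped in list() below. The helper name pos is hypothetical, and the data is rebuilt only so that the block runs on its own.

```r
library(data.table)

# Same data as in the question, rebuilt so this sketch is self-contained.
set.seed(1)
df_dt <- data.table(
  id        = rep(letters[1:3], each = 3),
  result    = c("negative", "positive", "positive",
                "negative", "negative", "negative",
                "positive", "negative", "positive"),
  test_date = seq(as.Date("1998-01-01"), as.Date("1998-01-09"), by = "day"),
  type      = c("car", "truck", "bike", "wheel", "tyre", "lorry", "car", "bike", "wheel"),
  colour    = c("gre", "blu", "re", "dblu", "yel", "re", "ora", "ti", "bla"),
  x1        = sample(letters, 9),
  x2        = sample(letters, 9)
)

dt_names_first <- c("type", "colour")
dt_names_last  <- c("x1", "x2")

out_dt <- df_dt[, {
  # Dates of the positive rows in this group (may be empty).
  pos <- test_date[result == "positive"]
  c(lapply(.SD[, dt_names_first, with = FALSE], data.table::first),
    lapply(.SD[, dt_names_last,  with = FALSE], data.table::last),
    # Test the length instead of is.na(), and wrap in list() so the
    # Date class is kept when the pieces are combined with c().
    list(test_date = if (length(pos) > 0L) pos[1L]
                     else data.table::last(test_date[result == "negative"])))
}, by = id]
out_dt
```

Under these assumptions the test_date column should match the dplyr result above: 1998-01-02 for a, 1998-01-06 for b, and 1998-01-07 for c.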
Any suggestions?
Thanks