也许我们可以先按‘ID’分组,然后创建列,然后按‘Period’和summarise
分组
library(dplyr)
testDF %>%
group_by(ID) %>%
mutate(`1stStateX` = row_number() == which(State == "X")[1]) %>%
group_by(Period) %>%
summarise(`1stStateX` = sum(`1stStateX`, na.rm = TRUE), .groups = 'drop')
-输出
# A tibble: 5 × 2
Period `1stStateX`
<int> <int>
1 1 0
2 2 1
3 3 1
4 4 0
5 5 0
另一个选项是按ID分组后为slice
,得到count
并使用complete
填充不可用的‘Period’
library(tidyr)
testDF %>%
group_by(ID) %>%
slice(match('X', State)) %>%
ungroup %>%
count(Period, sort = TRUE ,name = "1stStateX") %>%
complete(Period = unique(testDF$Period),
fill = list(`1stStateX` = 0))
-输出
# A tibble: 5 × 2
Period `1stStateX`
<int> <int>
1 1 0
2 2 1
3 3 1
4 4 0
5 5 0
或data.table
中的类似选项
library(data.table)
setDT(testDF)[, `1stStateX` := .I == .I[State == 'X'][1],
ID][, .(`1stStateX` = sum(`1stStateX`, na.rm = TRUE)), by = Period]
-输出
Period 1stStateX
<int> <int>
1: 1 0
2: 2 1
3: 3 1
4: 4 0
5: 5 0