Suppose we start with the data data frame shown below, generated by the code that follows it:
> data
  ID Period_1 Period_2 Values State
1  1        1  2020-01      5    X0
2  1        2  2020-02     10    X1
3  1        3  2020-03     15    X0
4  2        1  2020-04      0    X0
5  2        2  2020-05      2    X2
6  2        3  2020-06      4    X0
7  3        1  2020-02      3    X2
8  3        2  2020-03      6    X1
9  3        3  2020-04      9    X0
data <-
  data.frame(
    ID = c(1,1,1,2,2,2,3,3,3),
    Period_1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    Period_2 = c("2020-01","2020-02","2020-03","2020-04","2020-05","2020-06","2020-02","2020-03","2020-04"),
    Values = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
    State = c("X0","X1","X0","X0","X2","X0", "X2","X1","X0")
  )
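I am trying to learn how to use the R package data.table, and I would like to use it to count transitions from a given state (state "X0" in the code example below) to another state when moving, or "transitioning", from one period to the next (here the period measure is "Period_1"). Running the data.table code below gives me the following result: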
   OutflowState 2 4
1:           X0 0 0
2:           X1 1 0
3:           X2 1 0
Code run:
library(data.table)
dcast(
  setDT(data)[, OutflowState := factor(shift(State, type = c("lead"))), by = ID]
             [, period_factor := lapply(.SD, factor), .SDcols = "Period_1"]
             [, period_factor := as.numeric(period_factor) + 1],
  OutflowState ~ period_factor, length,
  value.var = "Values", subset = .(State == "X0"), drop = FALSE
)
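This output is correct, but I would like to (a) add columns for periods 1 and 3 to the output (period 1 is always all 0s, and for this data data frame period 3 should show all 0s because no row has State = X0 in period 2); and (b) remove the Period_1 = 4 column from the output, because there is no period 4. It is only a by-product of as.numeric(period_factor) + 1, the trick used in the code above to label the period being transitioned into. How can I do this?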
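When I run the code snippet shown below, I get the following interim data frame. One solution would therefore be to drop any rows where OutflowState = NA (which eliminates every notional period 4), but I don't know how to do that.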
   ID Period_1 Period_2 Values State OutflowState period_factor
1:  1        1  2020-01      5    X0           X1             2
2:  1        2  2020-02     10    X1           X0             3
3:  1        3  2020-03     15    X0         <NA>             4
4:  2        1  2020-04      0    X0           X2             2
5:  2        2  2020-05      2    X2           X0             3
6:  2        3  2020-06      4    X0         <NA>             4
7:  3        1  2020-02      3    X2           X1             2
8:  3        2  2020-03      6    X1           X0             3
9:  3        3  2020-04      9    X0         <NA>             4
setDT(data)[, OutflowState := factor(shift(State, type = c("lead"))), by = ID][
  , period_factor := lapply(.SD, factor), .SDcols = "Period_1"][
  , period_factor := as.numeric(period_factor) + 1
]
data
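For reference, here is a minimal sketch of what the "drop the OutflowState = NA rows" idea might look like. It has not been checked against the full requirements. It assumes rows are already ordered by Period_1 within each ID, and the names dt, trans and result are placeholders that do not appear in the code above. The idea is to filter out the NA rows first, then give period_factor and OutflowState explicit factor levels so that dcast(..., drop = FALSE) emits a column for every real period and a row for every state.

```r
library(data.table)

data <- data.frame(
  ID       = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Period_1 = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Values   = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
  State    = c("X0", "X1", "X0", "X0", "X2", "X0", "X2", "X1", "X0")
)
dt <- as.data.table(data)  # placeholder name; a copy, so `data` is left untouched

# Next state within each ID; the last row of each ID gets NA.
# Assumption: rows are sorted by Period_1 within ID.
dt[, OutflowState := shift(State, type = "lead"), by = ID]

# Keep only rows that have a following period (removes the notional period 4).
trans <- dt[!is.na(OutflowState)]

# Explicit levels: every real period and every state, whether observed or not.
trans[, period_factor := factor(Period_1 + 1, levels = sort(unique(dt$Period_1)))]
trans[, OutflowState := factor(OutflowState, levels = sort(unique(dt$State)))]

# Count transitions out of X0; drop = FALSE keeps the unused factor levels.
result <- dcast(
  trans[State == "X0"],
  OutflowState ~ period_factor,
  fun.aggregate = length,
  value.var = "Values",
  drop = FALSE
)
print(result)
```

Under the same assumptions, replacing length with sum should total Values instead of counting transitions, though that variant is untested here.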
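This question is a follow-up to "How to use data.table to build a new dataframe showing inflows into a specified transition state based on the value of an element in a prior row?", which dealt with inflows into a transition state. Note that the data.table code above also allows the time horizon to be defined by Period_2 instead, and allows the values of the transitions to be summed rather than the transitions counted. Both capabilities need to be preserved.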
The image below illustrates this better: