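I have a dataset that stores participants' records vertically over time, with one row per follow-up. Participants can have any number of follow-ups. They currently have between 1 and 14 rows each, but more rows are expected to be added over time.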
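I have a list of variables, `var`, that participants have presumably reported at each follow-up. I want to create a matching set of new "ever" variables, `vare`, that record whether the participant reported "yes" (coded 1) to the corresponding variable at any follow-up up to and including the current one.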
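For a single participant whose rows are already in follow-up order, this definition amounts to a running maximum of a 0/1 "yes" indicator. A minimal sketch, assuming "yes" is coded as 1 and that `NA` should count as "not yes" (which is what the expected output below implies); `ever_flag` is a hypothetical helper name, while `cummax()` and `%in%` are base R:

```r
# Hypothetical helper (assumption: "yes" is coded as 1; NA counts as "not yes").
# x %in% 1 maps NA to FALSE; cummax() keeps the flag at 1 once it has been reached.
ever_flag = function(x) cummax(as.numeric(x %in% 1))

ever_flag(c(0, NA, 0, 1, 0, NA, 1))   # 0 0 0 1 1 1 1
ever_flag(c(1, NA, NA, 0, 0, 0, 1))   # 1 1 1 1 1 1 1
```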
Here is an example of the desired input and output:
```r
var = c("var1", "var2")
vare = paste0(var, "_ever")
data = data.frame(
  idno         = c(123, 123, 123, 123, 123, 123, 123),
  followup_num = c(0, 1, 2, 3, 4, 5, 6),
  var1         = c(0, NA, 0, 1, 0, NA, 1),
  var2         = c(1, NA, NA, 0, 0, 0, 1)
)
# Desired output columns
data$var1_ever = c(0, 0, 0, 1, 1, 1, 1)
data$var2_ever = c(1, 1, 1, 1, 1, 1, 1)
```
idno | followup_num | var1 | var1_ever | var2 | var2_ever |
---|---|---|---|---|---|
123 | 0 | 0 | 0 | 1 | 1 |
123 | 1 | NA | 0 | NA | 1 |
123 | 2 | 0 | 0 | NA | 1 |
123 | 3 | 1 | 1 | 0 | 1 |
123 | 4 | 0 | 1 | 0 | 1 |
123 | 5 | NA | 1 | 0 | 1 |
123 | 6 | 1 | 1 | 1 | 1 |
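This is the code I'm currently using. Nested `for` loops are generally not ideal in R, and this code is especially slow on datasets of a few thousand rows.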
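One common way to avoid both the per-ID loop and the per-follow-up loop is to sort once and compute a running maximum within each ID. A minimal base-R sketch, assuming the rows may be reordered by `idno` and `followup_num` and that "yes" is coded as 1; `ever_flag` is a hypothetical helper name, while `ave()`, `order()` and `cummax()` are base R. The second participant (456) is made-up data, added only to show the grouping:

```r
# Hypothetical helper (assumption: "yes" is coded as 1; NA counts as "not yes").
ever_flag = function(x) cummax(as.numeric(x %in% 1))

data = data.frame(
  idno         = c(123, 123, 123, 456, 456),
  followup_num = c(0, 1, 2, 0, 1),
  var1         = c(0, 1, NA, NA, 1)
)

# Sort once so that, within each idno, rows run in follow-up order
data = data[order(data$idno, data$followup_num), ]

# ave() splits var1 by idno, applies ever_flag to each piece,
# and writes the results back into the original row positions
data$var1_ever = ave(data$var1, data$idno, FUN = ever_flag)
data$var1_ever   # 0 1 1 0 1
```

Because `ave()` keeps the within-group row order, the sort step is what makes `cummax()` run forward in time; without it, an unsorted dataset would give wrong flags.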
```r
# For each ID
for (i in unique(data$idno)) {
  id = data$idno %in% i  # Get the relevant rows for this ID
  fus = sort(data$followup_num[id])  # Get the follow-up numbers
  # For each variable in the list
  for (v in seq_along(var)) {
    # Loop through the follow-ups. If the variable reports "yes", mark
    # this and every subsequent follow-up as having reported that variable ever.
    # Otherwise, mark the opposite on that row and move to the next follow-up.
    for (f in fus) {
      if (t(data[id & data$followup_num %in% f, var[v]]) %in% 1) {
        data[id & data$followup_num >= f, vare[v]] = 1
        break
      } else {
        data[id & data$followup_num %in% f, vare[v]] = 0
      }
    }
  }
}
```
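Are there problems with this existing solution? Is there a way to optimize or simplify it? Is there a way to use `apply`/`sapply`/etc. that I haven't tried?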
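On the apply-family question, a sketch that uses `lapply()` over the variable columns in place of the loop over `var`, with `ave()` handling the IDs and follow-ups. It assumes "yes" is coded as 1 and that rows may be reordered; `ever_flag` is a hypothetical helper name, and the two-participant data below is made up, with rows deliberately out of order:

```r
# Hypothetical helper (assumption: "yes" is coded as 1; NA counts as "not yes").
ever_flag = function(x) cummax(as.numeric(x %in% 1))

var = c("var1", "var2")
vare = paste0(var, "_ever")

data = data.frame(
  idno         = c(456, 123, 123, 456, 123),
  followup_num = c(1, 2, 0, 0, 1),
  var1         = c(0, 1, 0, 1, NA),
  var2         = c(NA, 0, 1, 0, 0)
)

# Put each participant's rows in follow-up order
data = data[order(data$idno, data$followup_num), ]

# lapply() returns one "ever" vector per variable; assigning the list
# to data[vare] creates all the new columns at once
data[vare] = lapply(data[var], function(x) ave(x, data$idno, FUN = ever_flag))
```

The anonymous function is needed because `lapply()` has its own `FUN` argument, so `FUN = ever_flag` cannot be passed straight through to `ave()`.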