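I have a big data set in this format. I want to a) identify the IDs/rows that contain the sequence of values 1, 1, >1, >1 (two 1s followed by two values greater than 1) anywhere between X1 and X10; and b) generate a new variable ("event") that marks where the sequence starts, taking the values X1, ..., X10.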
```r
my_df <- data.frame(ID = c("a","b","c","d","e","f","g","h"),
                    replicate(8, sample(1:4, 8, replace = TRUE)))
```
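The example above only creates X1 to X8. To reproduce the X1 to X10 layout described in the question, here is a small sketch with the column count pulled into a variable (`n_cols` is a hypothetical name, not part of the original code) and a fixed seed so the random data is reproducible:

```r
# Hypothetical 10-column version of the example data (X1 to X10)
set.seed(1)   # fixed seed so the random example is reproducible
n_cols <- 10  # number of X columns; raise it to test wider data
# data.frame() names the unnamed matrix columns X1, X2, ..., X10
my_df <- data.frame(ID = letters[1:8],
                    replicate(n_cols, sample(1:4, 8, replace = TRUE)))
names(my_df)
```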
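For a), I replaced every value greater than 1 with 2, pasted the values of the X columns (X1 to X8 in the example) into one string, and then filtered for the sequence 1-1-2-2. For b), I created the variable "event" with nested ifelse() calls that identify where the sequence starts. This only works for the 8 columns it was written for. Is there a more efficient way to do this for data sets with more columns?

I'd really appreciate any pointers!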
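```r
library(dplyr)
df_seq <- my_df %>%
  # recode every value greater than 1 as 2 (funs() is deprecated; across() replaces it)
  mutate(across(starts_with("X"), ~ ifelse(.x > 1, 2, .x))) %>%
  # collapse the row into one string, e.g. "1 - 1 - 2 - 2 - 1 - 2 - 2 - 1"
  mutate(seq = paste(X1, "-", X2, "-", X3, "-", X4, "-", X5, "-", X6, "-", X7, "-", X8)) %>%
  # a) keep only rows that contain 1-1-2-2 somewhere
  filter(grepl("1 - 1 - 2 - 2", seq)) %>%
  # b) one hard-coded branch per possible start column; after the filter,
  #    a row that reaches the last branch must start at X5
  mutate(event = ifelse(X1 == 1 & X2 == 1 & X3 == 2 & X4 == 2, "X1",
                 ifelse(X2 == 1 & X3 == 1 & X4 == 2 & X5 == 2, "X2",
                 ifelse(X3 == 1 & X4 == 1 & X5 == 2 & X6 == 2, "X3",
                 ifelse(X4 == 1 & X5 == 1 & X6 == 2 & X7 == 2, "X4", "X5")))))
```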
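One possible way to drop the hard-coded column names is sketched below in base R. This is a swapped-in technique, not the asker's method. It recodes the values as in the question, collapses each row into a compact string such as "2112211121", and lets `regexpr()` return the position of the first "1122". The sketch assumes every X value is a positive integer with no NAs (as with `sample(1:4, ...)`) and that the only columns whose names start with "X" are the sequence columns. Under those assumptions each recoded cell is exactly one character, so the match position equals the column index. `find_event_start` is a hypothetical helper name. `regexpr()` reports only the first match, so a row containing the sequence twice gets the earlier start, which is also what the nested `ifelse()` returns.

```r
# Sketch: find the first 1, 1, >1, >1 run across any number of X columns.
# Assumes positive integer values, no NAs, and that only sequence columns start with "X".
find_event_start <- function(df, pattern = "1122") {
  x_cols <- grep("^X", names(df), value = TRUE)
  # Same recode as in the question: values greater than 1 become 2
  recoded <- as.matrix(df[x_cols])
  recoded[recoded > 1] <- 2
  # One string per row; each column contributes exactly one character
  codes <- apply(recoded, 1, paste, collapse = "")
  # Position of the first match in each string (-1 when there is none)
  pos <- regexpr(pattern, codes, fixed = TRUE)
  out <- df[pos > 0, , drop = FALSE]
  # Character position k corresponds to column x_cols[k]
  out$event <- x_cols[pos[pos > 0]]
  out
}

set.seed(42)
my_df <- data.frame(ID = letters[1:8],
                    replicate(10, sample(1:4, 8, replace = TRUE)))
df_seq <- find_event_start(my_df)
df_seq[, c("ID", "event")]
```

Because the column names come from `grep("^X", ...)`, the same call works for X1 to X10 or wider data, and the `paste()`/`grepl()` filter and the nested `ifelse()` are both covered by the single `regexpr()` call.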