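I have a data.table with one column, call it country, that contains repeated values, and another column (survey) that holds a unique string for each observation. I want to create a new variable that contains, for each row, the pasted values of the survey strings for that country up to and including that row. I have done this with an index and a for loop, but I am curious whether there is a faster way to do it with data.table syntax. MWE: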
library(data.table)
dt <- data.table(country = c("Belgium","Belgium","Bolivia","Brazil","Brazil","Brazil"),
                 survey = c("BE01, BE03, BE04","BE05, BE07, BE11",
                            "BO11, BO13", "BR01, BR02, BR03", "BR05","BR12, BR13"))
# Number the rows within each country
dt[, index := seq(1, .N), by = "country"]
for(cntry in unique(dt$country)){
  # Number of rows for this country
  tmp_ind <- max(dt[country == eval(cntry)]$index)
  # The first row just takes its own survey string
  dt[country == eval(cntry) & index == 1, all_surveys := survey]
  if(tmp_ind > 1) {
    for(i in 2:tmp_ind){
      # Each later row appends its survey string to the previous row's result
      dt[country == eval(cntry) & index == i,
         all_surveys := paste0(dt[country == eval(cntry) & index == i-1, all_surveys], survey)]
    }
  }
}
This gives the desired result:
> dt
   country           survey index                      all_surveys
1: Belgium BE01, BE03, BE04     1                 BE01, BE03, BE04
2: Belgium BE05, BE07, BE11     2 BE01, BE03, BE04BE05, BE07, BE11
3: Bolivia       BO11, BO13     1                       BO11, BO13
4:  Brazil BR01, BR02, BR03     1                 BR01, BR02, BR03
5:  Brazil             BR05     2             BR01, BR02, BR03BR05
6:  Brazil       BR12, BR13     3   BR01, BR02, BR03BR05BR12, BR13
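
For comparison, here is a minimal sketch of one possible loop-free version. It assumes the running (cumulative) concatenation shown in the output above is what is wanted, and it swaps the index and for loop for base R's Reduce() with accumulate = TRUE inside a grouped := assignment. This is only one way to do it, not necessarily the fastest; the data is the same as in the example above, and the index column is not needed.

```r
library(data.table)

# Same example data as in the question
dt <- data.table(
  country = c("Belgium", "Belgium", "Bolivia", "Brazil", "Brazil", "Brazil"),
  survey  = c("BE01, BE03, BE04", "BE05, BE07, BE11",
              "BO11, BO13", "BR01, BR02, BR03", "BR05", "BR12, BR13")
)

# Within each country, Reduce(..., accumulate = TRUE) keeps every
# intermediate result of paste0, i.e. a running concatenation of survey
dt[, all_surveys := Reduce(paste0, survey, accumulate = TRUE), by = country]

print(dt)
```

Like the loop, this joins the strings with no separator. If one is wanted, paste0 could be replaced by something like function(a, b) paste(a, b, sep = ", ").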