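I have a data.table with one column, call it country, that contains repeated values, and another column (survey) that holds a unique string for each observation. I want to create a new variable that contains the pasted values of all survey strings for a single country. I have done this with an index and a for loop, but I'm curious whether there is a faster way to do it with data.table syntax. MWE: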

library(data.table)

dt <- data.table(country = c("Belgium","Belgium","Bolivia","Brazil","Brazil","Brazil"),
                      survey = c("BE01, BE03, BE04","BE05, BE07, BE11",
                            "BO11, BO13", "BR01, BR02, BR03", "BR05","BR12, BR13"))
dt[, index := seq(1, .N), by = "country"]
for(cntry in unique(dt$country)){
  tmp_ind <- max(dt[country == eval(cntry)]$index)
  dt[country == eval(cntry) & index ==1, all_surveys := survey]
  if(tmp_ind > 1) {
    for(i in 2:tmp_ind){
      dt[country == eval(cntry) & index ==i,
         all_surveys := paste0(dt[country == eval(cntry) & index == i-1, all_surveys], survey)]
    }
  }
}

This gives the desired result:

> dt
   country           survey index                      all_surveys
1: Belgium BE01, BE03, BE04     1                 BE01, BE03, BE04
2: Belgium BE05, BE07, BE11     2 BE01, BE03, BE04BE05, BE07, BE11
3: Bolivia       BO11, BO13     1                       BO11, BO13
4:  Brazil BR01, BR02, BR03     1                 BR01, BR02, BR03
5:  Brazil             BR05     2             BR01, BR02, BR03BR05
6:  Brazil       BR12, BR13     3   BR01, BR02, BR03BR05BR12, BR13

Recommended answer

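This can be solved without an explicit loop by calling Reduce() with accumulate = TRUE inside a grouped := assignment: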

library(data.table)

# Create the data.table
dt <- data.table(country = c("Belgium", "Belgium", "Bolivia", "Brazil", "Brazil", "Brazil"),
                 survey = c("BE01, BE03, BE04", "BE05, BE07, BE11", "BO11, BO13", 
                            "BR01, BR02, BR03", "BR05", "BR12, BR13"))

# Create index
dt[, index := seq_len(.N), by = "country"]

# Create all_surveys
dt[, all_surveys := Reduce(function(prev, curr) paste0(prev, curr), 
                           x = survey, 
                           accumulate = TRUE), 
   by = "country"]

print(dt)

> print(dt)
   country           survey index                      all_surveys
1: Belgium BE01, BE03, BE04     1                 BE01, BE03, BE04
2: Belgium BE05, BE07, BE11     2 BE01, BE03, BE04BE05, BE07, BE11
3: Bolivia       BO11, BO13     1                       BO11, BO13
4:  Brazil BR01, BR02, BR03     1                 BR01, BR02, BR03
5:  Brazil             BR05     2             BR01, BR02, BR03BR05
6:  Brazil       BR12, BR13     3   BR01, BR02, BR03BR05BR12, BR13
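
Note that the result above, like the original loop, joins the strings with paste0, so consecutive surveys run together with no separator (for example "BE04BE05"). If a separator is wanted, the same Reduce approach can be adapted. The following is only a sketch under that assumption: the helper name running_paste and the ", " separator are illustrative choices, not part of the original question or answer.

```r
library(data.table)

dt <- data.table(
  country = c("Belgium", "Belgium", "Bolivia", "Brazil", "Brazil", "Brazil"),
  survey  = c("BE01, BE03, BE04", "BE05, BE07, BE11", "BO11, BO13",
              "BR01, BR02, BR03", "BR05", "BR12, BR13")
)

# Hypothetical helper: running concatenation of a character vector.
# Reduce(..., accumulate = TRUE) keeps every intermediate result, so an
# input of length n gives n progressively longer strings.
running_paste <- function(x, sep = ", ") {
  unlist(Reduce(function(prev, curr) paste(prev, curr, sep = sep),
                x, accumulate = TRUE))
}

# Apply the helper within each country; no index column is needed
dt[, all_surveys := running_paste(survey), by = country]

# The last row of each group holds the complete string for that country
full <- dt[, .(all_surveys = all_surveys[.N]), by = country]
print(full)
```

Because the grouping is handled by by = country, the helper only ever sees one country's surveys at a time, which is why no explicit index or filter on country is required.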
