Extracting keywords that repeat multiple times, with surrounding context, from strings in R

Posted on May 6

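I have a dataset (z) with very long strings in z$txt. I also have a dictionary of keywords (incd) that I need to identify and store in the column z$inc.terms. I need every occurrence of every keyword (the same keyword may repeat n times in the same string, so I need each occurrence), together with the 5 characters before and after it, so that I can see the keyword in context.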

#CREATE "z" DATASET
z<-data.frame(matrix("",3,3))
names(z)<-c("row","txt","inc.terms")
z$row<-c(1,2,3)
z[1,2]<-"I like the sky when the sky is blu not when the sky is grey"
z[2,2]<-"I like the mountains when the sky is blu not when the mountains are cloudy"
z[3,2]<-"I like the sky when the sky is dark in the mountains"

incd<-c("sky","mountains")                       #inclusion dictionary

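This is what I have managed so far, but it returns only the first occurrence of each keyword, and I need every occurrence. (In fact, this example does not work either, and I don't know why; it does work on my original data, which is more complex and cannot be shared for data-protection reasons.)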

for(i in incd){
   for(j in z$row){
     z$inc.terms[z$row==j]<-paste(z$inc.term[z$row==j],paste(stringr::str_sub(stringr::str_split(z$txt[z$row==j],i,simplify=TRUE)[,1],-5,-1),i,stringr::str_sub(stringr::str_split(z$txt[z$row==j],i,simplify=TRUE)[,2],1,5)),sep=" /// ")
 }
}

This is what I have been using, but it returns only the first occurrence of each keyword in each cell, not every occurrence.

I would like z$inc.terms to end up as follows:

z[1,3]  " the sky when" /// " the sky is b" /// " the sky is g"
z[2,3]  " the mountains when" /// " the sky is b" /// " the mountains are "
z[3,3]  " the sky when" /// " the sky is d" /// " the mountains"

row: 1
txt: I like the sky when the sky is blu not when the sky is grey
inc.terms: the sky when, the sky is b, the sky is g

row: 2
txt: I like the mountains when the sky is blu not when the mountains are cloudy
inc.terms: the mountains when, the sky is b, the mountains are

row: 3
txt: I like the sky when the sky is dark in the mountains
inc.terms: the sky when, the sky is d, the mountains

Recommended answer
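One possible approach, offered as a sketch rather than a definitive solution: locate every keyword match with base R's gregexpr(), then cut out each match plus up to 5 characters on either side with substring(). Because the context is taken from match positions rather than consumed by the regex, neighbouring contexts may overlap without any occurrence being lost. The helper name keyword_context and its width argument are hypothetical, introduced only for this illustration. The sketch assumes the keywords contain no regex metacharacters, and it matches keywords inside longer words as well (for example "sky" in "skyline").

```r
# Sample data from the question
z <- data.frame(
  row = 1:3,
  txt = c(
    "I like the sky when the sky is blu not when the sky is grey",
    "I like the mountains when the sky is blu not when the mountains are cloudy",
    "I like the sky when the sky is dark in the mountains"
  ),
  stringsAsFactors = FALSE
)
incd <- c("sky", "mountains")  # inclusion dictionary

# Hypothetical helper: returns every keyword occurrence in a single string,
# padded with up to `width` characters of context on each side,
# in order of appearance in the text.
keyword_context <- function(txt, keywords, width = 5) {
  # Assumption: keywords contain no regex metacharacters
  pattern <- paste(keywords, collapse = "|")
  m <- gregexpr(pattern, txt)[[1]]
  if (m[1] == -1) return(character(0))  # no keyword found
  starts <- as.integer(m)
  ends <- starts + attr(m, "match.length") - 1L
  # Clamp the context window to the bounds of the string
  substring(txt, pmax(starts - width, 1L), pmin(ends + width, nchar(txt)))
}

# Collapse all contexts for each row into one cell
z$inc.terms <- vapply(
  z$txt,
  function(s) paste(keyword_context(s, incd), collapse = " /// "),
  character(1),
  USE.NAMES = FALSE
)
```

With the sample data, keyword_context(z$txt[1], incd) should give " the sky when", " the sky is b" and " the sky is g", which corresponds to the first row of the desired output. If the keywords may contain special characters, they would need to be escaped before being pasted into the pattern.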
