Using R, how does strsplit handle a fixed split when the splitter is at the end of the string?

Posted on September 15

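I was working on a language parser and wanted to count the occurrences of certain string elements (say, "</i>") within a larger string. Because the string has already been cleaned (str.trim), nothing follows the last element. I ran into some odd behavior with strsplit: it seems to behave differently depending on whether the separator sep (called split in the manual) sits at the beginning or at the end of the string.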

Here is an example:

str1 = "<i>hello friend</i>"; 
str2 = paste0(" ",str1);
str3 = paste0(str1, " ");

sep1="<i>";
sep2="</i>";

str = c(str1, str2, str3);  n = length(str);
sep = c(sep1, sep2);        ns = length(sep);

base = matrix("", nrow=n, ncol=ns);
rownames(base) = str; colnames(base) = sep;
for(i in 1:n)
    {
    for(j in 1:ns)
        {
        base[i, j] = paste0(base::strsplit(str[i], sep[j], fixed=TRUE)[[1]], collapse="|");
        }   
    }
base;
    
stringi = matrix("", nrow=n, ncol=ns);
rownames(stringi) = str; colnames(stringi) = sep;
for(i in 1:n)
    {
    for(j in 1:ns)
        {
        stringi[i, j] = paste0(stringi::stri_split_fixed(str[i], sep[j])[[1]], collapse="|");
        }   
    }
stringi;

# This check fails: the two matrices differ in the "</i>" column.
stopifnot(identical(base,stringi));

Output for base:

> base;
                     <i>                  </i>               
<i>hello friend</i>  "|hello friend</i>"  "<i>hello friend"  
 <i>hello friend</i> " |hello friend</i>" " <i>hello friend" 
<i>hello friend</i>  "|hello friend</i> " "<i>hello friend| "

Output for stringi:

> stringi;
                     <i>                  </i>               
<i>hello friend</i>  "|hello friend</i>"  "<i>hello friend|" 
 <i>hello friend</i> " |hello friend</i>" " <i>hello friend|"
<i>hello friend</i>  "|hello friend</i> " "<i>hello friend| "

The core difference is at ROW=1, COL=2...
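To isolate that cell, here is a minimal sketch in base R only (the names `x`, `at_start`, and `at_end` are made up for illustration). As far as I can tell from the `strsplit` help page, a match at the beginning of a non-empty string gives a leading `""`, while a match at the end gives the same result as if the match had been removed, so the asymmetry looks documented rather than accidental.

```r
# Minimal sketch, base R only; variable names are illustrative.
x <- "<i>hello friend</i>"

# Splitter at the beginning of the string: a leading empty piece is kept.
at_start <- strsplit(x, "<i>", fixed = TRUE)[[1]]

# Splitter at the end of the string: no trailing empty piece is produced.
at_end <- strsplit(x, "</i>", fixed = TRUE)[[1]]

print(length(at_start))  # 2 pieces: "" and "hello friend</i>"
print(length(at_end))    # 1 piece:  "<i>hello friend"
```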

Question: What is `E[strsplit]`?

Is the base behavior a feature and the stringi behavior a bug? Or vice versa?

Shouldn't a split at EOS (end of string) behave the same as a split at BOS (beginning of string)?
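Since the underlying goal is counting occurrences, one way to avoid the edge case entirely is to count matches directly instead of counting split pieces. The following is only a sketch under that assumption: `count_fixed` is a hypothetical helper name, not something from base R or stringi, and it relies on `gregexpr()` returning `-1` when there is no match.

```r
# Hypothetical helper: count literal (fixed) occurrences of `pattern`
# in each element of `x`, independent of where the matches sit.
count_fixed <- function(x, pattern) {
  matches <- gregexpr(pattern, x, fixed = TRUE)
  # gregexpr() reports -1 for "no match", so only positive positions count.
  vapply(matches, function(pos) sum(pos > 0L), integer(1))
}

str1 <- "<i>hello friend</i>"
counts <- count_fixed(c(str1, paste0(" ", str1), paste0(str1, " ")), "</i>")
print(counts)
```

The count is the same for all three strings here, whether or not the separator is the very last thing in the string.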

> R.version
               _                                
platform       x86_64-w64-mingw32               
arch           x86_64                           
os             mingw32                          
crt            ucrt                             
system         x86_64, mingw32                  
status                                          
major          4                                
minor          2.1                              
year           2022                             
month          06                               
day            23                               
svn rev        82513                            
language       R                                
version.string R version 4.2.1 (2022-06-23 ucrt)
nickname       Funny-Looking Kid

and

> packageVersion("stringi")
[1] ‘1.7.8’
>
