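I was developing a language parser and wanted to count occurrences of certain substrings (for example, the number of "</i>") inside a larger string. Because the string has already been cleaned (str.trim), nothing follows that final element. I ran into some odd behavior with strsplit: when the separator sep (called split in the documentation) sits at the beginning of the string, it seems to behave differently than when it sits at the end.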
Here is an example:
str1 = "<i>hello friend</i>";
str2 = paste0(" ",str1);
str3 = paste0(str1, " ");
sep1="<i>";
sep2="</i>";
str = c(str1, str2, str3); n = length(str);
sep = c(sep1, sep2); ns = length(sep);
base = matrix("", nrow=n, ncol=ns);
rownames(base) = str; colnames(base) = sep;
for(i in 1:n)
{
for(j in 1:ns)
{
base[i, j] = paste0(base::strsplit(str[i], sep[j], fixed=TRUE)[[1]], collapse="|");
}
}
base;
stringi = matrix("", nrow=n, ncol=ns);
rownames(stringi) = str; colnames(stringi) = sep;
for(i in 1:n)
{
for(j in 1:ns)
{
stringi[i, j] = paste0(stringi::stri_split_fixed(str[i], sep[j])[[1]], collapse="|");
}
}
stringi;
stopifnot(identical(base,stringi));
Output for base:
> base;
                     <i>                 </i>
<i>hello friend</i>  "|hello friend</i>"  "<i>hello friend"
 <i>hello friend</i> " |hello friend</i>" " <i>hello friend"
<i>hello friend</i>  "|hello friend</i> " "<i>hello friend| "
Output for stringi:
> stringi;
                     <i>                 </i>
<i>hello friend</i>  "|hello friend</i>"  "<i>hello friend|"
 <i>hello friend</i> " |hello friend</i>" " <i>hello friend|"
<i>hello friend</i>  "|hello friend</i> " "<i>hello friend| "
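The core difference is at ROW=1, COL=2: base returns no trailing empty element when the separator ends the string, while stringi does. (ROW=2, COL=2 differs in the same way.)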
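The differing cell can be reproduced in isolation. The following is a minimal sketch that needs only base R; the stringi call is guarded in case the package is not installed. The variable names x, at_start, and at_end are illustrative and do not come from the code above.

```r
# Minimal reproduction of the start-vs-end asymmetry (illustrative names).
x <- "<i>hello friend</i>"

# Separator at the beginning of the string: a leading empty piece is kept.
at_start <- strsplit(x, "<i>", fixed = TRUE)[[1]]
print(at_start)  # "" "hello friend</i>"

# Separator at the end of the string: no trailing empty piece is returned.
at_end <- strsplit(x, "</i>", fixed = TRUE)[[1]]
print(at_end)    # "<i>hello friend"

# stringi keeps the trailing empty piece, as in the matrix output above.
if (requireNamespace("stringi", quietly = TRUE)) {
  print(stringi::stri_split_fixed(x, "</i>")[[1]])
}
```

So length(at_start) is 2 but length(at_end) is 1, even though each separator occurs exactly once.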
Question: What is E[strsplit]?
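Is the base behavior a feature and the stringi behavior a bug? Or vice versa?
Shouldn't a split at EOS (end of string) behave the same as a split at BOS (beginning of string)?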
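To show why this matters for the counting goal described at the top, here is a rough sketch comparing two ways of counting a fixed substring. Both helper names, count_by_split and count_by_match, are hypothetical and were made up for this illustration. The second one relies on gregexpr and regmatches from base R instead of splitting, so it does not depend on how trailing empty pieces are handled.

```r
# Hypothetical helpers for illustration only; not part of base R or stringi.

# Count separators as (number of pieces - 1). With base::strsplit this
# undercounts when the separator is the very last thing in the string.
count_by_split <- function(x, sep) {
  length(strsplit(x, sep, fixed = TRUE)[[1]]) - 1L
}

# Count matches directly. gregexpr() reports -1 when nothing matches, and
# regmatches() then yields character(0), so the count comes out as 0.
count_by_match <- function(x, sep) {
  lengths(regmatches(x, gregexpr(sep, x, fixed = TRUE)))
}

x <- "<i>hello friend</i>"
print(count_by_split(x, "<i>"))   # 1
print(count_by_split(x, "</i>"))  # 0, although "</i>" occurs once
print(count_by_match(x, "</i>"))  # 1
```

This only works around the discrepancy; it does not settle which of the two split behaviors is the intended one.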
> R.version
               _
platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
crt            ucrt
system         x86_64, mingw32
status
major          4
minor          2.1
year           2022
month          06
day            23
svn rev        82513
language       R
version.string R version 4.2.1 (2022-06-23 ucrt)
nickname       Funny-Looking Kid
and
> packageVersion("stringi")
[1] ‘1.7.8’
>