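My input strings consist of a sequence of digits followed by one of two letters. A string may contain several such sequences in a row, in any order: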
s <- c("2w", "1p", "3w1p3w", "2p12w2p3w")
This is the pattern I need (I think):
pattern <- "([0-9]+w)*([0-9]+p)*"
However, I can't get the desired output:
list("2w", "1p", c("3w","1p","3w"), c("2p","12w","2p","3w"))
I tried this:
(out <- regmatches(s, gregexec(pattern, s)))
But I don't understand the output, and I don't know how to reshape it into what I want:
[[1]]
     [,1]
[1,] "2w"
[2,] "2w"
[3,] ""

[[2]]
     [,1]
[1,] "1p"
[2,] ""
[3,] "1p"

[[3]]
     [,1]   [,2]
[1,] "3w1p" "3w"
[2,] "3w"   "3w"
[3,] "1p"   ""

[[4]]
     [,1] [,2]    [,3]
[1,] "2p" "12w2p" "3w"
[2,] ""   "12w"   "3w"
[3,] "2p" "2p"    ""
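For context on the output above: gregexec() returns one matrix per input string, with one column per match, the full match in the first row, and one further row per capture group. One possible way to get the plain list form instead, sketched under the assumption that every token is just digits followed by a single w or p, is to match one token at a time with gregexpr(). The name token_pattern is introduced here for illustration only.

```r
s <- c("2w", "1p", "3w1p3w", "2p12w2p3w")

# Assumption: a token is one or more digits followed by exactly one of w or p.
token_pattern <- "[0-9]+[wp]"

# gregexpr() locates every non-overlapping match in each string;
# regmatches() extracts them as a list with one character vector per string.
tokens <- regmatches(s, gregexpr(token_pattern, s))
```

Because the pattern has no capture groups and describes a single token, each list element holds only the tokens themselves, in the order they appear in the string.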
Finally, I would like to sum all the counts for each letter, to get a result like this:
data.frame(s=s, w=c(2,0,6,15), p=c(0,1,1,4))
          s  w p
1        2w  2 0
2        1p  0 1
3    3w1p3w  6 1
4 2p12w2p3w 15 4
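A minimal sketch of how such a summary might be computed, assuming the counts are always non-negative integers written directly before the letter. The helper sum_counts is hypothetical and not part of any package; it extracts the tokens for one letter with gregexpr(), strips the letter, and sums what remains.

```r
s <- c("2w", "1p", "3w1p3w", "2p12w2p3w")

# Hypothetical helper: sum the numbers that directly precede `letter`
# in each element of x.
sum_counts <- function(x, letter) {
  # Extract tokens such as "12w" for the requested letter only.
  hits <- regmatches(x, gregexpr(paste0("[0-9]+", letter), x))
  # Drop the trailing letter, convert to numeric, and sum.
  # A string with no match yields an empty vector, whose sum is 0.
  vapply(
    hits,
    function(h) sum(as.numeric(sub(letter, "", h, fixed = TRUE))),
    numeric(1)
  )
}

result <- data.frame(s = s, w = sum_counts(s, "w"), p = sum_counts(s, "p"))
```

Handling each letter separately keeps the regular expression free of capture groups, so there is no matrix output to reshape.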