Using R gsub to return only the two-digit numbers in a string

Posted on October 21

I have a variable with the following values:

example <- c("positive_1", "positive_2", "test_20_curve", "test_60_point", "percent_total")

Is there a way to return only "20" and "60" from the vector?

I currently have

gsub(".*([0-9]{2}).*", "\\1", example)

which outputs

[1] "positive_1"    "positive_2"    "20"            "60"            "percent_total"

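I would like to know whether there is a way to make every value that does not contain a two-digit number show up as NA.

Thanks in advance!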

`stringr::str_extract` approach

You can use

example <- c("positive_1", "positive_2", "test_20_curve", "test_60_point", "percent_total")
library(stringr)
str_extract(example, "(?<!\\d)\\d{2}(?!\\d)")
## => [1] NA   NA   "20" "60" NA

See the R demo. Note: str_extract extracts the first match of the pattern. If you need the last match, load library(stringi) and use stri_extract_last_regex(example, "(?<!\\d)\\d{2}(?!\\d)").
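To make the first-versus-last distinction concrete without extra packages, here is a minimal base R sketch. The input string x is a hypothetical example (it is not part of the question), and gregexpr/regmatches with perl = TRUE stand in for the stringr and stringi calls.

```r
# Hypothetical input containing two separate two-digit numbers
x <- "run_12_step_34"
pattern <- "(?<!\\d)\\d{2}(?!\\d)"

# gregexpr finds every match; perl = TRUE is needed for the lookarounds
all_matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]

first_match <- all_matches[1]                    # what a "first match" extractor returns
last_match  <- all_matches[length(all_matches)]  # what a "last match" extractor returns

first_match
## => [1] "12"
last_match
## => [1] "34"
```

This sketch assumes the string has at least one match; with no matches, all_matches is empty and the indexing above would need a guard.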

Details:

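(?<!\d) - no digit is allowed immediately to the left of the current position
\d{2} - two digits
(?!\d) - no digit is allowed immediately to the right of the current position.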
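If adding a package is not an option, the same lookaround pattern can be used from base R. The following is a minimal sketch, not the answer's own method: it assumes regexpr with perl = TRUE (required for lookarounds) and fills a pre-allocated NA vector, and the names pattern, m and res are illustrative.

```r
example <- c("positive_1", "positive_2", "test_20_curve", "test_60_point", "percent_total")
pattern <- "(?<!\\d)\\d{2}(?!\\d)"

# regexpr returns the position of the first match, or -1 where there is none
m <- regexpr(pattern, example, perl = TRUE)

# Start with NA everywhere, then fill in only the elements that matched
res <- rep(NA_character_, length(example))
res[m != -1] <- regmatches(example, m)
res
## => [1] NA   NA   "20" "60" NA
```

regmatches returns only the matched substrings, which is why they are assigned into the positions where m is not -1.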

`sub` approach

example <- c("positive_1", "positive_2", "test_20_curve", "test_60_point", "percent_total")
res <- sub("^(?:(?:.*\\D)?(\\d{2})(?:\\D.*)?|.+)$", "\\1", example)
res <- res[nzchar(res)]
res
## => [1] "20" "60"

See the R demo.
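The filtering step above drops the non-matching elements, whereas the question asks for NA in those positions. A minimal sketch of that variant follows; it replaces the empty strings with NA instead of removing them. Note that perl = TRUE is an addition made here so the pattern is evaluated by PCRE; the original call relies on R's default regex engine.

```r
example <- c("positive_1", "positive_2", "test_20_curve", "test_60_point", "percent_total")

# Same pattern as above; non-matching strings collapse to "" because Group 1 is unset
res <- sub("^(?:(?:.*\\D)?(\\d{2})(?:\\D.*)?|.+)$", "\\1", example, perl = TRUE)

# Keep the vector length and mark the empty results as missing
res[!nzchar(res)] <- NA_character_
res
## => [1] NA   NA   "20" "60" NA
```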

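Pattern details

^ - start of string
(?: - start of the outer non-capturing group, which matches either of two alternatives:
- (?:.*\D)? - an optional sequence of any zero or more characters followed by a non-digit character
- (\d{2}) - Group 1 (\1 in the replacement pattern refers to this value): two digits
- (?:\D.*)? - an optional sequence of a non-digit character followed by the rest of the string
| - or
- .+ - one or more characters, as many as possible
) - end of the outer group (so that either alternative can match the whole string)
$ - end of string.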
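To see how the \D guards around Group 1 behave, the pattern can be tried on a few edge cases. This is only an illustrative sketch: the edge_cases vector is hypothetical, and perl = TRUE is added here as an assumption so that the pattern is evaluated by PCRE.

```r
pattern <- "^(?:(?:.*\\D)?(\\d{2})(?:\\D.*)?|.+)$"

# Hypothetical inputs: a three-digit run, a bare two-digit string,
# a single digit, and a two-digit number at the end of the string
edge_cases <- c("id_123", "45", "x7y", "a_89")

res <- sub(pattern, "\\1", edge_cases, perl = TRUE)
res
## => [1] ""   "45" ""   "89"
```

The three-digit run in "id_123" is rejected because the two digits captured by Group 1 must be bounded by a non-digit or by the string edges, so the .+ alternative takes over and the replacement is empty.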