Piping stringr str_detect into str_extract only extracts text from the first row: "argument is not an atomic vector; coercing"

Posted on August 8

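I am trying to create a new column that contains only specific numeric data from an expression.

I only need the number that follows "Bipolar" in column 12.

The following works: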

p <- df %>% 
      select(where(~ any(stringr::str_detect(.x, "Bipolar")))) #returns correct column

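When I try to create a new column that pulls out only the number, every row gets the value from the first row. I'm not sure what I'm doing wrong.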

p %>%
      mutate(group = "sr_bipol",
             sr_bipol = as.numeric(stringr::str_extract(., "[0-9].[0-9]+"))) %>% 
       select(group, sr_bipol)

# A tibble: 20 × 2
   group    sr_bipol
   <chr>       <dbl>
 1 sr_bipol     7.83
 2 sr_bipol     7.83
 3 sr_bipol     7.83
 4 sr_bipol     7.83
 5 sr_bipol     7.83
.....................

I also get this warning:

 argument is not an atomic vector; coercing

Recommended answer

The dot (.) refers to the entire dataset, but str_extract needs a vector as input, not a data.frame. According to ?str_extract:

string: Input vector. Either a character vector, or something coercible to one.
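To make the vector-versus-data.frame distinction concrete, here is a minimal sketch. The data frame p below is a hypothetical three-row stand-in built from the strings shown in the question, and as.character() is used only as an approximation of the coercion that happens internally.

```r
library(stringr)

# Hypothetical stand-in for `p`: a data frame with one character column
p <- data.frame(
  `...12` = c("Bipolar 7.827 / Unipolar 16.911 / LAT -9.0",
              "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0",
              "Bipolar 1.974 / Unipolar 9.219 / LAT -11.0"),
  check.names = FALSE
)

# Coercing the whole data frame yields one string per column, not one per row
coerced <- as.character(p)
length(coerced)

# Extracting from the column itself yields one result per row
per_row <- str_extract(p[[1]], "[0-9]\\.[0-9]+")
per_row
```

With a single column, the coerced object has a single element, so only one match can come back. Passing the column gives one match for each row.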

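We probably need to apply str_extract to column 12 only. Because the name of that column begins with ..., which makes it a non-syntactic name, wrap it in backticks to access its values.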

library(dplyr)
library(stringr)
df %>% 
  transmute(group = 'sr_bipol', 
    sr_bipol = as.numeric(str_extract(`...12`, "(?<=Bipolar\\s)[0-9]\\.[0-9]+")))

-output

# A tibble: 20 × 2
   group    sr_bipol
   <chr>       <dbl>
 1 sr_bipol     7.83
 2 sr_bipol     2.34
 3 sr_bipol     1.97
 4 sr_bipol     1.94
 5 sr_bipol     2.85
 6 sr_bipol     2.92
 7 sr_bipol     3.05
 8 sr_bipol     2.80
 9 sr_bipol     3.43
10 sr_bipol     2.11
11 sr_bipol     2.80
12 sr_bipol     1.81
13 sr_bipol     1.84
14 sr_bipol     3.87
15 sr_bipol     1.68
16 sr_bipol     2.21
17 sr_bipol     2.97
18 sr_bipol     3.09
19 sr_bipol     2.84
20 sr_bipol     3.48

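The p data is a single-column tibble/data.frame. When we use the dot (.), it refers to that whole data frame rather than to the column inside it.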

> str(p)
tibble [20 × 1] (S3: tbl_df/tbl/data.frame)
 $ ...12: chr [1:20] "Bipolar 7.827 / Unipolar 16.911 / LAT -9.0" "Bipolar 2.34 / Unipolar 9.09 / LAT -10.0" "Bipolar 1.974 / Unipolar 9.219 / LAT -11.0" "Bipolar 1.938 / Unipolar 10.572 / LAT -9.0" ...
> str_extract(p, "[0-9].[0-9]+")
[1] "7.827"
Warning message:
In stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) :
  argument is not an atomic vector; coercing

It extracts the value from the first match only, and mutate then recycles that single value to fill the whole column with 7.83.
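The recycling step can be seen in isolation with a small sketch. The data frame d and its columns are hypothetical and exist only for illustration.

```r
library(dplyr)

# Hypothetical three-row data frame
d <- data.frame(id = 1:3)

# A length-1 value is recycled to every row, which is what happened with 7.827
recycled <- d %>% mutate(value = 7.827)
recycled$value

# A value with one element per row is used as-is
per_row <- d %>% mutate(value = c(7.827, 2.34, 1.974))
per_row$value
```

This is why the original pipeline raised only a warning rather than an error: a single extracted value is a valid input for mutate, so it is silently repeated.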

If several columns contain "Bipolar", we can loop over them with across (to keep all the other columns from the original data, change transmute to mutate).

df %>% 
  transmute(across(where(~ any(stringr::str_detect(.x, "Bipolar"))), 
   ~ as.numeric(str_extract(.x, "(?<=Bipolar\\s)[0-9]\\.[0-9]+")), 
     .names = "sr_bipol{str_remove(.col, '[.]+')}"))
# A tibble: 20 × 1
   sr_bipol12
        <dbl>
 1       7.83
 2       2.34
 3       1.97
 4       1.94
 5       2.85
 6       2.92
 7       3.05
 8       2.80
 9       3.43
10       2.11
11       2.80
12       1.81
13       1.84
14       3.87
15       1.68
16       2.21
17       2.97
18       3.09
19       2.84
20       3.48
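One caveat about the pattern: [0-9]\\.[0-9]+ expects exactly one digit before the decimal point. Every row shown here fits that shape, but a larger or whole-number value would come back as NA. The sketch below uses hypothetical strings (the second and third are invented) and an assumed, more permissive pattern. Whether such values actually occur in the real data is not known.

```r
library(stringr)

# Hypothetical strings: the second has a two-digit integer part,
# the third has no decimal part at all
x <- c("Bipolar 7.827 / Unipolar 16.911 / LAT -9.0",
       "Bipolar 12.5 / Unipolar 9.09 / LAT -10.0",
       "Bipolar 3 / Unipolar 9.219 / LAT -11.0")

# Pattern from the answer: one digit, a literal dot, then more digits
strict <- as.numeric(str_extract(x, "(?<=Bipolar\\s)[0-9]\\.[0-9]+"))

# Assumed alternative: one or more digits, optionally followed by a decimal part
loose <- as.numeric(str_extract(x, "(?<=Bipolar\\s)[0-9]+(\\.[0-9]+)?"))

strict
loose
```

The lookbehind (?<=Bipolar\\s) is kept in both patterns so that only the number directly after "Bipolar " is considered, never the Unipolar or LAT values.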