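I have a data frame column containing strings that may include several spaces. I want to use separate from tidyr (or something similar) to split each string at the space that follows the first occurrence of a keyword (i.e., one of the entries in fruit_key in the sample data), so that the one column becomes two.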
Sample Data
df <- structure(list(fruit = c("Apple Orange Pineapple", "Plum Good Watermelon",
"Plum Good Kiwi", "Plum Good Plum Good", "Cantaloupe Melon", "Blueberry Blackberry Cobbler",
"Peach Pie Apple Pie")), class = "data.frame", row.names = c(NA,
-7L))
fruit_key <- c("Apple", "Plum Good", "Cantaloupe", "Blueberry", "Peach Pie")
Expected Output
                         fruit  Delicious              Tasty
1       Apple Orange Pineapple      Apple   Orange Pineapple
2         Plum Good Watermelon  Plum Good         Watermelon
3               Plum Good Kiwi  Plum Good               Kiwi
4          Plum Good Plum Good  Plum Good          Plum Good
5             Cantaloupe Melon Cantaloupe              Melon
6 Blueberry Blackberry Cobbler  Blueberry Blackberry Cobbler
7          Peach Pie Apple Pie  Peach Pie          Apple Pie
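With separate, I can get the part after the keyword into the correct column (i.e., Tasty), but I cannot get the keyword itself into the other column (i.e., Delicious). I have tried modifying the regular expression several times, but I never get the correct output.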
library(tidyr)
separate(df, fruit,
         c("Delicious", "Tasty"),
         sep = paste(fruit_key, collapse = "|"),
         extra = "merge",
         remove = FALSE
)
#                         fruit Delicious              Tasty
#1       Apple Orange Pineapple             Orange Pineapple
#2         Plum Good Watermelon                   Watermelon
#3               Plum Good Kiwi                         Kiwi
#4          Plum Good Plum Good                    Plum Good
#5             Cantaloupe Melon                        Melon
#6 Blueberry Blackberry Cobbler           Blackberry Cobbler
#7          Peach Pie Apple Pie                    Apple Pie
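The empty Delicious column seems to come from the fact that whatever sep matches is treated as the delimiter and thrown away. A minimal base R sketch of that behaviour, using a single example string and regmatches with invert = TRUE (the variable names x and parts are only for illustration):

```r
# The keyword itself is the match, so it is consumed as the delimiter.
x <- "Plum Good Watermelon"

# invert = TRUE returns the text before and after the first match,
# i.e. everything except the matched keyword.
parts <- regmatches(x, regexpr("Plum Good", x), invert = TRUE)[[1]]

print(parts)
```

The first piece is an empty string (nothing precedes the keyword) and the second piece keeps its leading space, which is what ends up in Delicious and Tasty.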
I know I can use str_extract and str_remove (as shown below), but I would like to do this in a single function/step with something like separate.
library(tidyverse)
df %>%
  mutate(Delicious = str_extract(fruit, paste(fruit_key, collapse = "|")),
         Tasty = str_remove(fruit, paste(fruit_key, collapse = "|")))
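One possible way to stay within a single separate call is to make the keyword a zero-width lookbehind, so that only the space after it is consumed as the delimiter. This is a sketch, not a tested general solution: it assumes every keyword is followed by a space, that the keywords contain no regex metacharacters (they would need escaping otherwise), and that the regex engine behind separate accepts lookbehind alternatives of different lengths. The names key_pattern and out are introduced here for illustration.

```r
library(tidyr)

df <- data.frame(fruit = c("Apple Orange Pineapple", "Plum Good Watermelon",
                           "Plum Good Kiwi", "Plum Good Plum Good",
                           "Cantaloupe Melon", "Blueberry Blackberry Cobbler",
                           "Peach Pie Apple Pie"))
fruit_key <- c("Apple", "Plum Good", "Cantaloupe", "Blueberry", "Peach Pie")

# Match a single space, but only when it is preceded by one of the keywords.
# The lookbehind is zero-width, so the keyword stays in the first piece.
key_pattern <- paste0("(?<=", paste(fruit_key, collapse = "|"), ") ")

out <- separate(df, fruit,
                c("Delicious", "Tasty"),
                sep = key_pattern,
                extra = "merge",   # keep everything after the first split together
                remove = FALSE)
print(out)
```

Because extra = "merge" limits the split to two pieces, later keyword occurrences (such as the second "Plum Good" or "Apple" in "Peach Pie Apple Pie") remain in Tasty.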