I have a function that finds matches in a data frame column (ignore the t2 line; it is "turned off"):
library(stringr)
find.all.matches <- function(search.col, pattern) {
  # One character matrix of matches per element of search.col
  captured <- str_match_all(search.col, pattern = pattern)
  t <- lapply(captured, str_trim)
  #t2 <- lapply(t, function(x) gsub("[^a-z]","",x)) ##turned off
  # lapply rather than sapply, so the result is always a list
  t3 <- lapply(t, unique)
  t4 <- lapply(t3, toString)
  found.col <- unlist(t4)
  return(found.col)
}
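I run this code on a specific column of a large data set of roughly 20,000 rows. The column holds abstracts from scientific journals.
I use the following code to add the terms extracted by pattern to the data frame as a new column: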
testing2 <- find.all.matches(search.col = all_data$abstract_l,
                             pattern = pattern)
all_data$testing_mu_m <- testing2
This is the current pattern:
pattern = '\\d+(?:[.,]\\d+)*\\s*mu m\\b|ba\\b'
On the sample abstract below, it matches every number followed by mu m, as well as ba:
a protocol for in vitro propagation of adult lavandula dentata plants has been achieved. cultures were established by placing nodal segments on murashige and skoog medium containing ba, kin, and naa. highest shoot multiplication rates were obtained when explants grown in the presence of 5.0 mu m ba or 20 mu m kin were transferred to medium with 8.8 mu m ba and 15% coconut milk. multiplication efficiency through subcultures was significantly affected by the cytokinin concentration in the initial culture medium. subculture reduced drastically the final number of shoots produced on nodal segments isolated from shoots grown in the presence of 2.0 mu m ba or 40.0 mu m kin. shoots were easily rooted on murashige and skoog hormone-free medium with macronutrients at half-strength. plants were successfully transplanted into soil.
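To make the current behaviour concrete, here is a minimal sketch that applies the pattern to one fragment of the abstract. It uses str_extract_all directly instead of find.all.matches to keep the example short. The variable names fragment and hits are illustrative only.

```r
library(stringr)

# The current pattern: a number followed by "mu m", or the word ending "ba"
pattern <- '\\d+(?:[.,]\\d+)*\\s*mu m\\b|ba\\b'

# One fragment taken from the sample abstract
fragment <- "5.0 mu m ba or 20 mu m kin"

# str_extract_all returns a list with one character vector per input string
hits <- str_extract_all(fragment, pattern)[[1]]
print(hits)
```

This returns the individual terms, not the sentences they occur in.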
Is there a way to extract every complete sentence that contains ba?
I would like a pattern that I can plug into the find.all.matches function.
Desired output: cultures were established by placing nodal segments on murashige and skoog medium containing ba, kin, and naa, highest shoot multiplication rates were obtained when explants grown in the presence of 5.0 mu m ba or 20 mu m kin were transferred to medium with 8.8 mu m ba and 15% coconut milk, and subculture reduced drastically the final number of shoots produced on nodal segments isolated from shoots grown in the presence of 2.0 mu m ba or 40.0 mu m kin.
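One possible approach, as a sketch rather than a definitive answer: treat a sentence as a run of characters that ends at a period not followed by a digit, so that decimals such as 5.0 stay inside the sentence, and require the whole word ba somewhere in that run. The names sentence.part, ba.sentence.pattern, and the shortened abstract below are illustrative assumptions. The pattern also assumes that sentences end only with a period, so abbreviations such as "e.g." would split a sentence early.

```r
library(stringr)

# Assumed sentence body: any character except ".", or a "." that is
# immediately followed by a digit (a decimal point such as in "5.0")
sentence.part <- "(?:[^.]|\\.(?=\\d))*"

# A sentence body that contains the whole word "ba"
ba.sentence.pattern <- paste0(sentence.part, "\\bba\\b", sentence.part)

# Shortened, made-up abstract in the same style as the sample
abstract <- paste(
  "a protocol has been achieved.",
  "cultures were established on medium containing ba, kin, and naa.",
  "rates were highest with 5.0 mu m ba or 20 mu m kin.",
  "plants were successfully transplanted into soil."
)

# Matches keep the space that follows the previous period, so trim them
ba.sentences <- str_trim(str_extract_all(abstract, ba.sentence.pattern)[[1]])
print(ba.sentences)
```

Because find.all.matches already trims and de-duplicates its matches, ba.sentence.pattern could presumably be passed as its pattern argument unchanged. The sentences found in each abstract would then be joined with ", " by toString.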