I have a single-column data frame with thousands of rows, all built on the same pattern, for example:
ids <- c("ETC|HMPI01000001|HMPI01000001.1 TAG: Genus Species, T05X3Ml2_CL10007Cordes1_1","ETC|HMPI31000002|HMPI31000002.1 TAG: Genus Species, T3X3Ml2_CL10157Cordes1_1", "ETC|HMPI01000007|HMPI01000007.1 TAG: Genus Species, T1X3Ml2_CL11231Cordes1_1")
df <- as.data.frame(ids)
> df
ids
1 ETC|HMPI01000001|HMPI01000001.1 TAG: Genus Species, T05X3Ml2_CL10007Cordes1_1
2 ETC|HMPI31000002|HMPI31000002.1 TAG: Genus Species, T3X3Ml2_CL10157Cordes1_1
3 ETC|HMPI01000007|HMPI01000007.1 TAG: Genus Species, T1X3Ml2_CL11231Cordes1_1
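I would like to split these strings into two columns, var1 and var2, keeping the text after the second pipe and before the first space, and the text from the second T after that space onward. This pattern is common to all rows. The expected result is: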
> df
var1 var2
1 HMPI01000001.1 T05X3Ml2_CL10007Cordes1_1
2 HMPI31000002.1 T3X3Ml2_CL10157Cordes1_1
3 HMPI01000007.1 T1X3Ml2_CL11231Cordes1_1
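I have tried several regular expressions inspired by here, there, or there, but I can't figure it out.
This is what I currently have, but it does not give the expected result: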
df2 <- df %>% separate(col = "ids", into = c("var1", "var2"), sep = "\\|([^|]+)$")
> df2
var1 var2
1 ETC|HMPI01000001
2 ETC|HMPI31000002
3 ETC|HMPI01000007
Any help, preferably using regular expressions and the tidyverse, would be greatly appreciated.
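One possible direction, offered only as a minimal sketch: it swaps `separate()` for `tidyr::extract()` with two capture groups, and it assumes that every row has exactly two pipes before the wanted ID and that the wanted trailing token is whatever follows the last comma (which, in the sample rows, is where the second T begins). Neither assumption is confirmed beyond the three rows shown.

```r
library(dplyr)  # for the %>% pipe, as in the attempt above
library(tidyr)

ids <- c(
  "ETC|HMPI01000001|HMPI01000001.1 TAG: Genus Species, T05X3Ml2_CL10007Cordes1_1",
  "ETC|HMPI31000002|HMPI31000002.1 TAG: Genus Species, T3X3Ml2_CL10157Cordes1_1",
  "ETC|HMPI01000007|HMPI01000007.1 TAG: Genus Species, T1X3Ml2_CL11231Cordes1_1"
)
df <- data.frame(ids = ids)

# Group 1: non-space characters right after the second pipe.
# Group 2: non-space characters after the last comma (assumed delimiter).
df2 <- df %>%
  tidyr::extract(
    col   = ids,
    into  = c("var1", "var2"),
    regex = "^[^|]*\\|[^|]*\\|(\\S+).*,\\s*(\\S+)$"
  )

df2
```

`extract()` drops the original `ids` column by default, so `df2` contains only `var1` and `var2`. Rows that do not match the pattern would come back as `NA`, which makes it easy to spot lines that break the assumed format.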