After some research, and after trying sub and gsub, I still could not find what I wanted.

Input:

structure(list(submitter_id = c("TCGA-B6-A0RH-01A-21R-A115-07", 
"TCGA-BH-A1FU-11A-23R-A14D-07", "TCGA-BH-A1FU-01A-11R-A14D-07", 
"TCGA-AR-A0TX-01A-11R-A084-07", "TCGA-A1-A0SE-01A-11R-A084-07", 
"TCGA-BH-A1FC-11A-32R-A13Q-07", "TCGA-OL-A5D6-01A-21R-A27Q-07", 
"TCGA-E2-A1IK-01A-11R-A144-07", "TCGA-AC-A2FM-11B-32R-A19W-07", 
"TCGA-AN-A0FT-01A-11R-A034-07"), sample_type = c("Primary Tumor", 
"Solid Tissue Normal", "Primary Tumor", "Primary Tumor", "Metastatic", 
"Solid Tissue Normal", "Primary Tumor", "Primary Tumor", "Solid Tissue Normal", 
"Primary Tumor")), row.names = c(NA, 10L), class = "data.frame")

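What I want to do is: if "Tumor" or "Normal" appears in the sample_type string, keep only "Tumor" or "Normal" and remove everything else. In addition, I only want to select the rows whose sample_type contains "Tumor" or "Normal".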

Expected output:

structure(list(submitter_id = c("TCGA-B6-A0RH-01A-21R-A115-07", 
"TCGA-BH-A1FU-11A-23R-A14D-07", "TCGA-BH-A1FU-01A-11R-A14D-07", 
"TCGA-AR-A0TX-01A-11R-A084-07", "TCGA-BH-A1FC-11A-32R-A13Q-07", 
"TCGA-OL-A5D6-01A-21R-A27Q-07", "TCGA-E2-A1IK-01A-11R-A144-07", 
"TCGA-AC-A2FM-11B-32R-A19W-07", "TCGA-AN-A0FT-01A-11R-A034-07"
), sample_type = c("Tumor", "Normal", "Tumor", "Tumor", "Normal", 
"Tumor", "Tumor", "Normal", "Tumor")), row.names = c(NA, 9L), class = "data.frame")

Thank you.

I tried gsub or sub together with substr, but it failed because the strings have different lengths.

Recommended answer

library(tidyverse)

df <- structure(list(submitter_id = c(
  "TCGA-B6-A0RH-01A-21R-A115-07",
  "TCGA-BH-A1FU-11A-23R-A14D-07", "TCGA-BH-A1FU-01A-11R-A14D-07",
  "TCGA-AR-A0TX-01A-11R-A084-07", "TCGA-A1-A0SE-01A-11R-A084-07",
  "TCGA-BH-A1FC-11A-32R-A13Q-07", "TCGA-OL-A5D6-01A-21R-A27Q-07",
  "TCGA-E2-A1IK-01A-11R-A144-07", "TCGA-AC-A2FM-11B-32R-A19W-07",
  "TCGA-AN-A0FT-01A-11R-A034-07"
), sample_type = c(
  "Primary Tumor",
  "Solid Tissue Normal", "Primary Tumor", "Primary Tumor", "Metastatic",
  "Solid Tissue Normal", "Primary Tumor", "Primary Tumor", "Solid Tissue Normal",
  "Primary Tumor"
)), row.names = c(NA, 10L), class = "data.frame")

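# Keep only the "Tumor" or "Normal" part of sample_type; values matching
# neither become NA, and those rows are then dropped.
df |>
  mutate(sample_type = str_extract(sample_type, "Tumor|Normal")) |>
  drop_na(sample_type)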
#>                   submitter_id sample_type
#> 1 TCGA-B6-A0RH-01A-21R-A115-07       Tumor
#> 2 TCGA-BH-A1FU-11A-23R-A14D-07      Normal
#> 3 TCGA-BH-A1FU-01A-11R-A14D-07       Tumor
#> 4 TCGA-AR-A0TX-01A-11R-A084-07       Tumor
#> 5 TCGA-BH-A1FC-11A-32R-A13Q-07      Normal
#> 6 TCGA-OL-A5D6-01A-21R-A27Q-07       Tumor
#> 7 TCGA-E2-A1IK-01A-11R-A144-07       Tumor
#> 8 TCGA-AC-A2FM-11B-32R-A19W-07      Normal
#> 9 TCGA-AN-A0FT-01A-11R-A034-07       Tumor

Created on 2024-04-13 with reprex v2.1.0
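Since the question mentions that sub, gsub and substr failed because the strings differ in length, a base R variant may also be useful. The following is only a minimal sketch, not part of the original answer: it matches the pattern wherever it occurs instead of cutting at fixed positions. The name `df_small` is a hypothetical, shortened copy of the question's data, used here so the snippet runs on its own.

```r
# Hypothetical three-row subset of the question's data (an assumption made
# for illustration, so that the snippet is self-contained).
df_small <- data.frame(
  submitter_id = c(
    "TCGA-B6-A0RH-01A-21R-A115-07",
    "TCGA-BH-A1FU-11A-23R-A14D-07",
    "TCGA-A1-A0SE-01A-11R-A084-07"
  ),
  sample_type = c("Primary Tumor", "Solid Tissue Normal", "Metastatic"),
  stringsAsFactors = FALSE
)

pattern <- "Tumor|Normal"

# Step 1: keep only the rows whose sample_type contains the pattern.
res <- df_small[grepl(pattern, df_small$sample_type), , drop = FALSE]

# Step 2: replace each value with the matched substring. regexpr() finds the
# match position and regmatches() extracts it, so string length does not matter.
res$sample_type <- regmatches(res$sample_type, regexpr(pattern, res$sample_type))

# Step 3: renumber the rows, as in the expected output.
rownames(res) <- NULL

res$sample_type
# c("Tumor", "Normal")
```

Filtering first matters here: regmatches() returns only the elements that matched, so assigning its result back into an unfiltered column would fail with a length mismatch.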
