My data looks like this:

dat <- structure(list(rn = c("A", "B", "C", 
"D", "E"), `[0,25)` = c("40 (replaced)", 
"52 (replaced)", "5", "2", "5 (replaced)"), `[25,50)` = c("0 (replaced)", 
"0 (replaced)", "0 (replaced)", "0 (replaced)", "0 (replaced)"), `[25,100)` = c("5", 
"3", "38", "2", "1"), `[50,100)` = c("0 (replaced)", "0 (replaced)", 
"0 (replaced)", "0 (replaced)", "0 (replaced)")), row.names = c(NA, 
-5L), class = c("data.table", "data.frame"))

   rn        [0,25)      [25,50) [25,100)     [50,100)
1:  A 40 (replaced) 0 (replaced)        5 0 (replaced)
2:  B 52 (replaced) 0 (replaced)        3 0 (replaced)
3:  C             5 0 (replaced)       38 0 (replaced)
4:  D             2 0 (replaced)        2 0 (replaced)
5:  E  5 (replaced) 0 (replaced)        1 0 (replaced)

I can easily extract just the numbers like this:

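    library(dplyr)
    library(tidyr)

    # extract_numeric() is deprecated in tidyr; readr::parse_number() is the suggested replacement
    dat <- t(apply(dat, 1, extract_numeric))
    dat <- as.data.frame(dat)
    dat <- dat %>% 
        rowwise() %>% 
        summarise(V1 = V1, freq = list(c_across(-V1))) %>% 
        rowwise() %>% 
        mutate(freq = list(freq[which(freq > 0)]))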

dat_out <- structure(list(V1 = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_), freq = list(c(40, 5), c(52, 3), c(5, 38), c(2, 2), 
    c(5, 1))), class = c("rowwise_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -5L), groups = structure(list(.rows = structure(list(
    1L, 2L, 3L, 4L, 5L), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame")))


But what should I do if I want to keep the text?

Desired output:

freq
c("40 (replaced)","5")
c("52 (replaced)","3")
c("5","38")
c("2","2")
c("5 (replaced)","1")

Recommended answer

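It may be easier to reshape the data to 'long' format with pivot_longer, filter out the rows whose 'value' column holds a '0' entry using a regex match, and then, grouped by 'rn', summarise the 'value' elements into a list.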

library(dplyr)
library(tidyr)
library(stringr)
out <- dat %>% 
   pivot_longer(cols = -rn) %>% 
   filter(str_detect(value, '\\b0\\b', negate = TRUE)) %>% 
   group_by(rn) %>% 
   summarise(freq = list(value), .groups = 'drop')

-output

> out
# A tibble: 5 × 2
  rn    freq     
  <chr> <list>   
1 A     <chr [2]>
2 B     <chr [2]>
3 C     <chr [2]>
4 D     <chr [2]>
5 E     <chr [2]>
> out$freq
[[1]]
[1] "40 (replaced)" "5"            

[[2]]
[1] "52 (replaced)" "3"            

[[3]]
[1] "5"  "38"

[[4]]
[1] "2" "2"

[[5]]
[1] "5 (replaced)" "1"        
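To see why the pattern '\\b0\\b' drops only the zero entries, a small check on a few sample strings may help. This is only a sketch: the vector `x` and the name `is_zero` are made up for illustration, and the last two elements of `x` do not occur in the data above.

```r
library(stringr)

# Hypothetical sample values; "10" and "0.5" are not in the original data
x <- c("0 (replaced)", "40 (replaced)", "5", "10", "0.5")

# \b is a word boundary, so the pattern matches only a "0" that stands alone;
# the "0" inside "40" or "10" is not matched
is_zero <- str_detect(x, "\\b0\\b")
is_zero

# Caveat: "0.5" also matches, because "." is not a word character
```

For the data in the question this caveat does not matter, since every value is a whole number optionally followed by " (replaced)".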

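Another option is to replace the column elements that are 0 with NA, then unite the columns into a single column with na.rm = TRUE and, if needed, split that column into a list with strsplit on the ',' delimiter.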

dat %>% 
   mutate(across(-rn, ~ replace(.x,
        str_detect(.x, '\\b0\\b'), NA_character_))) %>% 
   unite(freq, -rn, na.rm = TRUE, sep=",") %>% 
   mutate(freq = strsplit(freq, ","))

-output

       rn            freq
   <char>          <list>
1:      A 40 (replaced),5
2:      B 52 (replaced),3
3:      C            5,38
4:      D             2,2
5:      E  5 (replaced),1
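If the values could ever contain something like "0.5", a stricter test might compare the leading number with zero instead of relying on word boundaries. The following is a minimal base R sketch under that assumption, not part of the original answer. It rebuilds the example as a plain data.frame rather than a data.table, and the names `is_nonzero`, `vals` and `freq` are hypothetical.

```r
# Rebuild the example data as a plain data.frame (assumed to hold the same values as `dat`)
dat <- data.frame(
  rn = c("A", "B", "C", "D", "E"),
  `[0,25)` = c("40 (replaced)", "52 (replaced)", "5", "2", "5 (replaced)"),
  `[25,50)` = rep("0 (replaced)", 5),
  `[25,100)` = c("5", "3", "38", "2", "1"),
  `[50,100)` = rep("0 (replaced)", 5),
  check.names = FALSE
)

# Hypothetical helper: TRUE when the leading number of each string is non-zero
is_nonzero <- function(x) as.numeric(sub("^([0-9.]+).*$", "\\1", x)) != 0

# For each row, keep the original strings (text included) whose number is non-zero
vals <- as.matrix(dat[, -1])
freq <- lapply(seq_len(nrow(vals)), function(i) {
  row <- unname(vals[i, ])
  row[is_nonzero(row)]
})
names(freq) <- dat$rn

freq[["A"]]
```

The result is a named list with one character vector per row, which can be assigned to a list column if a data frame is preferred.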
