I have the following tibble:

mydata <- structure(list(Nr = 1:10, sgv = c(72L, 72L, 68L, 62L, 83L, 83L, 
86L, 86L, 85L, 85L), Date = structure(c(1605969695, 1605969700.306, 
1605970000.593, 1605970300.593, 1605970595, 1605970600.594, 1605970895, 
1605970900.417, 1605971195, 1605971200.243), tzone = "CET", class = c("POSIXct", 
"POSIXt")), Year = c(2020, 2020, 2020, 2020, 2020, 2020, 2020, 
2020, 2020, 2020), Weekday = c(7, 7, 7, 7, 7, 7, 7, 7, 7, 7), 
    Week = c(47, 47, 47, 47, 47, 47, 47, 47, 47, 47), mmol = c(3.996, 
    3.996, 3.774, 3.441, 4.6065, 4.6065, 4.773, 3.8, 4.7175, 
    4.7175), check_time = structure(c(294.695000171661, 5.30599999427795, 
    300.286999940872, 300, 294.40700006485, 5.5939998626709, 
    294.406000137329, 5.41700005531311, 294.582999944687, 5.24300003051758
    ), class = "difftime", units = "secs"), below = c(FALSE, 
    FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE
    )), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

# A tibble: 10 × 9
      Nr   sgv Date                 Year Weekday  Week  mmol check_time   below
   <int> <int> <dttm>              <dbl>   <dbl> <dbl> <dbl> <drtn>       <lgl>
 1     1    72 2020-11-21 15:41:35  2020       7    47  4.00 294.695 secs FALSE
 2     2    72 2020-11-21 15:41:40  2020       7    47  4.00   5.306 secs FALSE
 3     3    68 2020-11-21 15:46:40  2020       7    47  3.77 300.287 secs TRUE 
 4     4    62 2020-11-21 15:51:40  2020       7    47  3.44 300.000 secs TRUE 
 5     5    83 2020-11-21 15:56:35  2020       7    47  4.61 294.407 secs FALSE
 6     6    83 2020-11-21 15:56:40  2020       7    47  4.61   5.594 secs FALSE
 7     7    86 2020-11-21 16:01:35  2020       7    47  4.77 294.406 secs FALSE
 8     8    86 2020-11-21 16:01:40  2020       7    47  3.8    5.417 secs TRUE 
 9     9    85 2020-11-21 16:06:35  2020       7    47  4.72 294.583 secs FALSE
10    10    85 2020-11-21 16:06:40  2020       7    47  4.72   5.243 secs FALSE

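My goal is to calculate the total time (the sum of check_time) for each group of TRUE values in below. My data frame has about 600,000 rows, and the TRUE values occur in consecutive runs of 1, 2, 3 or more rows. To do this, I want to number the TRUE values with an identifier so that all TRUE values in the same run share the same identifier. For the example above, the result should look like this: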

      Nr   sgv Date                 Year Weekday  Week  mmol check_time   below    ID
   <int> <int> <dttm>              <dbl>   <dbl> <dbl> <dbl> <drtn>       <lgl> <dbl>
 1     1    72 2020-11-21 15:41:35  2020       7    47  4.00 294.695 secs FALSE    NA
 2     2    72 2020-11-21 15:41:40  2020       7    47  4.00   5.306 secs FALSE    NA
 3     3    68 2020-11-21 15:46:40  2020       7    47  3.77 300.287 secs TRUE      1
 4     4    62 2020-11-21 15:51:40  2020       7    47  3.44 300.000 secs TRUE      1
 5     5    83 2020-11-21 15:56:35  2020       7    47  4.61 294.407 secs FALSE    NA
 6     6    83 2020-11-21 15:56:40  2020       7    47  4.61   5.594 secs FALSE    NA
 7     7    86 2020-11-21 16:01:35  2020       7    47  4.77 294.406 secs FALSE    NA
 8     8    86 2020-11-21 16:01:40  2020       7    47  3.8    5.417 secs TRUE      2
 9     9    85 2020-11-21 16:06:35  2020       7    47  4.72 294.583 secs FALSE    NA
10    10    85 2020-11-21 16:06:40  2020       7    47  4.72   5.243 secs FALSE    NA

Recommended answer

Here is one option using rle from base R:

transform(mydata, ID = replace(with(rle(below), rep(cumsum(values), lengths)), !below, NA))

#   Nr sgv                Date Year Weekday Week   mmol   check_time below ID
#1   1  72 2020-11-21 15:41:35 2020       7   47 3.9960 294.695 secs FALSE NA
#2   2  72 2020-11-21 15:41:40 2020       7   47 3.9960   5.306 secs FALSE NA
#3   3  68 2020-11-21 15:46:40 2020       7   47 3.7740 300.287 secs  TRUE  1
#4   4  62 2020-11-21 15:51:40 2020       7   47 3.4410 300.000 secs  TRUE  1
#5   5  83 2020-11-21 15:56:35 2020       7   47 4.6065 294.407 secs FALSE NA
#6   6  83 2020-11-21 15:56:40 2020       7   47 4.6065   5.594 secs FALSE NA
#7   7  86 2020-11-21 16:01:35 2020       7   47 4.7730 294.406 secs FALSE NA
#8   8  86 2020-11-21 16:01:40 2020       7   47 3.8000   5.417 secs  TRUE  2
#9   9  85 2020-11-21 16:06:35 2020       7   47 4.7175 294.583 secs FALSE NA
#10 10  85 2020-11-21 16:06:40 2020       7   47 4.7175   5.243 secs FALSE NA

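Explanation:

Using rle, we build a counter that increases by one at the start of each run of TRUE values and stays constant until the next run begins.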

a <- with(rle(mydata$below), rep(cumsum(values), lengths))
a
#[1] 0 0 1 1 1 1 1 2 2 2
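
To see where these numbers come from, it may help to look at what rle returns on its own. The following is a small sketch that uses a standalone copy of the below column rather than mydata; the names r and run_counter are only illustrative.

```r
# Standalone copy of the `below` column from the example data.
below <- c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)

# rle() compresses the vector into runs: one value and one length per run.
r <- rle(below)
r$values   # FALSE TRUE FALSE TRUE FALSE
r$lengths  # 2 2 3 1 2

# TRUE counts as 1 and FALSE as 0, so the cumulative sum over the runs
# increases only where a TRUE run begins.
run_counter <- cumsum(r$values)  # 0 1 1 2 2

# Expand each run back to its original length.
a <- rep(run_counter, r$lengths)
```

Note that the FALSE rows following a TRUE run still carry that run's number at this point, which is why the next step is needed.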

Because we want the FALSE rows to be NA, we use replace:

replace(a, !mydata$below, NA)
#[1] NA NA  1  1 NA NA NA  2 NA NA
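
With the ID in place, the total check_time per run of TRUE values (the original goal of the question) can be computed by summing within each ID. The following is a minimal sketch and not part of the original answer: it assumes check_time is handled as plain seconds (for the real tibble, as.numeric(mydata$check_time) would give that), and it uses tapply as one possible base R way to aggregate.

```r
# Standalone copies of the two relevant columns, with check_time in seconds.
below <- c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
check_time <- c(294.695, 5.306, 300.287, 300, 294.407,
                5.594, 294.406, 5.417, 294.583, 5.243)

# Run ID as in the answer: one number per TRUE run, NA for FALSE rows.
id <- replace(with(rle(below), rep(cumsum(values), lengths)), !below, NA)

# tapply() drops rows whose group is NA, so only the TRUE runs are summed.
total_secs <- tapply(check_time, id, sum)
total_secs
# run 1: 600.287 seconds, run 2: 5.417 seconds
```

On 600,000 rows both rle and tapply work on whole vectors, so this should scale reasonably, although that has not been measured here.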
