I have the following data frame called df (dput below):

# A tibble: 14 × 5
   group date                indicator value diff_hours
   <chr> <dttm>              <lgl>     <dbl>      <dbl>
 1 A     2022-11-01 01:00:00 FALSE         2          4
 2 A     2022-11-01 02:00:00 FALSE         1          3
 3 A     2022-11-01 03:00:00 FALSE         4          2
 4 A     2022-11-01 04:00:00 FALSE         1          1
 5 A     2022-11-01 05:00:00 TRUE          3          0
 6 A     2022-11-01 06:00:00 FALSE         1          1
 7 A     2022-11-01 07:00:00 FALSE         3          2
 8 B     2022-11-01 01:00:00 FALSE         1          4
 9 B     2022-11-01 02:00:00 FALSE         2          3
10 B     2022-11-01 03:00:00 FALSE         3          2
11 B     2022-11-01 04:00:00 FALSE         1          1
12 B     2022-11-01 05:00:00 TRUE          4          0
13 B     2022-11-01 06:00:00 FALSE         1          1
14 B     2022-11-01 07:00:00 FALSE         5          2

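I would like to compute the slope (from lm(value ~ diff_hours)) for every n rows, counted relative to the conditioned row where indicator == TRUE, within each group. The row with TRUE should get a slope of NA. Here is the desired output, called df_desired, for n = 2 (dput below):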

# A tibble: 14 × 6
# Groups:   group [2]
   group date                indicator value diff_hours slope
   <chr> <dttm>              <lgl>     <dbl>      <dbl> <dbl>
 1 A     2022-11-01 01:00:00 FALSE         2          4     1
 2 A     2022-11-01 02:00:00 FALSE         1          3     1
 3 A     2022-11-01 03:00:00 FALSE         4          2     3
 4 A     2022-11-01 04:00:00 FALSE         1          1     3
 5 A     2022-11-01 05:00:00 TRUE          3          0    NA
 6 A     2022-11-01 06:00:00 FALSE         1          1     2
 7 A     2022-11-01 07:00:00 FALSE         3          2     2
 8 B     2022-11-01 01:00:00 FALSE         1          4    -1
 9 B     2022-11-01 02:00:00 FALSE         2          3    -1
10 B     2022-11-01 03:00:00 FALSE         3          2     2
11 B     2022-11-01 04:00:00 FALSE         1          1     2
12 B     2022-11-01 05:00:00 TRUE          4          0    NA
13 B     2022-11-01 06:00:00 FALSE         1          1     4
14 B     2022-11-01 07:00:00 FALSE         5          2     4

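For example, the slope for rows 1 and 2 is the slope coefficient of lm(c(2,1) ~ c(4,3)), which is 1. So I was wondering if anyone knows how to compute the slope for every n rows relative to the conditioned row, per group?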
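As a quick check of that arithmetic, the slope can be read from the coefficients of the fitted model. This is only a minimal sketch; the names x, y, fit and slope are illustrative and do not appear in the data above.

```r
# Values of rows 1 and 2 in group A
y <- c(2, 1)   # value
x <- c(4, 3)   # diff_hours

fit <- lm(y ~ x)

# The second coefficient (named "x") is the slope; the first is the intercept.
slope <- unname(coef(fit)["x"])
slope
```

With only two points the line passes through both, so the slope is simply (2 - 1) / (4 - 3).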


dput of df and df_desired:

df <- structure(list(group = c("A", "A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B", "B"), date = structure(c(1667260800, 
1667264400, 1667268000, 1667271600, 1667275200, 1667278800, 1667282400, 
1667260800, 1667264400, 1667268000, 1667271600, 1667275200, 1667278800, 
1667282400), class = c("POSIXct", "POSIXt"), tzone = ""), indicator = c(FALSE, 
FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, TRUE, FALSE, FALSE), value = c(2, 1, 4, 1, 3, 1, 3, 1, 
2, 3, 1, 4, 1, 5), diff_hours = c(4, 3, 2, 1, 0, 1, 2, 4, 3, 
2, 1, 0, 1, 2)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -14L), groups = structure(list(group = c("A", 
"B"), .rows = structure(list(1:7, 8:14), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))

df_desired <- structure(list(group = c("A", "A", "A", "A", "A", "A", "A", "B", 
"B", "B", "B", "B", "B", "B"), date = structure(c(1667260800, 
1667264400, 1667268000, 1667271600, 1667275200, 1667278800, 1667282400, 
1667260800, 1667264400, 1667268000, 1667271600, 1667275200, 1667278800, 
1667282400), class = c("POSIXct", "POSIXt"), tzone = ""), indicator = c(FALSE, 
FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, 
FALSE, TRUE, FALSE, FALSE), value = c(2, 1, 4, 1, 3, 1, 3, 1, 
2, 3, 1, 4, 1, 5), diff_hours = c(4, 3, 2, 1, 0, 1, 2, 4, 3, 
2, 1, 0, 1, 2), slope = c(1, 1, 3, 3, NA, 2, 2, -1, -1, 2, 2, 
NA, 4, 4)), row.names = c(NA, -14L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), groups = structure(list(group = c("A", 
"B"), .rows = structure(list(1:7, 8:14), ptype = integer(0), class = c("vctrs_list_of", 
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -2L), .drop = TRUE))

Recommended answer

There is no need for lag or cumsum; rep is enough.

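library(dplyr)

# Number of rows per block
N <- 2

df %>% 
  ungroup() %>% 
  # Split TRUE rows from FALSE rows so the TRUE rows never enter a block
  group_by(indicator) %>% 
  # Label consecutive blocks of N rows; this needs n() to be a multiple of N
  mutate(grp = rep(1:((n()/N)), each = N)) %>% 
  group_by(indicator, grp) %>% 
  # Fit one model per block and keep the slope (the second coefficient).
  # For the TRUE rows diff_hours is constant (0), so lm returns NA here.
  mutate(slope = lm(c(value) ~ c(diff_hours))$coefficients[[2]])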
#> # A tibble: 14 x 7
#> # Groups:   indicator, grp [7]
#>    group date                indicator value diff_hours   grp slope
#>    <chr> <dttm>              <lgl>     <dbl>      <dbl> <int> <dbl>
#>  1 A     2022-10-31 20:00:00 FALSE         2          4     1  1   
#>  2 A     2022-10-31 21:00:00 FALSE         1          3     1  1   
#>  3 A     2022-10-31 22:00:00 FALSE         4          2     2  3   
#>  4 A     2022-10-31 23:00:00 FALSE         1          1     2  3   
#>  5 A     2022-11-01 00:00:00 TRUE          3          0     1 NA   
#>  6 A     2022-11-01 01:00:00 FALSE         1          1     3  2   
#>  7 A     2022-11-01 02:00:00 FALSE         3          2     3  2   
#>  8 B     2022-10-31 20:00:00 FALSE         1          4     4 -1.00
#>  9 B     2022-10-31 21:00:00 FALSE         2          3     4 -1.00
#> 10 B     2022-10-31 22:00:00 FALSE         3          2     5  2   
#> 11 B     2022-10-31 23:00:00 FALSE         1          1     5  2   
#> 12 B     2022-11-01 00:00:00 TRUE          4          0     1 NA   
#> 13 B     2022-11-01 01:00:00 FALSE         1          1     6  4   
#> 14 B     2022-11-01 02:00:00 FALSE         5          2     6  4
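The pipeline above labels blocks across the whole table, so it relies on the FALSE rows lining up in blocks of N, as they do in this data. As a possible alternative, here is a base R sketch that counts blocks outward from the TRUE row inside each group. It assumes exactly one TRUE row per group and rows already sorted by date. The helper slope_or_na and the vectors pos, side, chunk and key are hypothetical names introduced for illustration, and the date column is left out of the rebuilt data for brevity.

```r
# Rebuild the example data (values copied from the question, date omitted).
df <- data.frame(
  group      = rep(c("A", "B"), each = 7),
  indicator  = rep(c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE), 2),
  value      = c(2, 1, 4, 1, 3, 1, 3, 1, 2, 3, 1, 4, 1, 5),
  diff_hours = rep(c(4, 3, 2, 1, 0, 1, 2), 2)
)

N <- 2

# Slope of y ~ x, or NA when x has fewer than two distinct values
# (for example the single TRUE row).
slope_or_na <- function(y, x) {
  if (length(unique(x)) < 2) return(NA_real_)
  unname(coef(lm(y ~ x))[2])
}

# Signed row offset from the TRUE row within each group.
# Assumption: exactly one TRUE row per group.
pos <- ave(seq_len(nrow(df)), df$group, FUN = seq_along) -
  ave(as.integer(df$indicator), df$group, FUN = function(z) which(z == 1)[1])

side  <- sign(pos)              # -1 before the TRUE row, 0 on it, 1 after it
chunk <- ceiling(abs(pos) / N)  # block number, counted outward from the TRUE row

# One key per (group, side, block); fit a model for each set of row indices.
key <- paste(df$group, side, chunk)
df$slope <- NA_real_
for (i in split(seq_len(nrow(df)), key)) {
  df$slope[i] <- slope_or_na(df$value[i], df$diff_hours[i])
}

df
```

Because blocks are counted from the TRUE row, a leftover block shorter than N ends up at the outer edge of a group rather than next to the TRUE row; a leftover block with a single row gets NA.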
