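In my reprex below:
- RSA is the output of the process being analyzed; its results are to be grouped.
- Each RSA group spans a different number of observation days (datenum days).
- var1 changes less frequently, and each value is observed for 8 consecutive days.
- RSA groups are numbered sequentially within each var1 group; when a new var1 is encountered, the RSA group numbering starts over.
- idx_objective is the index I am looking for.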
Reprex:
var1 <- c("aaa", "aaa", "aaa", "aaa", "aaa", "aaa", "aaa", "aaa", "bbb", "bbb", "bbb", "bbb", "bbb", "bbb", "bbb", "bbb", "ccc", "ccc", "ccc", "ccc", "ccc", "ccc", "ccc", "ccc", "ddd", "ddd", "ddd", "ddd", "ddd", "ddd", "ddd", "ddd")
RSA <- c(1,1,1,0,-1,-1,0,-1,
0,0,0,-1,-1,-1,1,1,
-1,-1,0,1,1,-1,-1,1,
1,-1,-1,1,1,0,-1,1)
idx_objective <- c(1,1,1,2,3,3,4,5,
1,1,1,2,2,2,3,3,
1,1,2,3,3,4,4,5,
1,2,2,3,3,4,5,6)
library(dplyr)  # provides %>%, group_by(), mutate(), relocate()
objective.df <- data.frame(var1, RSA, idx_objective) %>%
  group_by(var1) %>%
  mutate(datenum = 1:n()) %>%  # day counter within each var1 group
  relocate(datenum, .after = var1)
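To make the numbering rule from the list above concrete, here is a minimal base R sketch of it, written independently of the two attempts below. It is only an illustration: run_id is a hypothetical helper name that does not appear in any of the posts, and only the first two var1 groups of the reprex are used to keep it short.

```r
# Hypothetical helper (an assumption, not from the original posts):
# number consecutive runs of equal values 1, 2, 3, ...
run_id <- function(x) {
  r <- rle(x)                            # lengths of consecutive equal values
  rep(seq_along(r$lengths), r$lengths)   # repeat each run number over its run
}

# First two var1 groups of the reprex
var1 <- rep(c("aaa", "bbb"), each = 8)
RSA  <- c(1, 1, 1, 0, -1, -1, 0, -1,
          0, 0, 0, -1, -1, -1, 1, 1)

# Apply the run numbering separately within each var1 group,
# so the numbering restarts at every new var1
idx <- ave(RSA, var1, FUN = run_id)
idx
# 1 1 1 2 3 3 4 5 1 1 1 2 2 2 3 3
```

The output matches the first 16 values of idx_objective.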
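I reviewed many seemingly similar SO posts.
[1] dplyr: group variables then assign unique names based on unique grouping
Revolves around the correct use of cumsum, which I believe I am using correctly.
[https://stackoverflow.com/questions/40519129/how-to-assign-unique-id-for-group-of-duplicates]
[2] How to divide between groups of rows using dplyr
The latter two do not seem to apply; the other two are referenced in what follows: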
Approach #1: using a change flag and cumsum
objective.try1 <- objective.df %>%
  group_by(var1) %>%
  # 1 where RSA differs from the previous row, else 0; the first row of each group (NA) becomes 0
  mutate(chg_flg = ifelse(lag(RSA) != RSA, 1, 0) %>%
           coalesce(0)) %>%
  relocate(chg_flg, .after = RSA) %>%
  relocate(datenum, .after = var1) %>%
  group_by(var1, chg_flg) %>%
  mutate(idx_objective_try = cumsum(chg_flg) + 1)
Result:
objective.try1 <- c(1, 1, 1, 2, 3, 1, 4, 5,
                    1, 1, 1, 2, 1, 1, 3, 1,
                    1, 1, 2, 3, 1, 4, 1, 5,
                    1, 2, 1, 3, 1, 4, 5, 6)
objective.df <- data.frame(var1, RSA, idx_objective, objective.try1) %>%
  group_by(var1) %>%
  mutate(datenum = 1:n()) %>%
  relocate(datenum, .after = var1)
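Observations on objective.try1: rows 1-5 work, but row 6 incorrectly restarts the idx numbering. The numbering then correctly follows chg_flg again until rows 13 and 14, where it incorrectly restarts once more; after that it is correct for one row at a time and wrong again at rows 16, 21, 23, 27 and 29.
For example, following the logic for row 6: the previous idx_objective_try (row 5) is 3 and chg_flg at row 6 is 0, so idx_objective_try should be the correct value 3. Why isn't it?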
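The row 6 value can be reproduced outside dplyr, which may help narrow down where the restart comes from. The sketch below is base R and uses only the "aaa" rows. It assumes that the second group_by(var1, chg_flg) makes cumsum run separately over the rows with chg_flg 0 and the rows with chg_flg 1; the names grouped and ungrouped are illustrative only.

```r
RSA <- c(1, 1, 1, 0, -1, -1, 0, -1)   # the "aaa" rows of the reprex

# Same flag as in Approach #1: 1 where RSA differs from the previous row,
# 0 otherwise, with the first row set to 0
chg_flg <- c(0, as.numeric(RSA[-1] != RSA[-length(RSA)]))
chg_flg
# 0 0 0 1 1 0 1 1

# cumsum taken separately within each chg_flg value, mimicking a grouping
# on chg_flg: all flag-0 rows (1, 2, 3 and 6) sum to 0, so they all get 1
grouped <- ave(chg_flg, chg_flg, FUN = cumsum) + 1
grouped
# 1 1 1 2 3 1 4 5

# cumsum taken over the whole var1 group instead
ungrouped <- cumsum(chg_flg) + 1
ungrouped
# 1 1 1 2 3 3 4 5
```

The first result equals the first eight values of objective.try1, including the unexpected 1 at row 6; the second equals the first eight values of idx_objective.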
Approach #2: using match and duplicated:
objective.try2 <- objective.df %>%
  group_by(var1) %>%  # var1 corresponds to "prop" in the SO post (both are the slower-moving variables)
  mutate(well_rep1 = match(RSA, unique(RSA)),   # "RSA" corresponds to "well" in the SO post (both are the faster-changing variables)
         well_rep2 = cumsum(!duplicated(RSA)))  # approach similar to above
objective.try2
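Observations: most rows work, but some do not, although the failing rows differ from those in the first try.
If anyone can point out what I am doing wrong, I would greatly appreciate it.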
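To show which rows go wrong, here is what the two expressions return for the "aaa" rows alone, as a small base R sketch. It suggests that both expressions depend on which distinct values have already been seen rather than on consecutive runs, so a value that returns later (the 0 at row 7, the -1 at row 8) does not start a new number.

```r
RSA <- c(1, 1, 1, 0, -1, -1, 0, -1)   # the "aaa" rows of the reprex

# Position of each value among the distinct values, in order of first appearance
well_rep1 <- match(RSA, unique(RSA))
well_rep1
# 1 1 1 2 3 3 2 3

# Running count of values seen for the first time
well_rep2 <- cumsum(!duplicated(RSA))
well_rep2
# 1 1 1 2 3 3 3 3

# Target from the reprex, for comparison
idx_objective <- c(1, 1, 1, 2, 3, 3, 4, 5)
which(well_rep1 != idx_objective)
# 7 8
```

Rows 1-6 agree with idx_objective for both columns; rows 7 and 8 do not.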