I'm wrangling a dataset from a crossover trial design. Below is a toy example with a similar structure:

df <- structure(list(subject = c("a", "a", "a", "a", "a", "a", "b", 
"b", "b", "b", "c", "c", "c", "c", "c", "c"), treatment = c("none", 
"placebo", "placebo", "drug", "drug", "drug", "none", "drug", 
"placebo", "placebo", "none", "placebo", "drug", "drug", "drug", 
"drug"), day = c(0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 0, 1, 2, 3, 4, 
5)), row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"
))
# A tibble: 16 × 3
   subject treatment   day
   <chr>   <chr>     <dbl>
 1 a       none          0
 2 a       placebo       1
 3 a       placebo       2
 4 a       drug          3
 5 a       drug          4
 6 a       drug          5
 7 b       none          0
 8 b       drug          1
 9 b       placebo       2
10 b       placebo       3
11 c       none          0
12 c       placebo       1
13 c       drug          2
14 c       drug          3
15 c       drug          4
16 c       drug          5

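So each subject starts with the value "none" in treatment, then receives either placebo or drug for a few days, and then receives the other treatment for a few more days. I want to label these stages in the order they occur within each subject.

So my desired output is this: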

# A tibble: 16 × 4
   subject treatment   day stage 
   <chr>   <chr>     <dbl> <chr> 
 1 a       none          0 first 
 2 a       placebo       1 second
 3 a       placebo       2 second
 4 a       drug          3 third 
 5 a       drug          4 third 
 6 a       drug          5 third 
 7 b       none          0 first 
 8 b       drug          1 second
 9 b       placebo       2 third 
10 b       placebo       3 third 
11 c       none          0 first 
12 c       placebo       1 second
13 c       drug          2 third 
14 c       drug          3 third 
15 c       drug          4 third 
16 c       drug          5 third  

What made sense to me was to combine group_by and mutate with a factor built from treatment, but this does not work:

# my failed attempt
library(dplyr)

df %>% 
  arrange(subject, day) %>% # needed for my actual dataset
  group_by(subject) %>% 
  mutate(stage = factor(treatment, labels = c("first", "second", "third"))) %>% 
  ungroup()

It gives:

# A tibble: 16 × 4
   subject treatment   day stage 
   <chr>   <chr>     <dbl> <fct> 
 1 a       none          0 second
 2 a       placebo       1 third 
 3 a       placebo       2 third 
 4 a       drug          3 first 
 5 a       drug          4 first 
 6 a       drug          5 first 
 7 b       none          0 second
 8 b       drug          1 first 
 9 b       placebo       2 third 
10 b       placebo       3 third 
11 c       none          0 second
12 c       placebo       1 third 
13 c       drug          2 first 
14 c       drug          3 first 
15 c       drug          4 first 
16 c       drug          5 first

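The problem is that the labels are assigned according to the alphabetical order of the treatment values (drug, none, placebo), but I want them assigned in the order in which the treatment values appear within each subject. I also tried using levels instead of labels, and that gave me nothing but NA.

Any help would be much appreciated. A dplyr solution is preferred, but I'm happy to work with any other solution.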
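The behaviour described above can be reproduced in base R without any grouping. The sketch below uses only subject a's treatment sequence from the toy data; the variable names are made up for illustration. It shows that factor() sorts its levels alphabetically by default, that labels= renames those sorted levels position by position, and that levels= with values absent from the data yields NA.

```r
# Treatment sequence for subject "a" from the toy data
treatment <- c("none", "placebo", "placebo", "drug", "drug", "drug")

# Without an explicit levels= argument, factor() sorts the unique values
default_levels <- levels(factor(treatment))
print(default_levels)   # "drug" "none" "placebo"

# labels= renames the sorted levels in that order,
# so "drug" becomes "first", "none" becomes "second", and so on
by_labels <- factor(treatment, labels = c("first", "second", "third"))
print(as.character(by_labels))

# levels= declares which values are valid; no treatment value matches,
# so every element becomes NA
by_levels <- factor(treatment, levels = c("first", "second", "third"))
print(all(is.na(by_levels)))   # TRUE

# Passing the levels in order of appearance gives the intended mapping
by_appearance <- factor(treatment,
                        levels = unique(treatment),
                        labels = c("first", "second", "third"))
print(as.character(by_appearance))
```

The last call suggests one possible fix: take the levels from unique(treatment) so that they follow order of appearance rather than alphabetical order.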

Recommended answer

You can group_by subject and then use either match or rleid. Use english::ordinal to get the expected output.

df %>% 
  group_by(subject) %>% 
  mutate(match = match(treatment, unique(treatment)),
         rleid = data.table::rleid(treatment),
         stage = english::ordinal(match))

# A tibble: 16 × 6
# Groups:   subject [3]
   subject treatment   day match rleid stage       
   <chr>   <chr>     <dbl> <int> <int> <ordinal>
 1 a       none          0     1     1 first    
 2 a       placebo       1     2     2 second   
 3 a       placebo       2     2     2 second   
 4 a       drug          3     3     3 third    
 5 a       drug          4     3     3 third    
 6 a       drug          5     3     3 third    
 7 b       none          0     1     1 first    
 8 b       drug          1     2     2 second   
 9 b       placebo       2     3     3 third    
10 b       placebo       3     3     3 third    
11 c       none          0     1     1 first    
12 c       placebo       1     2     2 second   
13 c       drug          2     3     3 third    
14 c       drug          3     3     3 third    
15 c       drug          4     3     3 third    
16 c       drug          5     3     3 third    
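If installing the english and data.table packages is not desirable, the same match() idea can be sketched with dplyr alone. This is a minimal sketch rather than the answer's own method: stage_labels is a hypothetical lookup vector introduced here, and it assumes no subject has more than three distinct treatments. The second half uses a made-up sequence to show where match() and a run-length id stop agreeing, namely when a treatment recurs after an interruption.

```r
library(dplyr)

# Toy data from the question
df <- tibble(
  subject   = rep(c("a", "b", "c"), times = c(6, 4, 6)),
  treatment = c("none", "placebo", "placebo", "drug", "drug", "drug",
                "none", "drug", "placebo", "placebo",
                "none", "placebo", "drug", "drug", "drug", "drug"),
  day       = c(0:5, 0:3, 0:5)
)

# Hypothetical lookup vector; extend it if a subject can have more stages
stage_labels <- c("first", "second", "third")

result <- df %>%
  arrange(subject, day) %>%
  group_by(subject) %>%
  # Position of each treatment among the subject's treatments,
  # in order of first appearance, mapped to a label
  mutate(stage = stage_labels[match(treatment, unique(treatment))]) %>%
  ungroup()

print(result)

# match() and a run-length id differ only when a treatment comes back
x <- c("none", "drug", "placebo", "drug")   # made-up sequence
by_value <- match(x, unique(x))             # the repeated "drug" reuses index 2
runs <- rle(x)
by_run <- rep(seq_along(runs$lengths), runs$lengths)  # every run gets a new index
print(by_value)
print(by_run)
```

For the toy data the two approaches agree, because no treatment recurs within a subject. If the real dataset can return to an earlier treatment, choose between them according to whether the second spell should count as a new stage. In dplyr 1.1.0 or later, consecutive_id() offers a run-length id similar to data.table::rleid().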
