I'm wrangling a dataset with a crossover trial design. Below is a toy example with a similar structure:
df <- structure(list(subject = c("a", "a", "a", "a", "a", "a", "b",
"b", "b", "b", "c", "c", "c", "c", "c", "c"), treatment = c("none",
"placebo", "placebo", "drug", "drug", "drug", "none", "drug",
"placebo", "placebo", "none", "placebo", "drug", "drug", "drug",
"drug"), day = c(0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 0, 1, 2, 3, 4,
5)), row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"
))
# A tibble: 16 × 3
   subject treatment   day
   <chr>   <chr>     <dbl>
 1 a       none          0
 2 a       placebo       1
 3 a       placebo       2
 4 a       drug          3
 5 a       drug          4
 6 a       drug          5
 7 b       none          0
 8 b       drug          1
 9 b       placebo       2
10 b       placebo       3
11 c       none          0
12 c       placebo       1
13 c       drug          2
14 c       drug          3
15 c       drug          4
16 c       drug          5
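So each subject starts with the value "none" in treatment, then gets a few days of either placebo or drug, followed by a few days of the other treatment.
The output I want looks like this: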
# A tibble: 16 × 4
   subject treatment   day stage
   <chr>   <chr>     <dbl> <chr>
 1 a       none          0 first
 2 a       placebo       1 second
 3 a       placebo       2 second
 4 a       drug          3 third
 5 a       drug          4 third
 6 a       drug          5 third
 7 b       none          0 first
 8 b       drug          1 second
 9 b       placebo       2 third
10 b       placebo       3 third
11 c       none          0 first
12 c       placebo       1 second
13 c       drug          2 third
14 c       drug          3 third
15 c       drug          4 third
16 c       drug          5 third
What made sense to me was combining group_by and mutate with factor on treatment, but this does not work:
# my failed attempt
library(dplyr)

df %>%
  arrange(subject, day) %>%  # needed for my actual dataset
  group_by(subject) %>%
  mutate(stage = factor(treatment, labels = c("first", "second", "third"))) %>%
  ungroup()
It gives:
# A tibble: 16 × 4
   subject treatment   day stage
   <chr>   <chr>     <dbl> <fct>
 1 a       none          0 second
 2 a       placebo       1 third
 3 a       placebo       2 third
 4 a       drug          3 first
 5 a       drug          4 first
 6 a       drug          5 first
 7 b       none          0 second
 8 b       drug          1 first
 9 b       placebo       2 third
10 b       placebo       3 third
11 c       none          0 second
12 c       placebo       1 third
13 c       drug          2 first
14 c       drug          3 first
15 c       drug          4 first
16 c       drug          5 first
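The problem is that the labels are assigned according to the alphabetical order of the treatment values, but I want them assigned in the order in which the treatment values appear within each subject. I also tried using levels instead of labels, and got nothing but NA.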
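For reference, both symptoms can be reproduced in base R on one subject's values. This is a minimal sketch; the names x, with_labels and with_levels are made up for illustration. By default factor() sorts the unique values to build its levels, and labels are then matched to those sorted levels by position, while levels that do not occur in the data turn every value into NA.

```r
# Treatment values for one hypothetical subject, in order of appearance
x <- c("none", "placebo", "placebo", "drug")

# Default levels are the sorted unique values, not the order of appearance
default_levels <- levels(factor(x))

# labels= renames the sorted levels position by position:
# drug -> first, none -> second, placebo -> third
with_labels <- as.character(factor(x, labels = c("first", "second", "third")))

# levels= names the values expected in the data; none of these occur in x,
# so every element becomes NA
with_levels <- factor(x, levels = c("first", "second", "third"))

print(default_levels)
print(with_labels)
print(with_levels)
```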
Any help would be much appreciated. A dplyr solution is preferred, but I'm happy to work with any other solution.
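One direction that might work, sketched here rather than offered as a definitive answer: within each subject, number the treatments by first appearance with match(treatment, unique(treatment)) and use that number to index a vector of stage names. The sketch assumes the rows are already sorted by day within subject, that each treatment forms a single contiguous block per subject, and that there are at most three distinct treatments. The names toy, stage_labels and result are made up for illustration.

```r
library(dplyr)

# Reduced toy data: subject "a" gets placebo first, subject "b" gets drug first
toy <- data.frame(
  subject   = c("a", "a", "a", "b", "b", "b"),
  treatment = c("none", "placebo", "drug", "none", "drug", "placebo"),
  day       = c(0, 1, 2, 0, 1, 2)
)

# Assumed stage names, one per distinct treatment in order of appearance
stage_labels <- c("first", "second", "third")

result <- toy %>%
  arrange(subject, day) %>%
  group_by(subject) %>%
  # unique() keeps first-appearance order, so match() returns 1, 2, 3, ...
  mutate(stage = stage_labels[match(treatment, unique(treatment))]) %>%
  ungroup()

print(result)
```

If a subject could return to an earlier treatment, match() would reuse the earlier stage name; in that case something like dplyr::consecutive_id(treatment) (dplyr 1.1.0 or later) may be a better fit for numbering the blocks.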