I have this (untidy) data, which contains each patient's drug regimen phase (ip or cp), the drug name (coded as a number), and the dose for multiple drugs:
df_have
# id ip_drug1 ip_dose1 ip_drug2 ip_dose2 cp_drug1 cp_dose1 cp_drug2 cp_dose2
# 1 A1 1 300 3 100 6 500 7 100
# 2 A2 1 300 2 200 11 300 NA NA
# 3 A3 1 500 NA NA 9 100 5 1500
I would like to tidy this data into long format:
df_want
# id phase drug dose
# 1 A1 ip 1 300
# 2 A1 ip 3 100
# 3 A1 cp 6 500
# 4 A1 cp 7 100
# 5 A2 ip 1 300
# 6 A2 ip 2 200
# 7 A2 cp 11 300
# 8 A2 cp NA NA
# 9 A3 ip 1 500
# 10 A3 ip NA NA
# 11 A3 cp 9 100
# 12 A3 cp 5 1500
By combining tidyr::pivot_longer, dplyr::mutate, and tidyr::pivot_wider (along with dplyr::select), I am able to get the desired data frame:
library(tidyr)
library(dplyr)
df_have %>%
  pivot_longer(cols = -id,
               names_to = c("phase", "type"),
               names_pattern = "(cp|ip)_(drug|dose)") %>%
  mutate(temp = row_number(),
         .by = c(id, phase, type)) %>%
  pivot_wider(names_from = type,
              values_from = value) %>%
  select(-temp)
However, the multi-step code above is very slow on my very large real data. I would like to do this transformation faster within tidyr/dplyr, ideally in a single step. Is this possible?
# have
df_have <- data.frame(id = paste0("A", 1:3),
                      ip_drug1 = 1,
                      ip_dose1 = c(300, 300, 500),
                      ip_drug2 = c(3, 2, NA),
                      ip_dose2 = c(100, 200, NA),
                      cp_drug1 = c(6, 11, 9),
                      cp_dose1 = c(500, 300, 100),
                      cp_drug2 = c(7, NA, 5),
                      cp_dose2 = c(100, NA, 1500))
# want
df_want <- data.frame(id = rep(paste0("A", 1:3), each = 4),
                      phase = rep(rep(c("ip", "cp"), each = 2), times = 3),
                      drug = c(1, 3, 6, 7, 1, 2, 11, NA, 1, NA, 9, 5),
                      dose = c(300, 100, 500, 100, 300, 200, 300, NA, 500, NA, 100, 1500))
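
For reference, a minimal sketch of one possible single-pivot direction, assuming every column name follows the phase_typeN pattern shown above. It uses the ".value" sentinel in names_to so that the drug and dose parts of the names become output columns directly, which avoids the row_number()/pivot_wider round trip. The idx column and the df_long name are introduced here only for illustration, and whether this is actually faster on large data has not been benchmarked.

```r
library(tidyr)
library(dplyr)

# Same example data as df_have above, repeated so the snippet runs on its own
df_have <- data.frame(id = paste0("A", 1:3),
                      ip_drug1 = 1,
                      ip_dose1 = c(300, 300, 500),
                      ip_drug2 = c(3, 2, NA),
                      ip_dose2 = c(100, 200, NA),
                      cp_drug1 = c(6, 11, 9),
                      cp_dose1 = c(500, 300, 100),
                      cp_drug2 = c(7, NA, 5),
                      cp_dose2 = c(100, NA, 1500))

# Split each column name into three parts:
#   phase  -> a regular key column ("ip" or "cp")
#   .value -> "drug"/"dose" become the names of the output value columns
#   idx    -> the trailing number (hypothetical helper column), which keeps
#             the first and second drug of each phase on separate rows
df_long <- df_have %>%
  pivot_longer(cols = -id,
               names_to = c("phase", ".value", "idx"),
               names_pattern = "(ip|cp)_(drug|dose)(\\d+)") %>%
  select(-idx)  # drop the helper index once the rows are formed
```

Rows where both drug and dose are NA are kept, as in df_want; if they were not needed, values_drop_na = TRUE in pivot_longer would remove them.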