I have a dataset like this:
structure(list(study_id = structure(c("P005", "P005", "P005",
"P008", "P008", "P008", "P021", "P021", "P021", "P028", "P028",
"P028", "P032", "P032", "P032", "P036", "P036", "P036", "P037",
"P037", "P037", "P049", "P049", "P049", "P053", "P053", "P053",
"P069", "P069", "P069", "P079", "P079", "P079", "P089", "P089",
"P089", "P093", "P093", "P093", "P096", "P096", "P096", "P104",
"P104", "P104", "P105", "P105", "P105"), label = "ISMART Study ID", format.stata = "%9s"),
phase = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("Baseline", "Midterm",
"Final"), class = "factor"), selfeff1 = structure(c(3L, 3L,
3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
3L, 3L, 3L, 3L, 3L, NA, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
2L), levels = c("Not confident", "Somewhat confident", "Very confident"
), class = "factor"), selfeff3 = structure(c(3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 3L,
3L, 3L, NA, 3L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, NA, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L
), levels = c("Not confident", "Somewhat confident", "Very confident"
), class = "factor")), class = "data.frame", row.names = c(NA,
-48L))
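This is a long-format dataset. Each study_id has three rows, holding the baseline, midterm, and final values. I want to impute the missing values with the carry-forward/carry-back method. Because these are repeated measures, I also want to apply the following rules: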
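- if they missed baseline but have midterm: carry back (i.e., replace baseline with midterm);
- if they missed midterm but have final: carry back (i.e., replace midterm with final);
- if they missed final but have midterm: carry forward (i.e., replace final with midterm);
- if they missed both baseline and final: carry the midterm both back and forward (i.e., replace both with midterm).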
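For one participant, the rules above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper name fill_three is hypothetical, each participant is assumed to have exactly three values ordered Baseline, Midterm, Final, and the toy id and score vectors are made up. Every rule is checked against the original values, so a missing baseline stays missing when the midterm was also missing.

```r
# Hypothetical helper: x holds one participant's values in the order
# Baseline, Midterm, Final (assumed; sort the rows by phase first).
fill_three <- function(x) {
  stopifnot(length(x) == 3)
  b <- x[1]; m <- x[2]; f <- x[3]       # original values, before any filling
  if (is.na(b) && !is.na(m)) x[1] <- m  # baseline missing: carry midterm back
  if (is.na(m) && !is.na(f)) x[2] <- f  # midterm missing: carry final back
  if (is.na(f) && !is.na(m)) x[3] <- m  # final missing: carry midterm forward
  # baseline and final both missing: the first and third checks both fire
  x
}

# Made-up example: two participants, three rows each
id    <- c("A", "A", "A", "B", "B", "B")
score <- c(NA, 3, 3, NA, 2, NA)

# ave() applies the helper within each id and keeps the original row order
filled <- ave(score, id, FUN = fill_three)
filled  # 3 3 3 2 2 2
```

Because the checks use && on single values, each if sees exactly one TRUE or FALSE. The same helper could presumably be applied to each selfeff column in turn, grouped by study_id.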
I tried to write a function to do this, because my real dataset has selfeff1 through selfeff13. The code looks like this:
impute_values <- function(x, phase) {
# Carryback: Replace baseline with midterm if baseline is missing but midterm is available
if (phase == "Baseline" & is.na(x) & phase == "Midterm" & !is.na(x)) {
x <- na.locf(x)
}
# Carryback: Replace midterm with final if midterm is missing but final is available
# Carryforward: Replace final with midterm if final is missing but midterm is available
else if (phase == "Midterm" & is.na(x) & phase == "Final" & !is.na(x[3])) {
x <- na.locf(x)
} else if (phase == "Midterm" & !is.na(x) & phase == "Final" & is.na(x[3])) {
x <- na.locf(x, option="nocb")
}
# For the case where both baseline and final are missing but midterm is available,
# we can simply carry forward the missing values from midterm
else if (phase == "Baseline" & is.na(x) & phase == "Final" & is.na(x) &
phase == "Midterm" & !is.na(x)) {
x <- na.locf(x)
}
return(x)
}
But when I tried to test this function on a single variable, for example selfeff1, I used this code:
df2 <- df %>%
mutate(selfeff1=impute_values(selfeff1, phase))
summary(is.na(df2$selfeff1))
I got an error saying:
Error in if (...) NULL : the condition has length > 1
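The message can be reproduced without the data frame. The sketch below uses made-up three-element phase and x vectors, and it assumes R 4.2 or later, where an if condition longer than one element is an error (earlier versions only warned). Inside mutate() the function receives whole columns, so each comparison yields one logical value per row rather than a single TRUE or FALSE.

```r
# Made-up stand-ins for the columns that mutate() passes to the function
phase <- c("Baseline", "Midterm", "Final")
x <- c(NA, 3, 3)

cond <- phase == "Baseline" & is.na(x)
length(cond)  # 3: one logical value per row, not a single TRUE/FALSE

# Capture the message whether R raises an error (4.2+) or a warning (older)
msg <- tryCatch(
  if (cond) "yes" else "no",
  error = function(e) conditionMessage(e),
  warning = function(w) conditionMessage(w)
)
msg

# A single row cannot be in two phases at once, so this is never TRUE
never <- any(phase == "Baseline" & phase == "Midterm")
never  # FALSE
```

The last check suggests a second problem: even with a length-one condition, a test that requires the same element to equal both "Baseline" and "Midterm" can never succeed.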
Can someone show me how to fix this and make it work for my case?