I have a dataset like this:

df <- structure(list(study_id = structure(c("P005", "P005", "P005",
"P008", "P008", "P008", "P021", "P021", "P021", "P028", "P028",
"P028", "P032", "P032", "P032", "P036", "P036", "P036", "P037",
"P037", "P037", "P049", "P049", "P049", "P053", "P053", "P053",
"P069", "P069", "P069", "P079", "P079", "P079", "P089", "P089",
"P089", "P093", "P093", "P093", "P096", "P096", "P096", "P104",
"P104", "P104", "P105", "P105", "P105"), label = "ISMART Study ID", format.stata = "%9s"),
    phase = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("Baseline", "Midterm",
    "Final"), class = "factor"), selfeff1 = structure(c(3L, 3L,
    3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
    3L, 3L, 3L, 3L, 3L, NA, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
    2L), levels = c("Not confident", "Somewhat confident", "Very confident"
    ), class = "factor"), selfeff3 = structure(c(3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 3L,
    3L, 3L, NA, 3L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, NA, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L
    ), levels = c("Not confident", "Somewhat confident", "Very confident"
    ), class = "factor")), class = "data.frame", row.names = c(NA,
-48L))

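This is a long-format dataset. Each study_id has three rows: the baseline, midterm, and final values. I now want to impute the missing values using a carry-forward/carry-backward approach. Because these are repeated measures, I also want to apply the following rules:

  • If they missed baseline but have midterm: carry backward (i.e., replace baseline with midterm);
  • If they missed midterm but have final: carry backward (i.e., replace midterm with final);
  • If they missed final but have midterm: carry forward (i.e., replace final with midterm);
  • If they missed both baseline and final: carry the midterm backward and forward (i.e., replace both with midterm).

I tried to write a function to do this, because my real dataset has selfeff1 to selfeff13. The code looks like this: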

impute_values <- function(x, phase) {
  # Carryback: Replace baseline with midterm if baseline is missing but midterm is available
  if (phase == "Baseline" & is.na(x) & phase == "Midterm" & !is.na(x)) {
    x <- na.locf(x)
  }
  # Carryback: Replace midterm with final if midterm is missing but final is available
  # Carryforward: Replace final with midterm if final is missing but midterm is available
  else if (phase == "Midterm" & is.na(x) & phase == "Final" & !is.na(x[3])) {
    x <- na.locf(x)
  } else if (phase == "Midterm" & !is.na(x) & phase == "Final" & is.na(x[3])) {
    x <- na.locf(x, option="nocb")
  }
  # For the case where both baseline and final are missing but midterm is available, 
  # we can simply carry forward the missing values from midterm
  else if (phase == "Baseline" & is.na(x) & phase == "Final" & is.na(x) & 
           phase == "Midterm" & !is.na(x)) {
    x <- na.locf(x)
  }
  return(x)
}

But when I tried to test the function on a single variable, say selfeff1, with this code:

df2 <- df %>%
  mutate(selfeff1=impute_values(selfeff1, phase))

summary(is.na(df2$selfeff1))

I got an error saying:

error in if(```)NULL,  the condition has length>1

Can someone show me how to fix it and make it work for my case?

Recommended answer

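You could use a function prepl that pastes the is.na structure of a participant's three values into a binary pattern, e.g. "001" for study_id P037 in selfeff3. The replacement logic then checks that pattern with grepl. You can apply it to each case in each of the selfeff* columns using by (which you can think of as a combination of split and lapply), followed by unsplit. This makes it very obvious what is happening, and it can be extended as required.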
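As a small illustration of the pattern encoding described above, here is a sketch on a made-up three-value vector (the vector itself is an assumption for demonstration, not taken from the dataset):

```r
# Hypothetical values for one participant: baseline and midterm present, final missing
x <- c("Very confident", "Very confident", NA)

# is.na() gives FALSE FALSE TRUE; the unary + turns that into 0 0 1
flags <- +is.na(x)

# Collapse the flags into a single string that a regular expression can match
p <- paste(flags, collapse = "")
print(p)

# '.01' reads as: any baseline, midterm present (0), final missing (1)
print(grepl(".01", p))
```

Each position in the string corresponds to one phase, so a regular expression with `.` as a wildcard can describe "missing here, present there" in three characters.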

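> prepl <- \(x) {
+   # encode missingness as a string, e.g. "001" means only the final value is NA
+   p <- paste(+is.na(x), collapse='')
+   if (grepl('101', p)) {
+     # baseline and final missing: fill both from midterm
+     # (checked first, because '10.' would also match "101")
+     x[c(1, 3)] <- x[2]
+   } else if (grepl('10.', p)) {
+     x[1] <- x[2]  # baseline missing: carry midterm backward
+   } else if (grepl('.10', p)) {
+     x[2] <- x[3]  # midterm missing: carry final backward
+   } else if (grepl('.01', p)) {
+     x[3] <- x[2]  # final missing: carry midterm forward
+   }
+   x
+ }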

> icl <- grep('^selfeff\\d+$', names(df))
> df[icl] <- lapply(df[icl], \(x) by(x, df$study_id, prepl) |> unsplit(df$study_id))
> df
   study_id    phase           selfeff1           selfeff3
1      P005 Baseline     Very confident     Very confident
2      P005  Midterm     Very confident     Very confident
3      P005    Final     Very confident     Very confident
4      P008 Baseline     Very confident     Very confident
5      P008  Midterm     Very confident     Very confident
6      P008    Final     Very confident     Very confident
7      P021 Baseline Somewhat confident     Very confident
8      P021  Midterm     Very confident     Very confident
9      P021    Final     Very confident     Very confident
10     P028 Baseline Somewhat confident Somewhat confident
11     P028  Midterm     Very confident     Very confident
12     P028    Final     Very confident     Very confident
13     P032 Baseline     Very confident Somewhat confident
14     P032  Midterm     Very confident     Very confident
15     P032    Final     Very confident Somewhat confident
16     P036 Baseline     Very confident     Very confident
17     P036  Midterm     Very confident     Very confident
18     P036    Final     Very confident     Very confident
19     P037 Baseline     Very confident     Very confident
20     P037  Midterm     Very confident     Very confident
21     P037    Final     Very confident     Very confident
22     P049 Baseline     Very confident     Very confident
23     P049  Midterm Somewhat confident Somewhat confident
24     P049    Final     Very confident     Very confident
25     P053 Baseline     Very confident Somewhat confident
26     P053  Midterm     Very confident     Very confident
27     P053    Final     Very confident     Very confident
28     P069 Baseline     Very confident     Very confident
29     P069  Midterm     Very confident     Very confident
30     P069    Final     Very confident     Very confident
31     P079 Baseline     Very confident     Very confident
32     P079  Midterm     Very confident     Very confident
33     P079    Final     Very confident     Very confident
34     P089 Baseline     Very confident     Very confident
35     P089  Midterm     Very confident     Very confident
36     P089    Final     Very confident     Very confident
37     P093 Baseline     Very confident     Very confident
38     P093  Midterm     Very confident     Very confident
39     P093    Final     Very confident     Very confident
40     P096 Baseline     Very confident     Very confident
41     P096  Midterm     Very confident     Very confident
42     P096    Final     Very confident     Very confident
43     P104 Baseline     Very confident     Very confident
44     P104  Midterm     Very confident     Very confident
45     P104    Final     Very confident     Very confident
46     P105 Baseline     Very confident     Very confident
47     P105  Midterm     Very confident     Very confident
48     P105    Final Somewhat confident Somewhat confident
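Since the question used a dplyr pipeline, the same idea can also be sketched with `group_by()` and `across()`. The original error arises because `if` needs a single TRUE/FALSE, while a column passed through `mutate()` is a whole vector; calling a helper once per participant avoids that. The helper name `fill_phases` and the toy data below are made up for illustration, and the sketch assumes dplyr 1.0 or later and exactly three rows per study_id.

```r
library(dplyr)

# Toy data (made up), three rows per participant
toy <- data.frame(
  study_id = rep(c("A", "B", "C"), each = 3),
  phase    = factor(rep(c("Baseline", "Midterm", "Final"), 3),
                    levels = c("Baseline", "Midterm", "Final")),
  selfeff1 = c(NA, 3, 2,   3, NA, 1,   NA, 2, NA),
  selfeff3 = c(1, 2, NA,   1, 2, 3,    3, NA, 3)
)

# Hypothetical helper: receives the three values of one participant,
# ordered Baseline, Midterm, Final
fill_phases <- function(x) {
  stopifnot(length(x) == 3)
  miss <- is.na(x)                        # missingness before any replacement
  if (miss[1] && !miss[2]) x[1] <- x[2]   # baseline <- midterm
  if (miss[2] && !miss[3]) x[2] <- x[3]   # midterm  <- final
  if (miss[3] && !miss[2]) x[3] <- x[2]   # final    <- midterm
  x
}

toy_filled <- toy %>%
  arrange(study_id, phase) %>%            # make sure rows are in phase order
  group_by(study_id) %>%
  mutate(across(starts_with("selfeff"), fill_phases)) %>%
  ungroup()

print(toy_filled)
```

Because the missingness is recorded before anything is replaced, each rule only ever copies an originally observed value, and a participant missing both baseline and final gets both filled from the midterm. Inside the helper, `&&` is used because every condition is a single value.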
