R: Using na.locf to impute a long-format dataset with multiple time points

Posted on April 24

I have a dataset like this:

structure(list(study_id = structure(c("P005", "P005", "P005",
"P008", "P008", "P008", "P021", "P021", "P021", "P028", "P028",
"P028", "P032", "P032", "P032", "P036", "P036", "P036", "P037",
"P037", "P037", "P049", "P049", "P049", "P053", "P053", "P053",
"P069", "P069", "P069", "P079", "P079", "P079", "P089", "P089",
"P089", "P093", "P093", "P093", "P096", "P096", "P096", "P104",
"P104", "P104", "P105", "P105", "P105"), label = "ISMART Study ID", format.stata = "%9s"),
    phase = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L,
    2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), levels = c("Baseline", "Midterm",
    "Final"), class = "factor"), selfeff1 = structure(c(3L, 3L,
    3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
    3L, 3L, 3L, 3L, 3L, NA, 3L, 3L, 3L, 3L, 3L, 3L, 3L, NA, 3L,
    2L), levels = c("Not confident", "Somewhat confident", "Very confident"
    ), class = "factor"), selfeff3 = structure(c(3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 2L, 3L, 3L, 2L, 3L, 2L, 3L, 3L, 3L,
    3L, 3L, NA, 3L, 2L, 3L, 2L, 3L, 3L, 3L, 3L, 3L, NA, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L
    ), levels = c("Not confident", "Somewhat confident", "Very confident"
    ), class = "factor")), class = "data.frame", row.names = c(NA,
-48L))

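This is a long-format dataset. Each study_id has three rows: the baseline, midterm and final values. I now want to impute the missing values by carrying observations forward or backward. Because these are repeated measurements, I also want to apply the following rules:

If baseline is missing but midterm is available: carry backward (i.e., replace baseline with midterm);
If midterm is missing but final is available: carry backward (i.e., replace midterm with final);
If final is missing but midterm is available: carry forward (i.e., replace final with midterm);
If both baseline and final are missing: carry the midterm value both backward and forward (i.e., replace both with midterm).

I tried to write a function to do this, because my real dataset has selfeff1 through selfeff13. The code looks like this: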

impute_values <- function(x, phase) {
  # Carryback: Replace baseline with midterm if baseline is missing but midterm is available
  if (phase == "Baseline" & is.na(x) & phase == "Midterm" & !is.na(x)) {
    x <- na.locf(x)
  }
  # Carryback: Replace midterm with final if midterm is missing but final is available
  # Carryforward: Replace final with midterm if final is missing but midterm is available
  else if (phase == "Midterm" & is.na(x) & phase == "Final" & !is.na(x[3])) {
    x <- na.locf(x)
  } else if (phase == "Midterm" & !is.na(x) & phase == "Final" & is.na(x[3])) {
    x <- na.locf(x, option="nocb")
  }
  # For the case where both baseline and final are missing but midterm is available, 
  # we can simply carry forward the missing values from midterm
  else if (phase == "Baseline" & is.na(x) & phase == "Final" & is.na(x) & 
           phase == "Midterm" & !is.na(x)) {
    x <- na.locf(x)
  }
  return(x)
}

But when I tried to test this function on a single variable, for example selfeff1, with this code:

df2 <- df %>%
  mutate(selfeff1=impute_values(selfeff1, phase))

summary(is.na(df2$selfeff1))

I got an error saying:

Error in if (...) NULL: the condition has length > 1

Could someone show me how to fix it and make it work for my case?
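For context on the error: `if` expects a single TRUE or FALSE, but inside `mutate()` both `phase` and `x` are whole columns, so a test such as `phase == "Baseline" & is.na(x)` has one element per row. Since R 4.2.0 a condition longer than one is an error (earlier versions only warned). In addition, a condition like `phase == "Baseline" & ... & phase == "Midterm"` can never be TRUE for any row, because a row belongs to only one phase. A minimal sketch of the length problem, using hypothetical toy vectors rather than the real data:

```r
# Toy vectors standing in for one participant's three rows (made-up values).
phase <- factor(c("Baseline", "Midterm", "Final"),
                levels = c("Baseline", "Midterm", "Final"))
x <- c(NA, 3L, 3L)

# The comparison is elementwise, so the condition has length 3, not 1.
cond <- phase == "Baseline" & is.na(x)
length(cond)

# Passing it to if() fails in R >= 4.2.0; older versions only warned.
res <- tryCatch(if (cond) "filled" else "unchanged",
                error = function(e) "error",
                warning = function(w) "warning")

# A vectorised test such as ifelse() evaluates one element per row instead.
needs_fill <- ifelse(cond, "fill from Midterm", "keep")
```

This only shows why the condition is rejected; the imputation itself still has to compare values across the three rows of each participant.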

Recommended answer

> prepl <- \(x) {
+   # Encode the NA pattern of (Baseline, Midterm, Final) as a string, 1 = missing
+   p <- paste(+is.na(x), collapse='')
+   if (grepl('1.1', p)) {
+     # Checked first: otherwise '10.' would match '101' and leave Final missing
+     x[c(1, 3)] <- x[2]
+     x
+   } else if (grepl('10.', p)) {
+     x[1] <- x[2]
+     x
+   } else if (grepl('.10', p)) {
+     x[2] <- x[3]
+     x
+   } else if (grepl('.01', p)) {
+     x[3] <- x[2]
+     x
+   } else {
+     x
+   }
+ }
> icl <- grep('^selfeff\\d+$', names(df))
> df[icl] <- lapply(df[icl], \(x) by(x, df$study_id, prepl) |> unsplit(df$study_id))
> df
   study_id    phase           selfeff1           selfeff3
1      P005 Baseline     Very confident     Very confident
2      P005  Midterm     Very confident     Very confident
3      P005    Final     Very confident     Very confident
4      P008 Baseline     Very confident     Very confident
5      P008  Midterm     Very confident     Very confident
6      P008    Final     Very confident     Very confident
7      P021 Baseline Somewhat confident     Very confident
8      P021  Midterm     Very confident     Very confident
9      P021    Final     Very confident     Very confident
10     P028 Baseline Somewhat confident Somewhat confident
11     P028  Midterm     Very confident     Very confident
12     P028    Final     Very confident     Very confident
13     P032 Baseline     Very confident Somewhat confident
14     P032  Midterm     Very confident     Very confident
15     P032    Final     Very confident Somewhat confident
16     P036 Baseline     Very confident     Very confident
17     P036  Midterm     Very confident     Very confident
18     P036    Final     Very confident     Very confident
19     P037 Baseline     Very confident     Very confident
20     P037  Midterm     Very confident     Very confident
21     P037    Final     Very confident     Very confident
22     P049 Baseline     Very confident     Very confident
23     P049  Midterm Somewhat confident Somewhat confident
24     P049    Final     Very confident     Very confident
25     P053 Baseline     Very confident Somewhat confident
26     P053  Midterm     Very confident     Very confident
27     P053    Final     Very confident     Very confident
28     P069 Baseline     Very confident     Very confident
29     P069  Midterm     Very confident     Very confident
30     P069    Final     Very confident     Very confident
31     P079 Baseline     Very confident     Very confident
32     P079  Midterm     Very confident     Very confident
33     P079    Final     Very confident     Very confident
34     P089 Baseline     Very confident     Very confident
35     P089  Midterm     Very confident     Very confident
36     P089    Final     Very confident     Very confident
37     P093 Baseline     Very confident     Very confident
38     P093  Midterm     Very confident     Very confident
39     P093    Final     Very confident     Very confident
40     P096 Baseline     Very confident     Very confident
41     P096  Midterm     Very confident     Very confident
42     P096    Final     Very confident     Very confident
43     P104 Baseline     Very confident     Very confident
44     P104  Midterm     Very confident     Very confident
45     P104    Final     Very confident     Very confident
46     P105 Baseline     Very confident     Very confident
47     P105  Midterm     Very confident     Very confident
48     P105    Final Somewhat confident Somewhat confident
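If the regular-expression encoding feels opaque, the same four rules can also be written with plain logical tests. The following is a minimal sketch in base R, assuming every participant has exactly three rows ordered Baseline, Midterm, Final. The names `fill_three()`, `fill_by_id()` and the `toy` data frame are hypothetical and introduced here for illustration only; they are not part of the original post.

```r
# Fill one participant's three values (Baseline, Midterm, Final) per the rules.
fill_three <- function(x) {
  stopifnot(length(x) == 3L)
  miss <- is.na(x)
  src <- 1:3                               # by default each slot keeps its own value
  if (miss[1] && !miss[2]) src[1] <- 2L    # Baseline <- Midterm (carry backward)
  if (miss[3] && !miss[2]) src[3] <- 2L    # Final    <- Midterm (carry forward)
  if (miss[2] && !miss[3]) src[2] <- 3L    # Midterm  <- Final   (carry backward)
  x[src]                                   # indexing keeps factor levels intact
}

# Apply fill_three() within each participant; assumes rows are already ordered.
fill_by_id <- function(x, id) {
  for (r in split(seq_along(x), id)) x[r] <- fill_three(x[r])
  x
}

# Made-up data: one participant per rule.
lv <- c("Not confident", "Somewhat confident", "Very confident")
toy <- data.frame(
  study_id = rep(c("A", "B", "C", "D"), each = 3),
  phase = factor(rep(c("Baseline", "Midterm", "Final"), 4),
                 levels = c("Baseline", "Midterm", "Final")),
  selfeff1 = factor(c(NA, "Very confident", "Very confident",        # A: Baseline missing
                      "Very confident", NA, "Somewhat confident",    # B: Midterm missing
                      "Somewhat confident", "Very confident", NA,    # C: Final missing
                      NA, "Somewhat confident", NA),                 # D: Baseline and Final missing
                    levels = lv)
)

# Same column selection idea as in the answer: every selfeff<number> column.
cols <- grep("^selfeff\\d+$", names(toy))
toy[cols] <- lapply(toy[cols], fill_by_id, id = toy$study_id)
```

Because each `if` here tests single elements with `&&`, the "condition has length > 1" problem does not arise. Cases the rules do not cover are handled conservatively: for instance, Baseline stays NA when Midterm is missing as well.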
