R 跨列查找多个时间报告

发布于02月28日

我正在try 判断一组受伤的数据.数据来自4个来源(医院、全科doctor 、self 报告、死亡)，每个来源都有受伤时间(以年为单位)(连续变量).一个人可能有一个或多个来源报告的伤害.我想知道医院的伤害是否有其他来源的报道(任何时间在0.25以内的伤害都被认为是相同的伤害).

因此，我想创建一个名为Hospital_Everywhere_1的列，其中，如果Hospital_1列中有时间，则Hospital_Everywhere_1列将显示"医院"，如果任何其他列(不包括医院)中的时间在0.25以内，则它还将包括一个文本，由来源的|分隔.

例如，如果在医院_1中受伤的年龄为65.44岁，在GP_1中的受伤年龄为65.42岁，在Self_Report中的受伤年龄为65.43岁，则应为"Hospital|GP|Self_Report".

我想对每个医院列都这样做，这样就会有一个Hospital_elsewhere_(i)

下面是一个示例数据集

library(tibble)
set.seed(123)

example_data <- tibble(
  id = 1:30,
  Hospital_1 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE),
  Hospital_2 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE),
  Hospital_3 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE),
  Hospital_4 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE),
  GP_1 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE),
  GP_2 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE),
  GP_3 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE),
  GP_4 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE),
  GP_5 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE),
  GP_6 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE),
  GP_7 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE),
  GP_8 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE),
  GP_9 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE),
  GP_10 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE),
  self_report_1 = sample(c(NA, round(runif(15, 1, 80), 2)), 30, replace = TRUE),
  self_report_2 = sample(c(NA, round(runif(15, 1, 80), 2)), 30, replace = TRUE),
  self_report_3 = sample(c(NA, round(runif(15, 1, 80), 2)), 30, replace = TRUE),
  self_report_4 = sample(c(NA, round(runif(15, 1, 80), 2)), 30, replace = TRUE),
  death_1 = sample(c(NA, round(runif(25, 1, 80), 2)), 30, replace = TRUE)
)

for (i in 1:10) {
  index <- sample(1:30, 1)  
  gp_value <- round(runif(1, 1, 80), 2)  
  example_data[index, paste0("GP_", 1:4)] <- gp_value 
  example_data[index, paste0("Hospital_", 1:4)] <- gp_value + runif(1, -0.25, 0.25)
}

library(dplyr) library(purrr) library(tidyr) (list_df <- example_data %>% pivot_longer(names_to = c("Type", "nr"), names_pattern = "(.*)_(\\d*)", cols = -id, names_transform = list(nr = as.integer)) %>% summarize(vals = list(value), .by = c(id, Type)) %>% pivot_wider(names_from = Type, values_from = vals) %>% relocate(GP, .after = Hospital)) # # A tibble: 30 × 5 # id Hospital GP self_report death # <int> <list> <list> <list> <list> # 1 1 <dbl [4]> <dbl [10]> <dbl [4]> <dbl [1]> # 2 2 <dbl [4]> <dbl [10]> <dbl [4]> <dbl [1]> # 3 3 <dbl [4]> <dbl [10]> <dbl [4]> <dbl [1]> # [...]

detect_reportings <- function(other, Hospital) { imap_chr(set_names(Hospital, paste0("Hospital_", seq_along(Hospital))), ~ if_else(any(abs(other - .x) <= .25), cur_column(), NA_character_)) %>% list() }

(dist_check <- list_df %>% group_by(id) %>% mutate(across(GP:death, ~ detect_reportings(.x[[1]], Hospital[[1]])), Hospital = set_names(if_else(is.na(Hospital[[1]]), NA_character_, "Hospital"), paste0("Hospital_", seq_along(Hospital[[1]]))) %>% list())) # # A tibble: 30 × 5 # # Groups: id [30] # id Hospital GP self_report death # <int> <list> <list> <list> <list> # 1 1 <chr [4]> <chr [4]> <chr [4]> <chr [4]> # 2 2 <chr [4]> <chr [4]> <chr [4]> <chr [4]> # 3 3 <chr [4]> <chr [4]> <chr [4]> <chr [4]> # [...] dist_check$GP[[16]] ## Hospital_3 and Hospital_4 were within the .25 range # Hospital_1 Hospital_2 Hospital_3 Hospital_4 # NA NA "GP" "GP" example_data[16, c(4:5, 8, 14)] # # A tibble: 1 × 4 # Hospital_3 Hospital_4 GP_3 GP_9 # <dbl> <dbl> <dbl> <dbl> # 1 25.6 19.9 25.4 19.7

merge_reportings <- function(...) { cbind(...) %>% as_tibble() %>% unite("result", sep = "|", na.rm = TRUE) %>% t() %>% c() %>% set_names(paste0("Hospital_elsewhere_", seq_along(..1))) %>% as.list() %>% as_tibble() } res <- dist_check %>% reframe(merge_reportings(Hospital[[1]], GP[[1]], self_report[[1]], death[[1]])) print(res, n = 10L) # # A tibble: 30 × 5 # id Hospital_elsewhere_1 Hospital_elsewhere_2 Hospital_elsewhere_3 Hospital_elsewhere_4 # <int> <chr> <chr> <chr> <chr> # 1 1 Hospital Hospital Hospital "Hospital" # 2 2 Hospital Hospital Hospital "Hospital" # 3 3 Hospital Hospital Hospital "Hospital" # 4 4 Hospital Hospital Hospital "Hospital" # 5 5 Hospital Hospital Hospital "Hospital" # 6 6 Hospital Hospital Hospital|GP "Hospital|GP" # 7 7 Hospital|GP Hospital|GP Hospital|GP "Hospital|GP" # 8 8 Hospital|GP Hospital|GP Hospital|GP "Hospital|GP" # 9 9 Hospital Hospital Hospital "Hospital" # 10 10 Hospital Hospital Hospital "Hospital" # # ℹ 20 more rows # # ℹ Use `print(n = ...)` to see more rows

R 跨列查找多个时间报告

推荐答案

R相关问答推荐

使用rlang s arg_match判断函数输入列表

在数据表中呈现数学符号

在通过最大似然估计将ODE模型与数据匹配时，为什么要匹配实际参数的转换值？

如何使用`ggplot2：：geom_segment()`或`ggspatial：：geom_spatial_segment()`来处理不在格林威治中心的sf对象？

基于不同组的列的相关性

如何在R中添加截止点到ROC曲线图？

在某些栏和某些条件下，替换dfs列表中的NA

如何通过Docker部署我的shiny 应用程序(多个文件)

如何在一次运行中使用count进行多列计数

计算时间段的ECDF(R)

R函数‘paste`正在颠倒其参数的顺序

为了网络分析目的，将数据框转换为长格式列联表

TreeNode打印 twig 并为其上色

ggplot R：X，Y，Z使用固定/等距的X，Y坐标绘制六边形热图

按组和连续id计算日期差

Broom.Mixed：：Augment不适用于Sample：：分析

如何在不使用SHINY的情况下将下拉滤镜列表添加到ggploy？

使用显式二元谓词子集化sfc对象时出错

动态统计函数在ShinyApp内部更改

从字符串列中的向量中查找第一个匹配的单词