R group_by, then conditionally execute functions based on different column values

Posted on April 3

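I am trying to reduce the number of records in a data frame by grouping on x and then conditionally filtering based on the values of columns y and z within each group.

Here is what I have so far: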

# Required packages
library(tidyverse)

# Data Filtering
job_history <- data.frame(x = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4),
                          
                          y = c("Hire", "Data Change", "Leave of Absence", "Hire", "Termination", "Hire", "Termination", "Rehire", "Transfer", "Hire", "Termination", "Rehire", "Termination"),
                          
                          z = as.Date(c("2024-01-01", "2024-02-01", "2024-03-01", "2024-01-01", "2024-02-01", "2024-01-01", "2024-02-01", "2024-03-01", "2024-04-01", "2024-01-01", "2024-02-01", "2024-03-01", "2024-04-01")))

# Group_by and conditionally filter
job_history %>% 
  
  group_by(x) %>% 
  
# Count the number of hire, termination, and rehire records
  mutate(hires   = sum(y == "Hire"),
         terms   = sum(y == "Termination"),
         rehires = sum(y == "Rehire")) %>% 
  
  case_when(
# If there is 1 hire record and no terms or rehires, filter for the latest record
    (hires = 1 & terms = 0 & rehires = 0) ~ filter(z = max(z)),
# If there is 1 hire and 1 term record, filter for the termination record
    (hires = 1 & terms = 1 & rehires = 0) ~ filter(y == "Termination"),
# If there is 1 of each record, filter for the term record and the latest record
    (hires = 1 & terms = 1 & rehires = 1) ~ filter(y == "Termination") | z = max(z)),
# If there is 1 hire, 2 term, and 1 rehire record, filter for both termination records
    (hires = 1 & terms = 2 & rehires = 1) ~ filter(y == "Termination")
  )

The desired output is:

x       y                   z
1       Leave of Absence    2024-03-01
2       Termination         2024-02-01
3       Termination         2024-02-01
3       Transfer            2024-04-01
4       Termination         2024-02-01
4       Termination         2024-04-01

Recommended answer

# For each "x", keep the terminations or the latest event
filter(job_history, .by = x, y == "Termination" | z == max(z))

# A tibble: 6 × 3
      x y                z
  <dbl> <chr>            <date>
1     1 Leave of Absence 2024-03-01
2     2 Termination      2024-02-01
3     3 Termination      2024-02-01
4     3 Transfer         2024-04-01
5     4 Termination      2024-02-01
6     4 Termination      2024-04-01
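The one-line filter works here because, for this data, all four rules in the question reduce to "keep terminations or the latest record". If the rules need to stay distinct, the following is a minimal sketch (not the answerer's code) that keeps the asker's per-group counts and moves `case_when()` inside `filter()`, so each branch returns a logical value per row instead of calling `filter()` itself. The `TRUE ~ FALSE` fallback, which drops any group matching none of the rules, and the `result` name are assumptions added for illustration.

```r
library(dplyr)

job_history <- data.frame(
  x = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4),
  y = c("Hire", "Data Change", "Leave of Absence", "Hire", "Termination",
        "Hire", "Termination", "Rehire", "Transfer",
        "Hire", "Termination", "Rehire", "Termination"),
  z = as.Date(c("2024-01-01", "2024-02-01", "2024-03-01", "2024-01-01",
                "2024-02-01", "2024-01-01", "2024-02-01", "2024-03-01",
                "2024-04-01", "2024-01-01", "2024-02-01", "2024-03-01",
                "2024-04-01"))
)

result <- job_history %>%
  group_by(x) %>%
  # Per-group counts, recycled to one value per row of the group
  mutate(hires   = sum(y == "Hire"),
         terms   = sum(y == "Termination"),
         rehires = sum(y == "Rehire")) %>%
  # Each rule yields TRUE/FALSE per row; filter() keeps the TRUE rows
  filter(case_when(
    hires == 1 & terms == 0 & rehires == 0 ~ z == max(z),
    hires == 1 & terms == 1 & rehires == 0 ~ y == "Termination",
    hires == 1 & terms == 1 & rehires == 1 ~ y == "Termination" | z == max(z),
    hires == 1 & terms == 2 & rehires == 1 ~ y == "Termination",
    TRUE ~ FALSE  # assumption: drop groups that match none of the rules
  )) %>%
  ungroup() %>%
  select(x, y, z)

print(result)
```

Note that comparisons use `==`; the single `=` in the original attempt is argument assignment, not a test for equality.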

Alternative solution

# Create groups from different parameters
aux <- job_history %>% 
  count(x, y) %>% 
  pivot_wider(names_from = y, values_from = n, values_fill = 0) %>% 
  transmute(
    x,
    group_id = case_when(
      Hire == 1 & Termination == 0 & Rehire == 0 ~ "a",
      Hire == 1 & Termination == 1 & Rehire == 0 ~ "b",
      Hire == 1 & Termination == 1 & Rehire == 1 ~ "c",
      Hire == 1 & Termination == 2 & Rehire == 1 ~ "d",
      TRUE ~ "other"))

# Apply each group's rule, then stack the results.
# max(z) is taken per "x" (.by = x) so it stays correct when several x values share a group
job_output <- bind_rows(
  # Group "a"
  job_history %>% 
    inner_join(filter(aux, group_id == "a"), by = "x") %>% 
    filter(z == max(z), .by = x),
  # Group "b"
  job_history %>% 
    inner_join(filter(aux, group_id == "b"), by = "x") %>% 
    filter(y == "Termination"),
  # Group "c"
  job_history %>% 
    inner_join(filter(aux, group_id == "c"), by = "x") %>% 
    filter(y == "Termination" | z == max(z), .by = x),
  # Group "d"
  job_history %>% 
    inner_join(filter(aux, group_id == "d"), by = "x") %>% 
    filter(y == "Termination"))

> job_output
# A tibble: 6 × 4
      x y                z          group_id
  <dbl> <chr>            <date>     <chr>
1     1 Leave of Absence 2024-03-01 a
2     2 Termination      2024-02-01 b
3     3 Termination      2024-02-01 c
4     3 Transfer         2024-04-01 c
5     4 Termination      2024-02-01 d
6     4 Termination      2024-04-01 d
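To check that an approach reproduces the desired output from the question, the result can be compared against a hand-built data frame. The sketch below is illustrative only: the `expected`, `actual`, and `matches` names are assumptions, and `group_by()` is used in place of `.by` so that it also runs on dplyr versions older than 1.1.0.

```r
library(dplyr)

job_history <- data.frame(
  x = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4),
  y = c("Hire", "Data Change", "Leave of Absence", "Hire", "Termination",
        "Hire", "Termination", "Rehire", "Transfer",
        "Hire", "Termination", "Rehire", "Termination"),
  z = as.Date(c("2024-01-01", "2024-02-01", "2024-03-01", "2024-01-01",
                "2024-02-01", "2024-01-01", "2024-02-01", "2024-03-01",
                "2024-04-01", "2024-01-01", "2024-02-01", "2024-03-01",
                "2024-04-01"))
)

# Desired output, copied from the question
expected <- data.frame(
  x = c(1, 2, 3, 3, 4, 4),
  y = c("Leave of Absence", "Termination", "Termination",
        "Transfer", "Termination", "Termination"),
  z = as.Date(c("2024-03-01", "2024-02-01", "2024-02-01",
                "2024-04-01", "2024-02-01", "2024-04-01"))
)

# Same logic as the one-line filter: terminations or the latest event per x
actual <- job_history %>%
  group_by(x) %>%
  filter(y == "Termination" | z == max(z)) %>%
  ungroup() %>%
  as.data.frame()

# all.equal() returns TRUE or a description of the differences
matches <- isTRUE(all.equal(expected, actual))
print(matches)
```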

