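I am trying to reduce the number of records in a data frame by grouping by x and then filtering each group conditionally, based on the values of columns y and z within that group.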

Here is what I have so far:

# Required packages
library(tidyverse)

# Data Filtering
job_history <- data.frame(x = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4),
                          
                          y = c("Hire", "Data Change", "Leave of Absence", "Hire", "Termination", "Hire", "Termination", "Rehire", "Transfer", "Hire", "Termination", "Rehire", "Termination"),
                          
                          z = as.Date(c("2024-01-01", "2024-02-01", "2024-03-01", "2024-01-01", "2024-02-01", "2024-01-01", "2024-02-01", "2024-03-01", "2024-04-01", "2024-01-01", "2024-02-01", "2024-03-01", "2024-04-01")))

# Group_by and conditionally filter
job_history %>% 
  
  group_by(x) %>% 
  
# Count the number of hire, termination, and rehire records
  mutate(hires   = sum(y == "Hire"),
         terms   = sum(y == "Termination"),
         rehires = sum(y == "Rehire")) %>% 
  
# case_when() returns a TRUE/FALSE keep flag for each row and filter() keeps
# the TRUE rows (rows matching no condition get NA and are dropped)
  filter(case_when(
# If there is 1 hire record and no terms or rehires, keep the latest record
    hires == 1 & terms == 0 & rehires == 0 ~ z == max(z),
# If there is 1 hire and 1 term record, keep the termination record
    hires == 1 & terms == 1 & rehires == 0 ~ y == "Termination",
# If there is 1 of each record, keep the term record and the latest record
    hires == 1 & terms == 1 & rehires == 1 ~ y == "Termination" | z == max(z),
# If there is 1 hire, 2 term, and 1 rehire record, keep both termination records
    hires == 1 & terms == 2 & rehires == 1 ~ y == "Termination"
  )) %>% 
  ungroup() %>% 
  select(x, y, z)

The desired output is:

x       y                   z
1       Leave of Absence    2024-03-01
2       Termination         2024-02-01
3       Termination         2024-02-01
3       Transfer            2024-04-01
4       Termination         2024-02-01
4       Termination         2024-04-01

Recommended answer

Welcome to SO. You will get better help if you describe your filtering conditions in words.

Try this:

# For each "x", keep the terminations or the latest event
filter(job_history, .by = x, y == "Termination" | z == max(z))
# A tibble: 6 × 3
      x y                z         
  <dbl> <chr>            <date>    
1     1 Leave of Absence 2024-03-01
2     2 Termination      2024-02-01
3     3 Termination      2024-02-01
4     3 Transfer         2024-04-01
5     4 Termination      2024-02-01
6     4 Termination      2024-04-01
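The `.by` argument to `filter()` was added in dplyr 1.1.0. Below is a minimal sketch, assuming an older dplyr (or a preference for explicit grouping), that expresses the same rule with `group_by()` and `ungroup()`. The question's `job_history` data frame is re-created so the block runs on its own.

```r
library(dplyr)

job_history <- data.frame(
  x = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4),
  y = c("Hire", "Data Change", "Leave of Absence", "Hire", "Termination",
        "Hire", "Termination", "Rehire", "Transfer",
        "Hire", "Termination", "Rehire", "Termination"),
  z = as.Date(c("2024-01-01", "2024-02-01", "2024-03-01", "2024-01-01",
                "2024-02-01", "2024-01-01", "2024-02-01", "2024-03-01",
                "2024-04-01", "2024-01-01", "2024-02-01", "2024-03-01",
                "2024-04-01"))
)

# Same rule as the one-liner: within each x, keep every termination
# plus the row(s) carrying that x's latest date
result <- job_history %>%
  group_by(x) %>%
  filter(y == "Termination" | z == max(z)) %>%
  ungroup()

print(result)
```

Because `max(z)` is evaluated inside the grouped `filter()`, "latest" always means latest within the same x, which is what the `.by = x` version does as well.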

Alternative solution

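Agreed 100%! But here is an approach closer to the OP's original reasoning: define the groups, take the rows you want from each group, and bind them together.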

# Label each x with a group according to its counts of Hire, Termination and Rehire records
aux <- job_history %>% 
  count(x, y) %>% 
  pivot_wider(names_from = y, values_from = n, values_fill = 0) %>% 
  transmute(
    x, 
    group_id = case_when(
      Hire == 1 & Termination == 0 & Rehire == 0 ~ "a",
      Hire == 1 & Termination == 1 & Rehire == 0 ~ "b",
      Hire == 1 & Termination == 1 & Rehire == 1 ~ "c",
      Hire == 1 & Termination == 2 & Rehire == 1 ~ "d",
      TRUE ~ "other"))

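# Apply each group's filtering rule, then stack the results
job_output <- bind_rows(
  # Group "a": latest record (.by = x so max(z) is taken per x, not across all x in the group)
  job_history %>% 
    inner_join(filter(aux, group_id == "a"), by = "x") %>% 
    filter(z == max(z), .by = x), 
  
  # Group "b": termination record
  job_history %>% 
    inner_join(filter(aux, group_id == "b"), by = "x") %>% 
    filter(y == "Termination"),
  
  # Group "c": termination record plus latest record per x
  job_history %>% 
    inner_join(filter(aux, group_id == "c"), by = "x") %>% 
    filter(y == "Termination" | z == max(z), .by = x), 
  
  # Group "d": both termination records
  job_history %>% 
    inner_join(filter(aux, group_id == "d"), by = "x") %>% 
    filter(y == "Termination"))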

> job_output
# A tibble: 6 × 4
      x y                z          group_id
  <dbl> <chr>            <date>     <chr>   
1     1 Leave of Absence 2024-03-01 a       
2     2 Termination      2024-02-01 b       
3     3 Termination      2024-02-01 c       
4     3 Transfer         2024-04-01 c       
5     4 Termination      2024-02-01 d       
6     4 Termination      2024-04-01 d 
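If more categories are added, one way to avoid repeating the join-and-filter block per group is to keep the per-group rules in a named list and apply them in a split-apply-combine step. This is a minimal sketch of that idea, not part of the original answer: the `rules` list is a hypothetical name introduced here, and the block assumes dplyr, tidyr and purrr (all loaded by tidyverse). The classification step is the same `aux` construction as in the alternative solution.

```r
library(dplyr)
library(tidyr)
library(purrr)

job_history <- data.frame(
  x = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4),
  y = c("Hire", "Data Change", "Leave of Absence", "Hire", "Termination",
        "Hire", "Termination", "Rehire", "Transfer",
        "Hire", "Termination", "Rehire", "Termination"),
  z = as.Date(c("2024-01-01", "2024-02-01", "2024-03-01", "2024-01-01",
                "2024-02-01", "2024-01-01", "2024-02-01", "2024-03-01",
                "2024-04-01", "2024-01-01", "2024-02-01", "2024-03-01",
                "2024-04-01"))
)

# Hypothetical rule table: one filtering function per group_id
rules <- list(
  a = function(d) filter(d, z == max(z)),
  b = function(d) filter(d, y == "Termination"),
  c = function(d) filter(d, y == "Termination" | z == max(z)),
  d = function(d) filter(d, y == "Termination")
)

# Count event types per x and label each x with a group
aux <- job_history %>%
  count(x, y) %>%
  pivot_wider(names_from = y, values_from = n, values_fill = 0) %>%
  transmute(
    x,
    group_id = case_when(
      Hire == 1 & Termination == 0 & Rehire == 0 ~ "a",
      Hire == 1 & Termination == 1 & Rehire == 0 ~ "b",
      Hire == 1 & Termination == 1 & Rehire == 1 ~ "c",
      Hire == 1 & Termination == 2 & Rehire == 1 ~ "d",
      TRUE ~ "other"
    )
  )

# Split by x so each piece holds one x (max(z) is then per x),
# apply that x's rule, and bind the pieces back together
job_output <- job_history %>%
  inner_join(aux, by = "x") %>%
  group_split(x) %>%
  map(function(d) {
    rule <- rules[[d$group_id[1]]]
    # Groups without a rule (e.g. "other") contribute no rows
    if (is.null(rule)) d[0, ] else rule(d)
  }) %>%
  bind_rows()

print(job_output)
```

Adding a new category then only needs one more `case_when()` branch and one more entry in `rules`.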
