Here is my data (dput output):

structure(list(date = c(1990, 1991, 1992, 1990, 1991, 1992, 1990, 
1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 
1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 
1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 
1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 
1990, 1991, 1992), member1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2), member2 = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), active1 = c(1, 
1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 
1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 
1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1), active2 = c(0, 1, 1, 0, 1, 
1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 
1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 
1, 0, 1, 1, 0, 0, 1), group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 
2, 2), task = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 
2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 
3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3)), class = "data.frame", row.names = c(NA, -54L))

Here is my desired output:

structure(list(date = c(1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 
1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 
1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 
1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 
1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 
1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 
1990L, 1991L, 1992L), member1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    member2 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), active1 = c(1L, 
    1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
    0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 
    1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 
    1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L), active2 = c(0L, 1L, 1L, 
    0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 
    0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 
    0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 
    0L, 1L, 1L, 0L, 0L, 1L), group = c(1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L), task = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 
    1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 
    3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 
    2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L
    ), dummy1 = c(0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 
    1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 
    0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 
    0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L), dummy2 = c(0L, 
    0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, -54L))

I want to create two dummy variables, defined mathematically as follows. I give a more verbal description below.

[Image: mathematical definition of the two dummy variables]

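It is a convoluted process. There are several groups of tasks; each task is identified by the variable task and belongs to the task group given in the variable group. Essentially, I want to create a dummy variable that depends on whether (member1, member2) pursue the tasks in order in a given year.

For example, take (member1, member2) == (1, 2) in group == 1 (the first nine rows of the dput). For these two members there are 3 tasks in the task group (group == 1): task == {1, 2, 3}. active1 and active2 describe whether member1 and member2, respectively, are working on the corresponding task in the given year (the date column) and group. For task == 1 in year == 1990, only member1 is actively working on the task (active1 == 1, active2 == 0), but in the following two years both members are actively working on it (active1 == 1, active2 == 1), and so on.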

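For each task in each group, each pair of members, and each year, I want to generate two dummy variables:

1.) A dummy variable equal to 1 if a) the task is the first task in the group (i.e., the task with the smallest number) and member1 and member2 both have active == 1, or b) [...]. For example, for (member1, member2) in group == 1, the dummy == 1 for task == 1 in all years, for task == 2 in year == 1991, and for task == 3 in year == 1991. In the other years of the latter two tasks, this dummy would equal 0.

2.) The second dummy variable I want to create is essentially the opposite of the first. I want it to equal 1 if a) the task is not the first task in the group but member1 and member2 both have active == 1, and b) the task immediately preceding it was not actively pursued; otherwise the dummy equals zero. For example, for (member1, member2) in group == 1, this dummy would be == 0 for task == 1 in all years, == 0 for task == 2 in all years, and == 1 for task == 3 in 1992; in the latter case it equals 1 because member1 and member2 are actively pursuing task == 3 in 1992 but are not pursuing task == 2 in that year.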

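PLEASE NOTE: in my actual data the tasks are not numbered consecutively (1, 2, 3, ...). Task numbers skip values (e.g., 10.0, 10.07, 11.0, ...) but are still ordered, so a solution must either a) avoid the i, i+1 notation or b) first convert my task variable to an i, i+1 format.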
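Option b) can be sketched in base R by ranking the task codes within each group. This is only an illustration under assumptions: the data frame `tasks`, its values, and the column name `task_idx` are made up for the example and do not come from the actual dataset.

```r
# Hypothetical task codes that skip numbers but are still ordered.
tasks <- data.frame(
  group = c(1, 1, 1, 2, 2, 2),
  task  = c(10.0, 10.07, 11.0, 3.5, 7.0, 20.0)
)

# Within each group, map the smallest code to 1, the next one to 2, and so on.
tasks$task_idx <- ave(
  tasks$task, tasks$group,
  FUN = function(x) match(x, sort(unique(x)))
)

print(tasks)
```

With a consecutive index like this, "the preceding task" is simply the row whose index is one lower within the same group.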

THANK YOU FOR ANY HELP IN ADVANCE!

UPDATE: I got help, and the accepted answer works very well on the dput. I have edited their code for my actual dataset, which may or may not help future answerers:

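  library(dplyr)

  # `grouping` comes from the actual dataset and is not part of the dput above
  df1 <- df1 %>%
    # previous row's task and activity, in the order given by `grouping`
    mutate(lagtask = dplyr::lag(x = task, n = 1, order_by = grouping),
           lagact1 = dplyr::lag(x = active1, n = 1, order_by = grouping),
           lagact2 = dplyr::lag(x = active2, n = 1, order_by = grouping)) %>%
    # the first row has no predecessor: fall back on its own activity
    mutate(lagact1 = ifelse(is.na(lagact1), active1, lagact1),
           lagact2 = ifelse(is.na(lagact2), active2, lagact2)) %>%
    # dummy1: both members active now and on the preceding row
    mutate(dummy1 = ifelse(active1 == 1 & active2 == 1 &
                             (lagact1 == 1 & lagact2 == 1),
                           1, 0)) %>%
    # dummy2: both members active now, at least one inactive on the preceding row
    mutate(dummy2 = ifelse(active1 == 1 & active2 == 1 &
                             (lagact1 != 1 | lagact2 != 1),
                           1, 0))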
  

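Recommended answer

I'm a bit confused, because the OP states that for group 1 dummy2 should be 0 for task 1 and task 2 in all years, and should equal 1 only for task 3 in 1992. However, the desired output shows dummy2 equal to 1 only for task 2 in 1991.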

desired_output[1:9,]

Following the language of the question rather than the values in the desired_output object (copied from the OP's post), I think the approach below works.

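library(dplyr)

# Assumes rows are already sorted by task within each block, as in the dput.
output <- df %>%
  # one block per year, task group, and member pair, so lag() never crosses pairs
  group_by(date, group, member1, member2) %>%
  # dummy1, case a): first task of the group and both members active
  mutate(dummy1 = ifelse(task == first(task) &
                           active1 == 1 &
                           active2 == 1,
                         1, 0)) %>%
  # dummy1, case b): both members active on this task and on the preceding one
  mutate(dummy1 = ifelse(active1 == 1 &
                           active2 == 1 &
                           lag(active1 == 1, default = TRUE) &
                           lag(active2 == 1, default = TRUE),
                         1, dummy1)) %>%
  # dummy2 candidates: not the first task, both members active
  mutate(dummy2 = ifelse(task != first(task) &
                           active1 == 1 &
                           active2 == 1,
                         1, 0)) %>%
  mutate(lagact1 = lag(x = active1, n = 1),
         lagact2 = lag(x = active2, n = 1)) %>%
  # reset dummy2 to 0 where both members actively pursued the preceding task
  mutate(dummy2 = ifelse(dummy2 == 1 &
                           lagact1 == 1 &
                           lagact2 == 1,
                         0, dummy2)) %>%
  ungroup() %>%
  as.data.frame() %>%
  dplyr::select(!c(lagact1, lagact2))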
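As a rough cross-check of the rules as worded in the question, the same logic can be sketched in base R on the first nine rows of the dput (pair (1, 2), group 1). This is a sketch, not the answer's method: the names `df9`, `both`, `key`, `prev_both`, and `first_task` are illustrative, and "preceding task" is assumed to mean the next-smaller task value within the same year, group, and pair, which avoids any i, i+1 arithmetic on the task codes.

```r
# First nine rows of the question's dput: pair (1, 2), group 1.
df9 <- data.frame(
  date    = rep(c(1990, 1991, 1992), times = 3),
  member1 = 1, member2 = 2, group = 1,
  task    = rep(c(1, 2, 3), each = 3),
  active1 = c(1, 1, 1, 1, 1, 0, 0, 1, 1),
  active2 = c(0, 1, 1, 0, 1, 1, 0, 1, 1)
)

# Sort so that tasks appear in ascending order within each year/group/pair.
df9 <- df9[order(df9$date, df9$group, df9$member1, df9$member2, df9$task), ]

# 1 where both members are active on the task, 0 otherwise.
both <- as.numeric(df9$active1 == 1 & df9$active2 == 1)
key  <- interaction(df9$date, df9$group, df9$member1, df9$member2, drop = TRUE)

# Joint activity on the preceding task within each key; NA marks the first task.
prev_both  <- ave(both, key, FUN = function(x) c(NA, head(x, -1)))
first_task <- is.na(prev_both)

# dummy1: both active, and either first task or preceding task also pursued.
df9$dummy1 <- as.numeric(both == 1 & (first_task | prev_both %in% 1))
# dummy2: both active, not the first task, preceding task not pursued.
df9$dummy2 <- as.numeric(both == 1 & !first_task & prev_both %in% 0)

print(df9[, c("date", "task", "active1", "active2", "dummy1", "dummy2")])
```

On these rows dummy2 is 1 only for task 3 in 1992, which matches the question's wording rather than the desired_output object.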
