Here is my data (`dput` output):
structure(list(date = c(1990, 1991, 1992, 1990, 1991, 1992, 1990,
1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992,
1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991,
1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990,
1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991, 1992,
1990, 1991, 1992), member1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2), member2 = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3), active1 = c(1,
1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0,
1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1), active2 = c(0, 1, 1, 0, 1,
1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0,
1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1,
1, 0, 1, 1, 0, 0, 1), group = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2,
2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2,
2, 2), task = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2,
2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3,
3, 1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3)), class = "data.frame", row.names = c(NA, -54L))
Here is my desired output:
structure(list(date = c(1990L, 1991L, 1992L, 1990L, 1991L, 1992L,
1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L,
1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L,
1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L,
1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L,
1990L, 1991L, 1992L, 1990L, 1991L, 1992L, 1990L, 1991L, 1992L,
1990L, 1991L, 1992L), member1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
member2 = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), active1 = c(1L,
1L, 1L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L,
1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L), active2 = c(0L, 1L, 1L,
0L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L,
0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L,
0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L,
0L, 1L, 1L, 0L, 0L, 1L), group = c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), task = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L,
3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L,
2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L
), dummy1 = c(0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 0L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L,
0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L, 1L), dummy2 = c(0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA, -54L))
I want to create two dummy variables; I describe how they are defined below.
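The process is somewhat convoluted. There are several groups of tasks. Each task is identified by the variable `task` and belongs to the task group given in the variable `group`. Essentially, I want to create dummy variables that depend on whether the pair `(member1, member2)` works through the tasks in order in a given year.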
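For example, take `(member1, member2) == (1, 2)` with `group == 1` (the first nine rows of the dput). For these two members there are 3 tasks (`task == {1, 2, 3}`) in the task group (`group == 1`). `active1` and `active2` describe whether `member1` and `member2`, respectively, are working on the corresponding `task` of the `group` in a given year (the `date` column). For `task == 1` in `year == 1990`, only `member1` is actively working on the task (`active1 == 1, active2 == 0`), but in the following two years both members are actively working on it (`active1 == 1, active2 == 1`), and so on.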
For each task in each group, each pair of members, and each year, I want to generate two dummy variables:
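1.) A dummy equal to 1 if either a) the task is the first task in the group (i.e., the task with the smallest number) and `member1` and `member2` both have `active == 1`, or b) the task is not the first task in the group, both members have `active == 1`, and the task immediately before it was also actively performed by both members in that year. For example, for `(member1, member2) == (1, 2)` and `group == 1`, this dummy equals 1 for `task == 1` in 1991 and 1992 (the years in which both members are active), for `task == 2` in `year == 1991`, and for `task == 3` in `year == 1991`. In the other years of the latter two tasks, this dummy equals 0.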
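2.) The second dummy I want to create is essentially the opposite of the first. I want it to equal 1 if a) the task is not the first task in the group but `member1` and `member2` both have `active == 1`, and b) the task immediately before it was not actively performed; otherwise the dummy equals 0. For example, for `(member1, member2) == (1, 2)` and `group == 1`, this dummy equals 0 for `task == 1` in all years, 0 for `task == 2` in all years, and 1 for `task == 3` in 1992. In the latter case it equals 1 because the two members actively perform `task == 3` in 1992 but do not both perform `task == 2` in that year.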
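The two definitions above can be sketched with dplyr as follows. This is only a minimal sketch on the first nine rows of the dput, under two assumptions that are not stated explicitly in the question: every task appears in every year for a given member pair and group, and "not actively performed" means that at least one of the two members is inactive. The helper columns `both`, `is_first` and `prev_both` and the result name `out` are made up for illustration.

```r
library(dplyr)

# First nine rows of the dput: (member1, member2) == (1, 2), group == 1
df <- data.frame(
  date    = rep(1990:1992, times = 3),
  member1 = 1, member2 = 2, group = 1,
  task    = rep(1:3, each = 3),
  active1 = c(1, 1, 1, 1, 1, 0, 0, 1, 1),
  active2 = c(0, 1, 1, 0, 1, 1, 0, 1, 1)
)

out <- df %>%
  # Within one pair, group and year, walk through the tasks in ascending order
  group_by(member1, member2, group, date) %>%
  arrange(task, .by_group = TRUE) %>%
  mutate(
    both      = active1 == 1 & active2 == 1,   # both members active on this task
    is_first  = row_number() == 1,             # smallest task number in the group
    prev_both = lag(both, default = FALSE),    # both active on the preceding task
    dummy1    = as.integer(both & (is_first | prev_both)),
    dummy2    = as.integer(both & !is_first & !prev_both)
  ) %>%
  ungroup() %>%
  # Restore the original row order and drop the helper columns
  arrange(member1, member2, group, task, date) %>%
  select(-both, -is_first, -prev_both)
```

Because the preceding task is found by sorting and lagging rather than by `task - 1`, the sketch does not depend on the task numbers being consecutive.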
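PLEASE NOTE: in my actual dataset the tasks are not numbered consecutively (1, 2, 3, ...). The task numbers skip values (e.g., (10.0, 10.07, 11.0, ...)) but are still ordered, so a solution must either a) avoid relying on `i`, `i+1` indexing or b) first convert my task variable to the `i`, `i+1` format.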
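For option b), one possible way to turn ordered but non-consecutive task numbers into consecutive indices is `dplyr::dense_rank()` applied within each group. The sketch below uses the example values from the note above; the second group and the column name `task_idx` are hypothetical.

```r
library(dplyr)

tasks <- data.frame(
  group = c(1, 1, 1, 2, 2),
  task  = c(10.0, 10.07, 11.0, 10.07, 11.0)   # illustrative values only
)

tasks <- tasks %>%
  group_by(group) %>%
  # dense_rank() assigns 1, 2, 3, ... in ascending order, with no gaps
  mutate(task_idx = dense_rank(task)) %>%
  ungroup()
```

With `task_idx` in place, the task preceding index `i` within a group is simply the one with index `i - 1`.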
Thank you in advance for any help!
UPDATE: I got help, and the accepted answer works very well on the dput. I have edited their code for my actual dataset, which may or may not help future answerers:
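library(dplyr)

df1 <- df1 %>%
  # Previous row's task and activity flags, ordered by the `grouping` column of my actual dataset
  mutate(lagtask = dplyr::lag(x = task, n = 1, order_by = grouping),
         lagact1 = dplyr::lag(x = active1, n = 1, order_by = grouping),
         lagact2 = dplyr::lag(x = active2, n = 1, order_by = grouping)) %>%
  # The first row has no predecessor: fall back to its own activity flags
  mutate(lagact1 = ifelse(is.na(lagact1), active1, lagact1),
         lagact2 = ifelse(is.na(lagact2), active2, lagact2)) %>%
  # dummy1: both members active now and both active on the preceding row
  mutate(dummy1 = ifelse(active1 == 1 & active2 == 1 &
                           # lag1 == lag(x = Sequence, 1) &
                           (lagact1 == 1 & lagact2 == 1),
                         1, 0)) %>%
  # dummy2: both members active now, but not both active on the preceding row
  mutate(dummy2 = ifelse(active1 == 1 & active2 == 1 &
                           # lag1 == lag(x = task, 1) &
                           (lagact1 != 1 | lagact2 != 1),
                         1, 0))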