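I am trying to create variables that are set to 1 if an age range overlaps a given interval and 0 if it does not (missing values also need handling, which I realized was an issue as I was posting). I have not found a relevant example on Stack Overflow, or one that I was able to reproduce (see below regarding ivs/IRanges and missing values).
Here is my dput: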
structure(list(`Est. Lower Age Range` = c(18, 18, 50, 50, 50,
65, 18, 18, 18, 18, 65, 65, 65, 65, 65, 0.5, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 65, 65, 16, 16, 16, 16, 65, 65), `Est. Upper Age Range` = c(49,
49, 64, 64, 64, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 65, NA,
NA, NA, NA, NA, NA, NA, NA, 65, 65, NA, NA, NA, NA, 65, 65, NA,
NA)), class = c("data.table", "data.frame"), row.names = c(NA,
-34L), .internal.selfref = <pointer: 0x000002118266d280>)
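Note that the `dput` output above cannot be pasted back into R as is, because the `.internal.selfref = <pointer: ...>` entry is not valid R syntax. A minimal sketch that rebuilds the same two columns as a plain data frame (assuming a `data.frame` is close enough to the original `data.table` for this question):

```r
# Rebuild the sample data without the data.table pointer attribute.
# check.names = FALSE keeps the column names with spaces and dots intact.
Flag_Prep <- data.frame(
  `Est. Lower Age Range` = c(18, 18, 50, 50, 50,
    65, 18, 18, 18, 18, 65, 65, 65, 65, 65, 0.5, 16, 16, 16, 16,
    16, 16, 16, 16, 16, 16, 65, 65, 16, 16, 16, 16, 65, 65),
  `Est. Upper Age Range` = c(49,
    49, 64, 64, 64, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 65, NA,
    NA, NA, NA, NA, NA, NA, NA, 65, 65, NA, NA, NA, NA, 65, 65, NA,
    NA),
  check.names = FALSE
)
```

If the `data.table` class matters, `data.table::setDT(Flag_Prep)` should convert it in place.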
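I have tried several packages, including ivs, GRIGN, and IRanges, one of which does not allow missing (NA) values in the variables. The basic code I tried is as follows: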
Flag_Prep$`0-2` <- ifelse((Flag_Prep$`Est. Lower Age Range` > 0 & Flag_Prep$`Est. Lower Age Range` <= 2) |
(Flag_Prep$`Est. Upper Age Range` > 0 & Flag_Prep$`Est. Upper Age Range` <= 2),
1, 0)
Flag_Prep$`0-5` <- ifelse((Flag_Prep$`Est. Lower Age Range` > 0 & Flag_Prep$`Est. Lower Age Range` <= 5) |
(Flag_Prep$`Est. Upper Age Range` > 0 & Flag_Prep$`Est. Upper Age Range` <= 5),
1, 0)
Flag_Prep$`5-17` <- ifelse((Flag_Prep$`Est. Lower Age Range` >= 5 & Flag_Prep$`Est. Lower Age Range` < 18) |
(Flag_Prep$`Est. Upper Age Range` >= 5 & Flag_Prep$`Est. Upper Age Range` < 18),
1, 0)
Flag_Prep$`18-64` <- ifelse((Flag_Prep$`Est. Lower Age Range` >= 18 & Flag_Prep$`Est. Lower Age Range` < 65) |
(Flag_Prep$`Est. Upper Age Range` >= 18 & Flag_Prep$`Est. Upper Age Range` < 65),
1, 0)
Flag_Prep$`65+` <- ifelse(Flag_Prep$`Est. Lower Age Range` >= 65 | Flag_Prep$`Est. Upper Age Range` >= 65,
1, 0)
This results in:
structure(list(`Est. Lower Age Range` = c(18, 18, 50, 50, 50,
65, 18, 18, 18, 18, 65, 65, 65, 65, 65, 0.5, 16, 16, 16, 16,
16, 16, 16, 16, 16, 16, 65, 65, 16, 16, 16, 16, 65, 65), `Est. Upper Age Range` = c(49,
49, 64, 64, 64, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 65, NA,
NA, NA, NA, NA, NA, NA, NA, 65, 65, NA, NA, NA, NA, 65, 65, NA,
NA), `0-2` = c(0, 0, 0, 0, 0, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0, NA, NA, NA,
NA, 0, 0, NA, NA), `0-5` = c(0, 0, 0, 0, 0, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, 0, 0,
NA, NA, NA, NA, 0, 0, NA, NA), `5-17` = c(0, 0, 0, 0, 0, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, NA, NA, 1, 1, 1, 1, NA, NA), `18-64` = c(1, 1, 1, 1, 1,
NA, 1, 1, 1, 1, NA, NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA,
NA, NA, 0, 0, NA, NA, NA, NA, 0, 0, NA, NA), `65+` = c(0, 0,
0, 0, 0, 1, NA, NA, NA, NA, 1, 1, 1, 1, 1, 1, NA, NA, NA, NA,
NA, NA, NA, NA, 1, 1, 1, 1, NA, NA, 1, 1, 1, 1)), class = c("data.table",
"data.frame"), row.names = c(NA, -34L), .internal.selfref = <pointer: 0x000002118266d280>)
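If an age range spans an interval (e.g. 5-17), I would like a 1. Currently this works for 0.5-17, which is flagged for several groups, but for 0.5-65 only the 0-2, 0-5, and 65+ variables are 1, while the two middle intervals show 0.
From a coding perspective this makes sense, because the conditions only test whether an endpoint falls inside the interval, but I am stumped on how to correct it. As I mentioned above, I now also realize that I need to decide how to handle the cases where one of the age bounds is missing (the flags should be missing).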
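One way to think about the problem: two ranges overlap when each one starts before the other ends, so the test does not need either endpoint to fall inside the group. Below is a minimal sketch of that idea. The helper `overlaps_bin`, the sample `Flag_Demo`, and the bin edges (lower edge inclusive, upper edge exclusive) are all assumptions made for illustration, as is the choice to return NA whenever either bound is missing.

```r
# Hypothetical helper: 1 if [lower, upper] overlaps [bin_lo, bin_hi),
# 0 if it does not, NA if either bound is missing.
overlaps_bin <- function(lower, upper, bin_lo, bin_hi) {
  hit <- lower < bin_hi & upper >= bin_lo
  ifelse(is.na(lower) | is.na(upper), NA_integer_, as.integer(hit))
}

# Small illustrative sample, not the full data
Flag_Demo <- data.frame(
  `Est. Lower Age Range` = c(18, 0.5, 16, 16, 65),
  `Est. Upper Age Range` = c(49, 65, 65, NA, NA),
  check.names = FALSE
)

# Assumed bin edges: c(inclusive lower edge, exclusive upper edge)
bins <- list(`0-2` = c(0, 2), `0-5` = c(0, 5), `5-17` = c(5, 18),
             `18-64` = c(18, 65), `65+` = c(65, Inf))

# Add one flag column per age group
for (nm in names(bins)) {
  Flag_Demo[[nm]] <- overlaps_bin(
    Flag_Demo$`Est. Lower Age Range`,
    Flag_Demo$`Est. Upper Age Range`,
    bins[[nm]][1], bins[[nm]][2]
  )
}
```

In this sample the 0.5-65 row is flagged in all five groups, and the rows with a missing upper bound are NA in every group. If a missing upper bound should instead mean "no upper limit", replacing NA with `Inf` before the comparison would be one alternative.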
EDIT
I should also point out that the following code results in an error, whether or not the first line (the NA assignment) is included:
Flag_Prep$`5-17` <- NA
Flag_Prep$`5-17` <- ifelse((Flag_Prep$`Est. Lower Age Range` >= 5 & Flag_Prep$`Est. Lower Age Range` < 18) |
(Flag_Prep$`Est. Upper Age Range` >= 5 & Flag_Prep$`Est. Upper Age Range` < 18) |
(Flag_Prep$`Est. lower Age Range` < 5 & Flag_Prep$`Est. Upper Age Range` > 17),
1, 0)
# Error in `$<-.data.frame`(`*tmp*`, `5-17`, value = logical(0)) :
# replacement has 0 rows, data has 9584
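The `value = logical(0)` in the error is consistent with the third condition referring to `Est. lower Age Range` (lowercase "l"), which is not a column in the data. A small sketch of what likely happens, using a made-up two-row data frame `df`:

```r
# Made-up sample with the same column names as the question
df <- data.frame(
  `Est. Lower Age Range` = c(16, 65),
  `Est. Upper Age Range` = c(65, NA),
  check.names = FALSE
)

# Column names are case-sensitive, so the misspelled name returns NULL
bad <- df$`Est. lower Age Range`
is.null(bad)

# Comparing NULL gives a zero-length logical, and combining it with
# full-length conditions via & or | also yields zero length
cond <- (df$`Est. Lower Age Range` >= 5) |
  (bad < 5 & df$`Est. Upper Age Range` > 17)
length(cond)

# ifelse() then returns logical(0), which cannot be assigned as a column
length(ifelse(cond, 1, 0))
```

Correcting the column name to `Est. Lower Age Range` should remove this particular error, although the NA handling would still need to be decided separately.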