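I have the following problem: for one `Rating`, 3 of the 6 numeric columns have missing values.
I would like to replace each missing value with the value from the corresponding cell to the left of the column with the missing value. The column names follow a clear pattern: they all end with PD1, PD2 or PD3, which can be used as an id.
Example: for `Rating` 1C, the Basic `Scenario` and the CORP `Class`, the value in column `eePD1` is missing. It should be replaced by the value in column `wePD1`, but only when `Class` is CORP (the RETAIL `Class` should not be touched).
I.e., `eePD1` = `wePD1` if the value in `eePD1` was NA and `Class` = CORP.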
Scenario | Rating | Class | wePD1 | wePD2 | wePD3 | eePD1 | eePD2 | eePD3 |
---|---|---|---|---|---|---|---|---|
Basic | 1C | CORP | 0.51 | 0.74 | 0.43 | NA | NA | NA |
Basic | 2A | CORP | 0.41 | 0.01 | 0.23 | 0.37 | 0.06 | 0.81 |
Basic | 2B | RETAIL | 0.68 | 0.48 | 0.71 | NA | NA | NA |
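Desired result: if a cell in a column starting with `ee` is NA for the CORP `Class` (for any `Scenario` and `Rating`), replace the NA with the value from the column that starts with `we` and ends with the same suffix as the `ee` column, e.g. PD1, PD2 or PD3.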
Scenario | Rating | Class | wePD1 | wePD2 | wePD3 | eePD1 | eePD2 | eePD3 |
---|---|---|---|---|---|---|---|---|
Basic | 1C | CORP | 0.51 | 0.74 | 0.43 | 0.51 | 0.74 | 0.43 |
Basic | 2A | CORP | 0.41 | 0.01 | 0.23 | 0.37 | 0.06 | 0.81 |
Basic | 2B | RETAIL | 0.68 | 0.48 | 0.71 | NA | NA | NA |
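
To make the rule above concrete, here is a minimal base R sketch. It assumes a plain `data.frame` (not a `data.table`), rebuilds the three example rows from the tables above under the hypothetical name `df`, and pairs each `ee` column with its `we` counterpart by name rather than by position.

```r
# Example data copied from the tables above (three rows only).
# `df` is an illustrative name, not part of the original question.
df <- data.frame(
  Scenario = c("Basic", "Basic", "Basic"),
  Rating   = c("1C", "2A", "2B"),
  Class    = c("CORP", "CORP", "RETAIL"),
  wePD1    = c(0.51, 0.41, 0.68),
  wePD2    = c(0.74, 0.01, 0.48),
  wePD3    = c(0.43, 0.23, 0.71),
  eePD1    = c(NA, 0.37, NA),
  eePD2    = c(NA, 0.06, NA),
  eePD3    = c(NA, 0.81, NA),
  stringsAsFactors = FALSE
)

# Target columns: every column whose name starts with "ee".
ee_cols <- grep("^ee", names(df), value = TRUE)

for (ee in ee_cols) {
  # Derive the source column from the shared suffix: eePD1 -> wePD1.
  we <- sub("^ee", "we", ee)
  # Rows to fill: value is missing AND the row belongs to the CORP class.
  fill <- is.na(df[[ee]]) & df$Class == "CORP"
  df[[ee]][fill] <- df[[we]][fill]
}

df
```

Because the pairing is done on the column names, the loop does not depend on how many columns sit between a `we` column and its `ee` counterpart.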
Here is a reproducible code block:

    df3 <- structure(list(
      Scenario = c("Basic", "Basic", "Basic", "Basic", "Basic"),
      Rating = c("1C", "2A", "2B", "2C", "3A"),
      Class = c("CORP", "CORP", "CORP", "CORP", "RETAIL"),
      wePD1 = c(0.51, 0.41, 0.58, 0.28, 0.68),
      wePD2 = c(0.74, 0.01, 0.28, 0.92, 0.48),
      wePD3 = c(0.43, 0.23, 0.04, 0.62, 0.71),
      eePD1 = c(NA, 0.37, 0.96, 0.22, NA),
      eePD2 = c(NA, 0.06, 0.29, 0.22, NA),
      eePD3 = c(NA, 0.81, 0.85, 0.78, NA)
    ), row.names = c(NA, -5L), class = c("data.table", "data.frame"))

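A second option would be to count three columns to the left, but that is not elegant or generalizable in case I have more columns.
So far I have only found a few `dplyr` examples that replace a value with the one below or above it, but none that replace it with a value from the side, or that rely on the clear column-name pattern for better generality, e.g. with `purrr` and `map2_int`.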
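
As one possible way to avoid counting columns, the sketch below pairs the columns by name and fills all `ee` columns in a single vectorized step using a logical matrix. It is only an illustration under stated assumptions: the data is rebuilt as a plain `data.frame` (with a real `data.table`, `df3[ee_cols]` would be interpreted differently, so converting with `as.data.frame()` first is assumed), and `ee_cols`, `we_cols` and `fill` are hypothetical helper names.

```r
# Same values as the reproducible example, but as a plain data.frame
# (assumption: data.table-specific indexing is not used here).
df3 <- data.frame(
  Scenario = rep("Basic", 5),
  Rating   = c("1C", "2A", "2B", "2C", "3A"),
  Class    = c("CORP", "CORP", "CORP", "CORP", "RETAIL"),
  wePD1    = c(0.51, 0.41, 0.58, 0.28, 0.68),
  wePD2    = c(0.74, 0.01, 0.28, 0.92, 0.48),
  wePD3    = c(0.43, 0.23, 0.04, 0.62, 0.71),
  eePD1    = c(NA, 0.37, 0.96, 0.22, NA),
  eePD2    = c(NA, 0.06, 0.29, 0.22, NA),
  eePD3    = c(NA, 0.81, 0.85, 0.78, NA),
  stringsAsFactors = FALSE
)

# Pair columns by name: eePD1 <-> wePD1, eePD2 <-> wePD2, ...
ee_cols <- grep("^ee", names(df3), value = TRUE)
we_cols <- sub("^ee", "we", ee_cols)
stopifnot(all(we_cols %in% names(df3)))  # every ee column needs a we partner

ee <- as.matrix(df3[ee_cols])
we <- as.matrix(df3[we_cols])

# Logical matrix of cells to fill. The row-length Class test is recycled
# down each column, so it applies row by row.
fill <- is.na(ee) & (df3$Class == "CORP")
ee[fill] <- we[fill]

df3[ee_cols] <- ee
df3
```

With this pairing, adding further columns such as a hypothetical `eePD4`/`wePD4` would need no code changes, as long as the prefix and suffix pattern holds.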