I have a tibble in which one column contains nested lists (specifically, its type is <list<list<double>>>).
It looks something like the following (but in R/Arrow format):
ID | nestedvals |
---|---|
001 | [[1]] (1,0.1) [[2]] (2,0.2) [[3]] (3,0.3) [[4]] (4,0.4) [[5]] (5,0.5) |
002 | [[1]] (1,0.1) [[2]] (2,0.2) [[3]] (3,0.3) [[4]] (4,0.4) |
003 | [[1]] (1,0.1) [[2]] (2,0.2) [[3]] (3,0.3) |
004 | [[1]] (1,0.1) [[2]] (2,0.2) |
005 | [[1]] (1,0.1) |
If I pull the first row of the nestedvals column, I get:
tibble$nestedvals[1]
<list<list<double>>[1]>
[[1]]
<list<double>[5]>
[[1]]
[1] 1 0.1
[[2]]
[1] 2 0.2
[[3]]
[1] 3 0.3
[[4]]
[1] 4 0.4
[[5]]
[1] 5 0.5
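Basically, the nestedvals column holds a list of lists of pairs of doubles, where the first number in each pair is an index (e.g. 5) and the second is a value (e.g. 0.5).
What I want to do is generate a set of zero-filled columns based on the range of unique indices across the nested lists. For example: col_1, col_2, col_3, col_4, col_5
Then, for each row in the tibble, I want to replace each 0 with the value (the second number in each nested pair) according to the index (the first number in each nested pair).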
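For a single row, the zero-fill-then-replace idea can be sketched in base R as follows. This is only an illustration: `pairs`, `n_cols`, and `row_vals` are hypothetical names, and the number of columns is assumed to be known in advance.

```r
# One row's nested list: each element is c(index, value)
pairs <- list(c(1, 0.1), c(2, 0.2), c(3, 0.3))
n_cols <- 5  # assumption: the largest index found across all rows

# Pull out the indices (first number) and values (second number)
idx  <- vapply(pairs, function(p) p[1], numeric(1))
vals <- vapply(pairs, function(p) p[2], numeric(1))

row_vals <- numeric(n_cols)   # zero-filled vector of length n_cols
row_vals[idx] <- vals         # overwrite the zeros at the indexed positions
names(row_vals) <- paste0("col_", seq_len(n_cols))
row_vals
```

Positions with no matching index (here col_4 and col_5) simply keep their initial 0.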
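I believe the best way to do this is to unlist the variable and create separate columns holding the list of indices and the list of values of interest, so that I can find the maximum index for name generation and also handle the assignment between the two.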
To that end, I wrote a function to split each nested list:
nestsplit <- function(x, y) {
  unlist(lapply(x, `[[`, y))
}
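As a quick check of what this helper returns, here is a small sketch on plain lists. The `rows`, `max_index`, and `col_names` objects are made up for illustration, and the Arrow/vctrs classes of the real column are ignored.

```r
nestsplit <- function(x, y) {
  unlist(lapply(x, `[[`, y))
}

# Two hypothetical rows of nested pairs, each pair being c(index, value)
rows <- list(
  list(c(1, 0.1), c(2, 0.2), c(3, 0.3)),
  list(c(1, 0.1))
)

indices <- nestsplit(rows[[1]], 1)  # first element of every pair in row 1
values  <- nestsplit(rows[[1]], 2)  # second element of every pair in row 1

# Largest index across all rows, used to generate the column names
max_index <- max(unlist(lapply(rows, nestsplit, 1)))
col_names <- paste0("col_", seq_len(max_index))
```

With this input, `indices` is 1, 2, 3, `values` is 0.1, 0.2, 0.3, and `col_names` runs from col_1 to col_3.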
I then generate list columns holding the column names (by index) and the values of interest, and append them to the tibble:
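library(dplyr)  # provides rowwise() and mutate()

tibble <-
  tibble |>
  rowwise() |>
  mutate(
    # one character vector of target column names per row
    index_names = list(paste0("col_", as.character(nestsplit(nestedvals, 1)))),
    # one numeric vector of values per row
    index_values = list(nestsplit(nestedvals, 2))
  )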
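But I would like to know whether there is an efficient, row-wise, tidyverse/dplyr-based solution that uses the information in the index_names variable to assign the values in index_values to the index-based column names, rather than writing a loop to assign each value row by row.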
So that the output goes from the following:
ID | nestedvals | col_1 | col_2 | col_3 | col_4 | col_5 |
---|---|---|---|---|---|---|
001 | <Nested list of 5 pairs of values> | 0 | 0 | 0 | 0 | 0 |
002 | <Nested list of 4 pairs of values> | 0 | 0 | 0 | 0 | 0 |
003 | <Nested list of 3 pairs of values> | 0 | 0 | 0 | 0 | 0 |
004 | <Nested list of 2 pairs of values> | 0 | 0 | 0 | 0 | 0 |
005 | <Nested list of 1 pair of values> | 0 | 0 | 0 | 0 | 0 |
to the following instead:
ID | nestedvals | col_1 | col_2 | col_3 | col_4 | col_5 |
---|---|---|---|---|---|---|
001 | <Nested list of 5 pairs of values> | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
002 | <Nested list of 4 pairs of values> | 0.1 | 0.2 | 0.3 | 0.4 | 0 |
003 | <Nested list of 3 pairs of values> | 0.1 | 0.2 | 0.3 | 0 | 0 |
004 | <Nested list of 2 pairs of values> | 0.1 | 0.2 | 0 | 0 | 0 |
005 | <Nested list of 1 pair of values> | 0.1 | 0 | 0 | 0 | 0 |
To generate the example data above, use:
tibble <-
  structure(
    list(
      ID = c(001, 002, 003, 004, 005),
      nestedvals = structure(
        list(
          structure(
            list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4), c(5, 0.5)),
            class = c("arrow_list", "vctrs_list_of", "vctrs_vctr", "list"),
            ptype = numeric(0)
          ),
          structure(
            list(c(1, 0.1), c(2, 0.2), c(3, 0.3), c(4, 0.4)),
            class = c("arrow_list", "vctrs_list_of", "vctrs_vctr", "list"),
            ptype = numeric(0)
          ),
          structure(
            list(c(1, 0.1), c(2, 0.2), c(3, 0.3)),
            class = c("arrow_list", "vctrs_list_of", "vctrs_vctr", "list"),
            ptype = numeric(0)
          ),
          structure(
            list(c(1, 0.1), c(2, 0.2)),
            class = c("arrow_list", "vctrs_list_of", "vctrs_vctr", "list"),
            ptype = numeric(0)
          ),
          structure(
            list(c(1, 0.1)),
            class = c("arrow_list", "vctrs_list_of", "vctrs_vctr", "list"),
            ptype = numeric(0)
          )
        ),
        ptype = structure(
          list(),
          class = c("arrow_list", "vctrs_list_of", "vctrs_vctr", "list"),
          ptype = numeric(0)
        ),
        class = c("arrow_list", "vctrs_list_of", "vctrs_vctr", "list")
      )
    ),
    row.names = c(NA, -5L),
    class = c("tbl_df", "tbl", "data.frame")
  )
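One direction that might fit the tidyverse requirement is to reshape to long format and pivot back, rather than assigning values row by row. The sketch below is an assumption-laden illustration, not a tested answer for the Arrow-backed column: it uses a hypothetical three-row stand-in `df` built from plain lists (the arrow_list/vctrs classes are dropped, and ID is stored as character), and the names `long`, `wide`, and `result` are invented here.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Plain-list stand-in for the example data (assumption: no Arrow/vctrs classes)
df <- tibble(
  ID = c("001", "002", "003"),
  nestedvals = list(
    list(c(1, 0.1), c(2, 0.2), c(3, 0.3)),
    list(c(1, 0.1), c(2, 0.2)),
    list(c(1, 0.1))
  )
)

# Long format: one row per (index, value) pair
long <- df |>
  unnest_longer(nestedvals) |>
  mutate(
    index = map_dbl(nestedvals, 1),  # first number of each pair
    value = map_dbl(nestedvals, 2)   # second number of each pair
  ) |>
  select(ID, index, value)

# Wide format: one column per index, with missing combinations filled with 0
wide <- long |>
  pivot_wider(
    id_cols = ID,
    names_from = index,
    values_from = value,
    names_prefix = "col_",
    values_fill = 0
  )

# Re-attach the original nested column
result <- left_join(df, wide, by = "ID")
```

Here `values_fill = 0` plays the role of the zero-filled columns, so no explicit loop over rows is needed. Whether `unnest_longer()` behaves the same way on the real arrow_list column would need to be checked.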