Say I have the following:
library(data.table)
# dummy data
df <- data.table(metric_1 = c(1,1,3)
, metric_2 = c(1,2,2)
); df
   metric_1 metric_2
1:        1        1
2:        1        2
3:        3        2
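I would like to loop through the two columns (the actual data frame has many more), performing a calculation on each one (simplified below for illustration), and then count the rows grouped by each calculated column: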
# metric columns
x <- c('metric_1', 'metric_2')
# list to capture results
y <- vector('list', length(x))
# summarise
for (i in seq_along(x))
{
y[[i]] <- df[, .(rows = .N)
, by = .(fifelse(get(x[[i]]) == 1, 0, get(x[[i]])))
]
}
The above works, giving a list of summary tables:
> y
[[1]]
   fifelse rows
1:       0    2
2:       3    1

[[2]]
   fifelse rows
1:       0    1
2:       2    2
However, is it possible to name the group-by column within the loop? I tried using x[[i]] as the name:
for (i in seq_along(x))
{
y[[i]] <- df[, .(rows = .N)
, by = .(x[[i]] = fifelse(get(x[[i]]) == 1, 0, get(x[[i]])))
]
}
But I got the error:
Error: unexpected '=' in:
" df[, .(rows = .N)
, by = .(x[[i]] ="
Given the volume of data, a data.table solution would be much appreciated.
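For reference, the error arises because R only accepts a literal name on the left of `=` inside a call, so `x[[i]] = ...` cannot be parsed. One possible workaround, shown here as a minimal sketch and not necessarily the most idiomatic one, is to group under a fixed placeholder name and then rename that column by reference with `setnames()`. The placeholder `grp` is a hypothetical name chosen for illustration.

```r
library(data.table)

# dummy data (same as in the question)
df <- data.table(metric_1 = c(1, 1, 3),
                 metric_2 = c(1, 2, 2))

# metric columns
x <- c("metric_1", "metric_2")

# list to capture results
y <- vector("list", length(x))

for (i in seq_along(x)) {
  # group under a fixed placeholder name ("grp" is an arbitrary choice)
  y[[i]] <- df[, .(rows = .N),
               by = .(grp = fifelse(get(x[[i]]) == 1, 0, get(x[[i]])))]
  # rename the placeholder to the metric's name, by reference (no copy)
  setnames(y[[i]], "grp", x[[i]])
}
```

After the loop, the grouping column of each table in `y` carries the name of the metric it was computed from, while the `rows` column is unchanged.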