My goal is to apply multiple functions to multiple columns while keeping GForce enabled.

Suppose I have the following data.table:

library(data.table)

df <- data.table(fruit = c('a', 'a', 'a', 'b')
                 , revenue = 1:4
                 , profit = c(2,NA,4,5)
                 ); df

   fruit revenue profit
1:     a       1      2
2:     a       2     NA
3:     a       3      4
4:     b       4      5

and I want to apply multiple functions to multiple columns (every column except fruit):

# functions
y <- \(i) {c(min(i, na.rm = T)
             , max(i, na.rm = T)
             )
           }

# apply
df[, lapply(.SD, y)
   , fruit
   , verbose = T
   ]

Finding groups using forderv ... forder.c received 4 rows and 1 columns
0.000s elapsed (0.000s cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
lapply optimization changed j from 'lapply(.SD, y)' to 'list(y(revenue), y(profit))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ... 
  memcpy contiguous groups took 0.000s for 2 groups
  eval(j) took 0.012s for 2 calls
0.020s elapsed (0.020s cpu) 

   fruit revenue profit
1:     a       1      2
2:     a       3      4
3:     b       4      5
4:     b       4      5

Now, the above works! However, notice that the verbose output says (GForce FALSE), so GForce is not on.

I think this is because, as Waldi pointed out, GForce is not enabled when using an anonymous function such as \(i) sum(i). I then tried the following, passing na.rm = T only through lapply:

# functions
z <- \(i) {c(min
             , max
              )
           }

# apply
df[, lapply(.SD, z, na.rm = T)
   , fruit
   , verbose = T
   ]

Finding groups using forderv ... forder.c received 4 rows and 1 columns
0.000s elapsed (0.000s cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
lapply optimization changed j from 'lapply(.SD, z, na.rm = T)' to 'list(z(revenue, na.rm = T), z(profit, na.rm = T))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ... Error in z(revenue, na.rm = T) : unused argument (na.rm = T)

This time it fails with an error. Specifically: Error in z(revenue, na.rm = T) : unused argument (na.rm = T)

Any help would be appreciated.

Recommended answer

From help("gforce"):

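Expressions in j which contain only the functions min, max, mean, median, var, sd, sum, prod, first, last, head, tail (for example, DT[, list(mean(x), median(x), min(y), max(y)), by=z]) are very effectively optimised using what we call GForce. These functions are automatically replaced with a corresponding GForce version with pattern g*, e.g., prod becomes gprod.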

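Clearly, you are not passing an expression that contains these functions. As far as data.table's GForce optimization is concerned, they are hidden inside your y function.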
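As a minimal sketch of that difference (wrapped_min is a hypothetical helper introduced here for illustration, it is not part of the question), both calls below should return the same values, but only the second places min directly where the lapply optimization can expand it into min(revenue, na.rm = TRUE) and so on for GForce to recognise:

```r
library(data.table)

df <- data.table(fruit = c("a", "a", "a", "b"),
                 revenue = 1:4,
                 profit = c(2, NA, 4, 5))

# Wrapped: j only contains a call to wrapped_min (hypothetical helper),
# so the min() call is not visible in the expression data.table inspects.
wrapped_min <- function(i) min(i, na.rm = TRUE)
slow <- df[, lapply(.SD, wrapped_min), by = fruit]

# Direct: min is passed by name, with na.rm forwarded through lapply.
fast <- df[, lapply(.SD, min, na.rm = TRUE), by = fruit]

# Same result either way; only the evaluation path differs.
all.equal(slow, fast)
```

Adding verbose = TRUE to each call is the way to check which of the two actually reports GForce TRUE on your data.table version.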

I would do this:

res <- df[, 
          c(lapply(.SD, min, na.rm = TRUE), lapply(.SD, max, na.rm = TRUE)), 
          by = fruit,
          verbose = T
]
#Finding groups using forderv ... forder.c received 4 rows and 1 columns
#0.000s elapsed (0.000s cpu) 
#Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
#lapply optimization changed j from 'c(lapply(.SD, min, na.rm = TRUE), lapply(.SD, max, na.rm = TRUE))' to 'list(min(revenue, na.rm = TRUE), min(profit, na.rm = TRUE), max(revenue, na.rm = TRUE), max(profit, na.rm = TRUE))'
#GForce optimized j to 'list(gmin(revenue, na.rm = TRUE), gmin(profit, na.rm = TRUE), gmax(revenue, na.rm = TRUE), gmax(profit, na.rm = TRUE))' (see ?GForce)
#Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.000
#gforce assign high and low took 0.000
#gforce eval took 0.000
#0.000s elapsed (0.000s cpu) 

setnames(res, -1, paste(names(res)[-1], 
                        rep(c("min", "max"), each = ncol(df) - 1), 
                        sep = "."))


res <- melt(res, measure.vars = measure(eco, fun, sep = "."))
#Warning message:
#In melt.data.table(res, measure.vars = measure(eco, fun, sep = ".")) :
#  'measure.vars' [revenue.min, profit.min, revenue.max, profit.max, ...] are not all of the same type. By order of hierarchy, the molten data value column will be of type 'double'. All measure variables not of type 'double' will be coerced too. Check DETAILS in ?melt.data.table for more on coercion.

dcast(res, fruit + fun ~ eco)
#Key: <fruit, fun>
#    fruit    fun profit revenue
#   <char> <char>  <num>   <num>
#1:      a    max      4       3
#2:      a    min      2       1
#3:      b    max      5       4
#4:      b    min      5       4

The warning appears because the columns in df have different types ("integer" and "double"). Make sure they are the same type to avoid it.
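One possible way to do that, sketched under the assumption that every non-grouping column is numeric and can safely become double (num_cols is a name introduced here for illustration), is to convert the columns in place before aggregating:

```r
library(data.table)

df <- data.table(fruit = c("a", "a", "a", "b"),
                 revenue = 1:4,            # integer
                 profit = c(2, NA, 4, 5))  # double

# All columns except the grouping column.
num_cols <- setdiff(names(df), "fruit")

# Convert them to double by reference so the later melt sees a single type.
df[, (num_cols) := lapply(.SD, as.numeric), .SDcols = num_cols]

# Inspect the resulting column classes.
sapply(df, class)
```

Running the aggregation, setnames, melt and dcast steps on this converted df should then avoid the coercion warning, since all measure columns already share one type.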
