Aged inventory planning with dplyr, cumsum and lag

Posted on December 19

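This is my first Stack Overflow post after years of passive browsing. I'm stuck on this problem and it's driving me crazy! Thanks for your help.

I have a data frame of inventory supply and demand, grouped by age bucket. I have collected this data for many products. Demand for inventory of a given age can be met by supply of the same age or younger. I'm trying to calculate how much demand can be met within each age bucket, working from the youngest bucket to the oldest.

The data frame will be large (10^7 rows), so I'm trying to do this with dplyr, mutate, lag and cumsum rather than a loop, which I suspect would be slow.

Here is a sample group from my dataset (the product and date grouping is omitted):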

library(dplyr)

Inventory <- data.frame(
  Age = c(90, 120, 270, 365, Inf),
  Demand = c(0, 5000, 25, 5000, 10),
  Supply = c(4000, 50, 4000, 300, 0))

View(Inventory)

The result I expect is:

Result <- Inventory
Result$Start = c(0, 4000, 0, 3975, 0)
Result$In = c(4000, 50, 4000, 300, 0)
Result$Out = c(0, 4050, 25, 4275, 0)
Result$End = c(4000, 0, 3975, 0, 0)
Result$Short = c(0, 950, 0, 725, 10)

View(Result)

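To get it, I applied the standard inventory calculation, one age bucket at a time:

Start = End of the previous (next-younger) bucket, or 0 for the youngest bucket
In = Supply
Out = min(Demand, Start + In)
End = Start + In - Out
Short = Demand - Out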
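For reference, the recurrence above can be sketched as a plain loop. This is only a minimal baseline for checking results on small inputs, not the fast solution being asked for; `inventory_loop` is a hypothetical helper name, and the sketch assumes the rows are already sorted from youngest to oldest.

```r
# Minimal reference sketch of the recurrence (hypothetical helper name).
# Assumes rows are ordered from the youngest to the oldest age bucket.
inventory_loop <- function(demand, supply) {
  n <- length(demand)
  start <- out <- end <- numeric(n)
  prev_end <- 0                        # nothing is carried into the youngest bucket
  for (i in seq_len(n)) {
    start[i] <- prev_end               # Start = End of the previous bucket
    out[i]   <- min(demand[i], start[i] + supply[i])
    end[i]   <- start[i] + supply[i] - out[i]
    prev_end <- end[i]
  }
  data.frame(Start = start, In = supply, Out = out, End = end,
             Short = demand - out)
}

Inventory <- data.frame(
  Age = c(90, 120, 270, 365, Inf),
  Demand = c(0, 5000, 25, 5000, 10),
  Supply = c(4000, 50, 4000, 300, 0))

check <- cbind(Inventory, inventory_loop(Inventory$Demand, Inventory$Supply))
print(check)
```

On the sample group this reproduces the Start, In, Out, End and Short columns of the expected result, so it can serve as a correctness check for any faster approach.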

I haven't had much luck with dplyr so far, but I think there is a solution that cleverly combines max, min, lag and cumsum.

Recommended answer

library(dplyr)

# Running balance that is clamped at zero:
# End[i] = max(0, End[i - 1] + x[i]), with x = Supply - Demand.
cumnominus <- Rcpp::cppFunction("
  NumericVector cumnominus(NumericVector x) {
    int n = x.size();
    NumericVector sumout(n);
    if (n == 0) return sumout;
    sumout[0] = (x[0] < 0) ? 0 : x[0];
    for (int i = 1; i < n; i++) {
      // carry the previous balance forward, but never below zero
      double s = x[i] + sumout[i - 1];
      sumout[i] = (s < 0) ? 0 : s;
    }
    return sumout;
  }")

Inventory |>
  mutate(In = Supply,
         End = cumnominus(Supply - Demand),
         Start = lag(End, default = 0),
         Short = pmax(0, Demand - (Start + Supply)),
         Out = pmin(Demand, Start + In)) |>
  select(Age, Demand, Supply, Start, In, Out, End, Short)
#>   Age Demand Supply Start   In  Out  End Short
#> 1  90      0   4000     0 4000    0 4000     0
#> 2 120   5000     50  4000   50 4050    0   950
#> 3 270     25   4000     0 4000   25 3975     0
#> 4 365   5000    300  3975  300 4275    0   725
#> 5 Inf     10      0     0    0    0    0    10

Result
#>   Age Demand Supply Start   In  Out  End Short
#> 1  90      0   4000     0 4000    0 4000     0
#> 2 120   5000     50  4000   50 4050    0   950
#> 3 270     25   4000     0 4000   25 3975     0
#> 4 365   5000    300  3975  300 4275    0   725
#> 5 Inf     10      0     0    0    0    0    10

# The same logic as a plain R loop, benchmarked against the Rcpp version
# on 5,000,000 rows.
cumnominus_r <- function(x) {
  out_sum <- numeric(length(x))
  if (length(x) == 0) return(out_sum)
  out_sum[1] <- max(0, x[1])
  for (i in seq_along(x)[-1]) {
    out_sum[i] <- max(0, out_sum[i - 1] + x[i])
  }
  out_sum
}

big_df <- tibble(
  Demand = sample(seq(1000, 6000, 500), 5000000, replace = TRUE),
  Supply = sample(seq(1000, 6000, 500), 5000000, replace = TRUE)
)

bench::mark(
  Rcpp_fun = big_df |> mutate(End = cumnominus(Supply - Demand)),
  R_only_fun = big_df |> mutate(End = cumnominus_r(Supply - Demand))
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 Rcpp_fun    43.15ms  52.95ms    16.1      77.7MB     8.94
#> 2 R_only_fun    8.59s    8.59s     0.116    95.4MB    20.8
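If compiling C++ is not an option, the clamped running balance End[i] = max(0, End[i - 1] + x[i]) also has a closed form built from `cumsum` and `cummin`, which may be closer to what the question had in mind. The sketch below is an assumption-laden illustration rather than part of the answer: `end_balance`, the `Product` column and product "B" are hypothetical, and it assumes the grouping from the question can be expressed with `group_by()`. No timings are claimed for it.

```r
library(dplyr)

# Closed form of End[i] = max(0, End[i - 1] + x[i]) with a zero opening balance:
# subtract the lowest point the unclamped running total has reached so far
# (only when that lowest point is below zero).
# `end_balance` is a hypothetical helper name.
end_balance <- function(x) {
  running <- cumsum(x)
  running - pmin(0, cummin(running))
}

# Hypothetical data: product "A" is the sample group from the question,
# product "B" is made up to show stock being carried across several buckets.
stock <- data.frame(
  Product = c(rep("A", 5), rep("B", 3)),
  Age     = c(90, 120, 270, 365, Inf, 90, 120, 270),
  Demand  = c(0, 5000, 25, 5000, 10, 100, 0, 500),
  Supply  = c(4000, 50, 4000, 300, 0, 4000, 50, 0))

planned <- stock |>
  arrange(Product, Age) |>              # youngest bucket first within each product
  group_by(Product) |>
  mutate(In    = Supply,
         End   = end_balance(Supply - Demand),
         Start = lag(End, default = 0), # previous bucket's End, per product
         Out   = pmin(Demand, Start + In),
         Short = Demand - Out) |>
  ungroup()

print(planned)
```

Because every step is a vectorised primitive, the calculation stays inside `mutate()` and restarts from zero for each group. With very large non-integer totals the subtraction could accumulate floating-point error, so it is worth comparing against a loop on a sample before relying on it.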
