Here is part of my dataset:

mydata=structure(list(sales_point_id = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), calendar_id_operday = c(20210102L, 
20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 
20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 
20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 
20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 
20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 
20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 
20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 20210102L, 
20210102L, 20210102L, 20210102L, 20210102L), line_fact_amt = c(23749L, 
1000L, 3050L, 1550L, 8900L, 1550L, 0L, 300L, 0L, 499L, 5450L, 
300L, 0L, 499L, 599L, 599L, 6050L, 300L, 599L, 1400L, 300L, 0L, 
2000L, 700L, 0L, 5990L, 8877L, 1999L, 257L, 200L, 361L, 300L, 
1990L, 2453L, 3140L, 0L, 0L, 199L, 599L, 10990L, 7990L, 773L, 
400L, 6000L, 2269L, 2000L, 1999L)), class = "data.frame", row.names = c(NA, 
-47L))

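calendar_id_operday is the operating day in YYYYMMDD format (for example 20210102). I need to sum line_fact_amt for each day in calendar_id_operday. There is also a second data set for the following year: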

mydata2=structure(list(sales_point_id = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), calendar_id_operday = c(20220102L, 
20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 
20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 
20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 
20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 
20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 
20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 
20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 20220102L, 
20220102L, 20220102L, 20220102L, 20220102L), line_fact_amt = c(38586L, 
15837L, 17887L, 16387L, 23737L, 16387L, 14837L, 15137L, 14837L, 
15336L, 20287L, 15137L, 14837L, 15336L, 15436L, 15436L, 20887L, 
15137L, 15436L, 16237L, 15137L, 14837L, 16837L, 15537L, 14837L, 
20827L, 23714L, 16836L, 15094L, 15037L, 15198L, 15137L, 16827L, 
17290L, 17977L, 14837L, 14837L, 15036L, 15436L, 25827L, 22827L, 
15610L, 15237L, 20837L, 17106L, 16837L, 16836L)), class = "data.frame", row.names = c(NA, 
-47L))

That is, I need the sum of line_fact_amt by calendar_id_operday.

The main difficulty for me: the aggregated value for a given day in 2022 has to be compared with the value for the same day in 2021. For example, the sum for 20220102 is 815519, while the sum for 20210102 is 118180, so the two totals differ by more than 40%. Whenever the difference for the same day exceeds 40%, the 2022 value should be replaced with the previous year's value for that date plus 10%, i.e. 815519 is replaced by 118180 + 10% of it (129998). How can I do this for each sales_point_id separately?
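
To make the arithmetic concrete, here is a small base R check using only the two totals quoted above. The question does not say whether the 40% is measured against the 2021 or the 2022 total, so the sketch computes both readings; the variable names are illustrative only.

```r
# Totals quoted in the question for 2 January of each year.
total_2021 <- 118180
total_2022 <- 815519

# Reading 1 (assumption): increase relative to the 2021 total.
growth_vs_2021 <- (total_2022 - total_2021) / total_2021

# Reading 2 (assumption): increase as a share of the 2022 total.
share_of_2022 <- (total_2022 - total_2021) / total_2022

# Either way the 40% threshold is exceeded here, so the 2022 total
# is replaced by the 2021 total plus 10%.
replacement <- if (share_of_2022 > 0.4) total_2021 * 1.1 else total_2022
print(replacement)
```

For these numbers both readings lead to the same decision, but they can disagree for smaller increases (a 50% rise over 2021 is only a third of the 2022 total).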

Thank you for your valuable help.

Recommended answer

Based on the description, this may help:

library(lubridate)
library(dplyr)

# Daily totals for each year
agg1 <- mydata %>%
  group_by(calendar_id_operday) %>%
  summarise(line_fact_amt = sum(line_fact_amt, na.rm = TRUE))
agg2 <- mydata2 %>%
  group_by(calendar_id_operday) %>%
  summarise(line_fact_amt = sum(line_fact_amt, na.rm = TRUE))

# Shift each 2022 date back one year so it can be matched to the 2021 totals
agg2 %>%
  mutate(calendar_prev = as.integer(format(ymd(calendar_id_operday) -
    years(1), '%Y%m%d'))) %>%
  full_join(agg1, ., by = c(calendar_id_operday = "calendar_prev")) %>%
  # .x is the 2021 total, .y the 2022 total; if the increase is at least 40%
  # of the 2022 total, use the 2021 total plus 10%, otherwise keep the 2022 total
  mutate(changed = ifelse((line_fact_amt.y - line_fact_amt.x) >= 0.4 *
    line_fact_amt.y, line_fact_amt.x + 0.1 * line_fact_amt.x, line_fact_amt.y))
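
The question also asks for the rule to be applied to each sales_point_id separately, which the pipeline above does not do because it groups by day only. The following is a minimal sketch of one way to extend it, not a definitive implementation. Assumptions: the toy data frames sales_2021 and sales_2022 and the helper daily_total are hypothetical, and the previous-year key is obtained by subtracting 10000 from the integer YYYYMMDD value rather than with lubridate (29 February then simply finds no match and is left unchanged).

```r
library(dplyr)

# Hypothetical toy data: two sales points, one day in each year.
sales_2021 <- data.frame(
  sales_point_id      = c(1L, 1L, 2L, 2L),
  calendar_id_operday = c(20210102L, 20210102L, 20210102L, 20210102L),
  line_fact_amt       = c(100L, 200L, 500L, 500L)
)
sales_2022 <- data.frame(
  sales_point_id      = c(1L, 1L, 2L, 2L),
  calendar_id_operday = c(20220102L, 20220102L, 20220102L, 20220102L),
  line_fact_amt       = c(150L, 180L, 900L, 1100L)
)

# Hypothetical helper: daily totals per sales point.
daily_total <- function(df) {
  df %>%
    group_by(sales_point_id, calendar_id_operday) %>%
    summarise(line_fact_amt = sum(line_fact_amt, na.rm = TRUE),
              .groups = "drop")
}

agg_2021 <- daily_total(sales_2021) %>%
  rename(day_2021 = calendar_id_operday, total_2021 = line_fact_amt)

adjusted <- daily_total(sales_2022) %>%
  rename(total_2022 = line_fact_amt) %>%
  # For integer YYYYMMDD keys, subtracting 10000 moves back one year.
  mutate(day_2021 = calendar_id_operday - 10000L) %>%
  # Match on both the sales point and the shifted date.
  left_join(agg_2021, by = c("sales_point_id", "day_2021")) %>%
  mutate(
    # Same threshold as the answer above: increase of at least 40% of the
    # 2022 total. Rows without a 2021 match keep their 2022 total.
    changed = if_else(
      !is.na(total_2021) & (total_2022 - total_2021) >= 0.4 * total_2022,
      total_2021 * 1.1,
      as.numeric(total_2022)
    )
  )

print(adjusted)
```

In this toy data, sales point 1 grows from 300 to 330 and keeps its 2022 total, while sales point 2 doubles from 1000 to 2000 and is replaced by 1100. With the real data, pass mydata and mydata2 to daily_total instead of the toy frames.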
