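I am working in R with a dataset called "data" that comes from a Fronius inverter's data logging. It contains one record per minute and a column named "pac_w" that holds the power being generated, in watts. The inverter has a protection system that interrupts generation when an overvoltage occurs. When that happens, "pac_w" is recorded as zero for four consecutive minutes (remember that each row represents one minute), and production then needs another two minutes to stabilize. Over the past few months these interruptions have become frequent and have significantly reduced energy production.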
Below is an example of the real data.
Edit: the example now has more rows.
```r
pac_w <- c(3336,3294,0,0,0,0,742,1620,2530,3438,2626,3704,2321,3088,1672,2722,
           1953,0,0,0,0,836,1746,2654,3566,0,0,0,0,995,1908,2800)
day_energy_wh <- c(2479,2536,2555,2555,2555,2555,2560,2580,2615,2665,2717,2766,
                   2811,2868,2903,2944,2966,2979,2979,2979,2979,2986,3008,3045,
                   3097,3097,3097,3097,3097,3106,3131,3171)
date_time <- c("2023-12-23,08:13:00","2023-12-23,08:14:00","2023-12-23,08:15:00",
               "2023-12-23,08:16:00","2023-12-23,08:17:00","2023-12-23,08:18:00",
               "2023-12-23,08:19:00","2023-12-23,08:20:00","2023-12-23,08:21:00",
               "2023-12-23,08:22:00","2023-12-23,08:23:00","2023-12-23,08:24:00",
               "2023-12-23,08:25:00","2023-12-23,08:26:00","2023-12-23,08:27:00",
               "2023-12-23,08:28:00","2023-12-23,08:29:00","2023-12-23,08:30:00",
               "2023-12-23,08:31:00","2023-12-23,08:32:00","2023-12-23,08:33:00",
               "2023-12-23,08:34:00","2023-12-23,08:35:00","2023-12-23,08:36:00",
               "2023-12-23,08:37:00","2023-12-23,08:38:00","2023-12-23,08:39:00",
               "2023-12-23,08:40:00","2023-12-23,08:41:00","2023-12-23,08:42:00",
               "2023-12-23,08:43:00","2023-12-23,08:44:00")
data <- data.frame(pac_w, day_energy_wh, date_time)
```
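The division by 60 in the estimate relies on each row covering exactly one minute. A quick check of that assumption could look like the sketch below. For self-containment it rebuilds the example's date_time strings with `sprintf()` (08:13 to 08:44 on 2023-12-23) instead of reading `data$date_time`, and it assumes the "YYYY-MM-DD,HH:MM:SS" layout with a comma separator seen in the sample.

```r
# Rebuild the example timestamps; with real data use data$date_time instead.
date_time <- sprintf("2023-12-23,08:%02d:00", 13:44)

# Parse the comma-separated date/time strings (tz fixed to avoid DST effects).
ts <- as.POSIXct(date_time, format = "%Y-%m-%d,%H:%M:%S", tz = "UTC")

# Gap between consecutive rows, in minutes; every gap should be exactly 1.
step_min <- as.numeric(diff(ts), units = "mins")
print(all(step_min == 1))  # [1] TRUE
```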
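My goal is to estimate how many watt-hours the inverter failed to produce because of this overvoltage protection.
The day_energy_wh column shows the cumulative energy produced that day up to the time in the date_time column.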
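I want to estimate the energy not produced using the mean of the value just before the failure (3294 in this case) and the value after stabilization (2530 in this case):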
(3294 + 2530) / 2 = 2912
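For the first outage in the example data, the estimated energy the inverter did not produce is 252 Wh. Rows 3 to 8 are the four zero minutes plus the two stabilization minutes, and dividing by 60 converts watt-minutes to watt-hours: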
round(sum(2912 - pac_w[3:8])/60) = 252
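The same calculation written as a formula, as a sketch: P_before and P_after are the readings just before the zeros and just after the two stabilization minutes, W is the set of rows from the first zero through the second stabilization minute (rows 3 to 8 here), and these symbols are introduced only for illustration. Because each row is one minute, the sum is in watt-minutes and the factor 1/60 turns it into watt-hours.

```latex
\begin{aligned}
\bar{P} &= \frac{P_{\text{before}} + P_{\text{after}}}{2} = \frac{3294 + 2530}{2} = 2912 \\
E_{\text{lost}} &\approx \frac{1}{60} \sum_{i \in W} \left( \bar{P} - P_i \right) \\
&= \frac{6 \cdot 2912 - (0 + 0 + 0 + 0 + 742 + 1620)}{60} = \frac{15110}{60} \approx 251.8\ \text{Wh}
\end{aligned}
```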
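At the start and end of the day, pac_w is usually low or even zero. So I only want to estimate the energy not produced when the pac_w value immediately before the four zeros is 500 or greater.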
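The 500 W rule amounts to finding each run of zeros, looking up the reading just before it, and keeping the run only if that reading is at least 500. A minimal sketch of just this detection step with base R's `rle()`, assuming one reading per minute; the names `run_start`, `prev_val` and `events` are illustrative, not from either posted solution.

```r
pac_w <- c(3336,3294,0,0,0,0,742,1620,2530,3438,2626,3704,2321,3088,1672,2722,
           1953,0,0,0,0,836,1746,2654,3566,0,0,0,0,995,1908,2800)

r <- rle(pac_w == 0)                    # runs of zero / non-zero readings
run_end   <- cumsum(r$lengths)          # last row of each run
run_start <- run_end - r$lengths + 1    # first row of each run

# Zero runs that have at least one reading before them.
zero_runs <- which(r$values & run_start > 1)
starts    <- run_start[zero_runs]
prev_val  <- pac_w[starts - 1]          # reading just before the zeros
qualifies <- prev_val >= 500            # threshold from the question

events <- data.frame(start = starts, zeros = r$lengths[zero_runs],
                     prev_val = prev_val, qualifies = qualifies)
print(events)
```

With the example data all three runs (starting at rows 3, 18 and 26) qualify, since the preceding readings are 3294, 1953 and 3566 W.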
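Edit:
r2evans, your first solution (the dplyr/zoo version further down) gives the correct values, but it is not robust to a change in the number of consecutive zeros.
The second solution (the rle version immediately below) is not affected by a change in the number of consecutive zeros, but only the calculation for the first run of zeros gives the correct value.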
```r
r <- rle(data$pac_w == 0)
four0 <- setdiff(which(r$values), c(1L, length(r$values)))
four0 <- four0[r$lengths[four0 + 1] >= 3]
lapply(four0, function(f0) {
  indprev <- sum(r$lengths[1:(f0-1)])
  indtween <- (f0-1):sum(r$lengths[1:f0])+2
  indnext <- max(indtween)+1
  val <- sum(
    mean(data$pac_w[ c(indprev, indnext) ]) - data$pac_w[indtween]
  ) / 60
  cbind(data[indprev+1,], data.frame(lost = val))
}) |>
  do.call(rbind, args = _)
#    pac_w day_energy_wh           date_time     lost
# 3      0          2555 2023-12-23,08:15:00 251.8333   # correct
# 18     0          2979 2023-12-23,08:30:00 246.1417   # incorrect
# 26     0          3097 2023-12-23,08:38:00 690.9000   # incorrect
```
```r
# dplyr (>= 1.1.0 for .by) provides mutate(), lag(), summarize() and first()
library(dplyr)

data |>
  mutate(
    starts = cumsum(zoo::rollapply(pac_w == 0, 4, align="left", partial=TRUE, FUN=all)),
    prev_pac_w = lag(pac_w)
  ) |>
  summarize(
    .by = starts,
    date_time = first(date_time),
    lost = if (first(pac_w) == 0) {
      sum(mean(c(first(prev_pac_w), pac_w[which(pac_w > 0)[1]+2])) -
            pac_w[1:(which(pac_w > 0)[1]+1)]) / 60
    } else NA
  )
#   starts           date_time     lost
# 1      0 2023-12-23,08:13:00       NA
# 2      1 2023-12-23,08:15:00 251.8333   # correct
# 3      2 2023-12-23,08:30:00 187.3167   # correct
# 4      3 2023-12-23,08:38:00 269.9167   # correct
```
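The incorrect values in the rle version seem to come from operator precedence rather than from the run detection: in R, `:` binds more tightly than `+`, so `(f0-1):sum(r$lengths[1:f0])+2` is evaluated as `((f0-1):sum(r$lengths[1:f0])) + 2`. For the first run f0 - 1 is 1, which happens to give rows 3 to 8; for the second and third runs it gives rows 5 to 23 and 7 to 31, which reproduces the 246.1417 and 690.9000 shown. A minimal sketch that builds the window from each run's first and last row instead, so it works for any number of consecutive zeros and also applies the 500 W rule, might look like this. The function `estimate_lost_wh()` and its `settle` and `min_prev` parameters are hypothetical names; it assumes one row per minute and, as in the worked example, that the reading right after the settling minutes is the "stabilized" reference.

```r
# Hypothetical helper (not from either posted solution).
# settle   = stabilization minutes after the zeros (2 in the question)
# min_prev = minimum reading before the zeros, in W (500 in the question)
estimate_lost_wh <- function(pac_w, settle = 2, min_prev = 500) {
  r         <- rle(pac_w == 0)
  run_end   <- cumsum(r$lengths)            # last row of each run
  run_start <- run_end - r$lengths + 1      # first row of each run

  out <- data.frame(start = numeric(0), zeros = integer(0), lost_wh = numeric(0))
  for (k in which(r$values)) {
    # skip zeros at the very start or end of the series
    if (k == 1 || k == length(r$values)) next
    # need `settle` readings plus one stabilized reading after the zeros
    if (r$lengths[k + 1] < settle + 1) next
    prev_i <- run_start[k] - 1              # reading before the zeros
    if (pac_w[prev_i] < min_prev) next      # low-light start/end of day
    window  <- run_start[k]:(run_end[k] + settle)  # zeros + settling minutes
    after_i <- run_end[k] + settle + 1             # first stabilized reading
    ref  <- mean(pac_w[c(prev_i, after_i)])
    lost <- sum(ref - pac_w[window]) / 60   # watt-minutes -> Wh
    out  <- rbind(out, data.frame(start = run_start[k], zeros = r$lengths[k],
                                  lost_wh = lost))
  }
  out
}

pac_w <- c(3336,3294,0,0,0,0,742,1620,2530,3438,2626,3704,2321,3088,1672,2722,
           1953,0,0,0,0,836,1746,2654,3566,0,0,0,0,995,1908,2800)
res <- estimate_lost_wh(pac_w)
print(res)  # lost_wh: 251.8333, 187.3167, 269.9167 for rows 3, 18, 26
```

Because the window comes from the run boundaries rather than a fixed width of four, a run of five zeros is handled the same way as a run of four; whether the two-minute settling time also holds for longer outages is an assumption to check against the real data.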