有许多类似的问题和答案已经张贴,但我找不到一个与这些差异.1) 零的计数重新开始,2)有一个数学函数应用于被替换的值.
根据客户的日期,事件是否发生(NULL或1).可以假设客户每个日期只有一行.
我想用基于连续零值数量(事件开始时间)的衰减函数替换零值.客户可以每天都参加活动,可以跳过一天,也可以跳过几天.但一旦事件发生,衰退就会重新开始.目前,我的衰减除以2,但这是例如.
DT | CUSTOMER | EVENT | DESIRED |
---|---|---|---|
2022-01-01 | a | 1 | 1 |
2022-01-02 | a | 1 | 1 |
2022-01-03 | a | 1 | 1 |
2022-01-04 | a | 1 | 1 |
2022-01-05 | a | 1 | 1 |
2022-01-01 | b | 1 | 1 |
2022-01-02 | b | 0.5 | |
2022-01-03 | b | 0.25 | |
2022-01-04 | b | 1 | 1 |
2022-01-05 | b | 0.5 |
我可以产生预期的结果,但它非常笨拙.看看是否有更好的方法.这需要扩展到多个事件列.
create or replace temporary table the_data (
dt date,
customer char(10),
event int,
desired float)
;
insert into the_data values ('2022-01-01', 'a', 1, 1);
insert into the_data values ('2022-01-02', 'a', 1, 1);
insert into the_data values ('2022-01-03', 'a', 1, 1);
insert into the_data values ('2022-01-04', 'a', 1, 1);
insert into the_data values ('2022-01-05', 'a', 1, 1);
insert into the_data values ('2022-01-01', 'b', 1, 1);
insert into the_data values ('2022-01-02', 'b', NULL, 0.5);
insert into the_data values ('2022-01-03', 'b', NULL, 0.25);
insert into the_data values ('2022-01-04', 'b', 1, 1);
insert into the_data values ('2022-01-05', 'b', NULL, 0.5);
with
base as (
select * from the_data
),
find_nan as (
select *, case when event is null then 1 else 0 end as event_is_nan from base
),
find_nan_diff as (
select *, event_is_nan - coalesce(lag(event_is_nan) over (partition by customer order by dt), 0) as event_is_nan_diff from find_nan
),
find_nan_group as (
select *, sum(case when event_is_nan_diff = -1 then 1 else 0 end) over (partition by customer order by dt) as nan_group from find_nan_diff
),
consec_nans as (
select *, sum(event_is_nan) over (partition by customer, nan_group order by dt) as n_consec_nans from find_nan_group
),
decay as (
select *, case when n_consec_nans > 0 then 0.5 / n_consec_nans else 1 end as decay_factor from consec_nans
),
ffill as (
select *, first_value(event) over (partition by customer order by dt) as ffill_value from decay
),
final as (
select *, ffill_value * decay_factor as the_answer from ffill
)
select * from final
order by customer, dt
;
谢谢