我有以下 struct :
table user
user_id | month_year | fruits
------------------------------
1 | 2021-01 | apple
1 | 2021-01 | melon
1 | 2021-01 | orange
1 | 2021-02 | grape
1 | 2021-02 | orange
1 | 2021-02 | kiwi
1 | 2021-03 | grape
1 | 2021-03 | pear
1 | 2021-03 | banana
1 | 2021-04 | orange
1 | 2021-04 | kiwi
1 | 2021-04 | banana
1 | 2021-05 | grape
1 | 2021-05 | pear
1 | 2021-05 | kiwi
我想要的结果如下:
user | month_year | fruits | two_months_most_freq
-------------------------------------------------------------------------
1 | 2021-01 | apple, melon, orange | orange
1 | 2021-02 | grape, orange, kiwi | orange
1 | 2021-03 | grape, pear, banana | grape
1 | 2021-04 | orange, kiwi, banana | banana
1 | 2021-05 | grape, pear, kiwi | kiwi
结清:在最后一栏中,我想要最近两个月内重复最多的水果,换句话说,在实际和之前的行中重复最多的.请注意,第一行中应该返回orange
,因为当后面的窗口框不可用时,应该使用前面的窗口框.
在下面的代码中,我获得了整个数据集中最频繁的结果.
select * from (
select user_id, year_month,
string_agg(distinct fruit) as fruits
from user
group by user_id, year_month
) join (
select user_id, fruit
from user
group by user_id, fruit
qualify 1 = row_number() over(partition by user_id order by count(*) desc)
)
using (user_id)
如何将此逻辑应用于特定的时间窗口?