我有以下 struct :

table user
user_id | month_year | fruits  
------------------------------
1       | 2021-01    | apple
1       | 2021-01    | melon
1       | 2021-01    | orange
1       | 2021-02    | grape
1       | 2021-02    | orange
1       | 2021-02    | kiwi
1       | 2021-03    | grape
1       | 2021-03    | pear
1       | 2021-03    | banana
1       | 2021-04    | orange
1       | 2021-04    | kiwi
1       | 2021-04    | banana
1       | 2021-05    | grape
1       | 2021-05    | pear
1       | 2021-05    | kiwi

我想要的结果如下:

user     | month_year |            fruits            |  two_months_most_freq
-------------------------------------------------------------------------
1        | 2021-01    | apple, melon, orange         | orange
1        | 2021-02    | grape, orange, kiwi          | orange
1        | 2021-03    | grape, pear, banana          | grape
1        | 2021-04    | orange, kiwi, banana         | banana
1        | 2021-05    | grape, pear, kiwi            | kiwi

结清:在最后一栏中,我想要最近两个月内重复最多的水果,换句话说,在实际和之前的行中重复最多的.请注意,第一行中应该返回orange,因为当后面的窗口框不可用时,应该使用前面的窗口框.

在下面的代码中,我获得了整个数据集中最频繁的结果.

select * from (
  select user_id, year_month, 
    string_agg(distinct fruit) as fruits
  from user
  group by  user_id, year_month
) join (
  select user_id, fruit
  from user
  group by user_id, fruit
  qualify 1 = row_number() over(partition by user_id order by count(*) desc)
)
using (user_id)   

如何将此逻辑应用于特定的时间窗口?

推荐答案

请考虑以下内容

select user_id, month_year, fruits, 
  if(prev_month_exists, two_months_most_freq, first_value(two_months_most_freq) over next_month) as two_months_most_freq 
from (
  select user_id, month_year, fruits, 
    ( select fruit from unnest(split(two_month_fruits)) fruit
      group by fruit order by count(*) desc limit 1
    ) as two_months_most_freq, 
    month, prev_month_exists
  from (
    select distinct user_id, month_year, month, 
      string_agg(fruit) over(partition by user_id, month_year) fruits,
      string_agg(fruit) over last_two_months as two_month_fruits,
      0 < count(*) over prev_month as prev_month_exists
    from users, unnest([struct(
      12 * extract(year from date(month_year || '-01')) + extract(month from date(month_year || '-01')) as month
    )]) 
    window 
      last_two_months as (partition by user_id order by month range between 1 preceding and current row), 
      prev_month as (partition by user_id order by month range between 1 preceding and 1 preceding)
  )
)
window next_month as (partition by user_id order by month range between 1 following and 1 following)                 

如果应用于问题输出中的样本数据,则为

enter image description here

Sql相关问答推荐

Postgres JSONB对象筛选

如何在Snowflake SQL存储过程中传递LIMIT和OFFSET的参数?

为表中每个缺少的引用创建新行

部分匹配表中元素的MariaDB查询查找结果

如何在postgres函数中插入后返回布尔值?

PostgreSQL基于2个COLS的任意组合 Select 唯一行

将结果从一列转换为两行或更多

如何为缺少的类别添加行

Postgresql - 如何根据阈值计算累积和

如何使用最后一个非 NULL 值在 PostgreSQL 列中填充 NULL 值

如何从postgresql中的项目映射(关联数组)设置值?

在presto sql中解析带有区域的时间格式

如何在 case 语句中使用聚合?

在Snowflake中,如何将以逗号和连字符分隔的多个混合数值拆分成数字列表

比使用NOT EXISTS更高效的SQL删除方法是什么?

超过100名员工的连续行

SQL/Postgres:按日期和其他属性对相关性能进行分组

使用一组值进行分组和计数

SQL 中的问题与包含最大日期的记录连接

在 MindsDB SQL 编辑器中运行 PostgreSQL 条目 ID 时出现未知 Select 目标错误