如何利用单列历史SQLsnowflake获得合并结果

发布于01月23日

我有一个用户的表历史任务，看起来是这样的.1个用户有9个任务.

user_id     task    IS_COMPLETED updated_at
123       Task 1    1            2024-01-01
123       Task 2    1            2024-01-01
123       Task 3    0            2024-01-01
123       Task 4    0            2024-01-01
123       Task 5    0            2024-01-01
123       Task 6    1            2024-01-01
123       Task 7    1            2024-01-01
123       Task 8    1            2024-01-01
123       Task 9    1            2024-01-01

我想把可能的任务组合在一起，变成这样的表.所以我知道什么样的任务组合是用户快速完成的.

combination     total_user_completed
task1_task2         20
task1_task3         15 
task1_task_4        15
task1_task_5        14
:                    : (and so on for 2 combination) 
task1_task2_task3   10
task4_task5_task_6  11
:                   : (and so on for 3 combination)

(有关组合示例的更多详细信息:从A、B、C我得到了A-B、B-C、A-C和A-B-C之间的组合.不需要按顺序)

我曾try 使用递归SQL，但效果不佳

"递归联接内存不足.请重新运行此查询更大的仓库"

然后我也试着用Case When来定义，但我需要像这样一个一个地定义，这并不理想.

    with all_task as (
            select 
                distinct user_id,
                task_name as task, 
                is_completed,
                updated_at
            from task
            where 
                is_completed = 1
            group by all
          ),
        agg as (
            select 
                distinct user_id, 
                activated_at as period,
                max(case when task in ('Task 1') and IS_COMPLETED = 1 then date(updated_at) end) as task_1,
                max(case when task in ('Task 2') and IS_COMPLETED = 1 then date(updated_at) end) as task_2,
                max(case when task in ('Task 3') and IS_COMPLETED = 1 then date(updated_at) end) as task_3,
                max(case when task in ('Task 4') and IS_COMPLETED = 1 then date(updated_at) end) as task_4,
                max(case when task in ('Task 5') and IS_COMPLETED = 1 then date(updated_at) end) as task_5,
                max(case when task in ('Task 6') and IS_COMPLETED = 1 then date(updated_at) end) as task_6,
                max(case when task in ('Task 7') and IS_COMPLETED = 1 then date(updated_at) end) as task_7,
                max(case when task in ('Task 8') and IS_COMPLETED = 1 then date(updated_at) end) as task_8,
                max(case when task in ('Task 9') and IS_COMPLETED = 1 then date(updated_at) end) as task_9
            from all_task 
            group by all
            ),
            task_group as (
            select user_id, 
                case when task_1 is not null and task_2 is not null then user_id end as task_1_2
                case when task_2 not null and task_3 is not null then user_id end as task_2_3
                .........
            from agg
             )
            select 'Task1_task2' as combination, 
            count(distinct task_1_2) as total
            from task_group

            union all 

            select 'task2_task3' as combination, 
            count(distinct task_2_3) as total
            from task_group
.... (and so on)

有没有解决这一问题的建议？非常感谢!

*注意:目前我需要至少2/3的任务组合.谢谢

with data(user_id, task, is_completed, updated_at) as ( select * from values (123, 'Task 1', 1, '2024-01-01'::date), (123, 'Task 2', 1, '2024-01-01'::date), (123, 'Task 3', 0, '2024-01-01'::date), (123, 'Task 4', 0, '2024-01-01'::date), (123, 'Task 5', 0, '2024-01-01'::date), (123, 'Task 6', 1, '2024-01-01'::date), (123, 'Task 7', 1, '2024-01-01'::date), (123, 'Task 8', 1, '2024-01-01'::date), (123, 'Task 9', 1, '2024-01-01'::date) )

), combinations_2 as ( select a.user_id, a.task || '_' || b.task as combi from data as a join data as b on a.user_id = b.user_id --and a.updated_at = b.updated_at and a.is_completed = 1 and b.is_completed = 1 and a.task < b.task ),

), combinations_3 as ( select a.user_id, a.task || '_' || b.task || '_' || c.task as combi from data as a join data as b on a.user_id = b.user_id --and a.updated_at = b.updated_at and a.is_completed = 1 and b.is_completed = 1 and a.task < b.task join data as c on a.user_id = c.user_id --and a.updated_at = c.updated_at and a.is_completed = 1 and c.is_completed = 1 and b.task < c.task ),

with data(user_id, task, is_completed, updated_at) as ( select * from values (123, 'Task 1', 1, '2024-01-01'::date), (123, 'Task 2', 1, '2024-01-01'::date), (123, 'Task 3', 0, '2024-01-01'::date), (123, 'Task 4', 0, '2024-01-01'::date), (123, 'Task 5', 0, '2024-01-01'::date), (123, 'Task 6', 1, '2024-01-01'::date), (123, 'Task 7', 1, '2024-01-01'::date), (123, 'Task 8', 1, '2024-01-01'::date), (123, 'Task 9', 1, '2024-01-01'::date) ), cleaned as ( select distinct user_id, task from data where is_completed = 1 ), combinations_2 as ( select a.user_id, a.task || '_' || b.task as combi from cleaned as a join cleaned as b on a.user_id = b.user_id and a.task < b.task ), combinations_3 as ( select a.user_id, a.task || '_' || b.task || '_' || c.task as combi from cleaned as a join cleaned as b on a.user_id = b.user_id and a.task < b.task join cleaned as c on a.user_id = c.user_id and b.task < c.task ), merged as ( select * from combinations_2 union all select * from combinations_3 ) select combi, count(user_id) from merged group by 1 order by 1;

如何利用单列历史SQLsnowflake获得合并结果

推荐答案

Sql相关问答推荐

带有双引号的json在Presto中是否有区别对待？

如何在T—SQL中找到值更改之前的日期？

出现5次后，将所有正斜杠替换为连字符

关于Postgres横向联接的谓词

PostgreSQL-按距离阈值挤压相邻行的性能

使用同一个表，为什么IN、NOT IN、NOT EXISTS和EXISTS有不同的输出？

SQL Server - 复杂场景 - 比较状态并填充值到后续行

在SQL中实现表格数据透视类型报表

SQL Server：将JSON对象数组转换为表格格式

带有数组输入参数的Snowflake UDF优化

Teradata 多个进程的最大进程结束时间捕获

为 sqlite 全文搜索 (fts) 创建触发器时出现虚拟表的不安全使用

雅典娜弄错了操作顺序

Django only() 和 values() 不适用于 prefetch_related()

基于源表的 SQL INSERT、UPDATE 和 DELETE

在 MySql 数据库中的两个日期之间搜索

连接表时避免重复

创建一个将层次 struct 级别放入列中的查询

SQL Group By 然后映射出是否存在值

删除具有相同 ID 的重复记录 - Postgresql