您可以try 使用combinations
函数,如果它在Athena中可用(它存在于Athena所基于的Trino中):
WITH dataset (Client, Product) AS (
values (1, 'A'),
(1, 'B'),
(1, 'C'),
(2, 'A'),
(2, 'B'),
(2, 'D'),
(3, 'A'),
(3, 'B')
)
select p.comb, count(Client) count
from (select comb
from (select array_distinct(array_agg(Product)) arr
from dataset) p,
unnest(combinations(arr, 2) || combinations(arr, 3)) as t(comb)) p -- build all product combinations
left join (select client, comb
from (select client, array_distinct(array_agg(Product)) as arr
from dataset
group by client),
unnest(combinations(arr, 2) || combinations(arr, 3)) as t(comb)) cp -- build all product combinations for concrete client
on cp.comb = p.comb
group by p.comb
order by cardinality(p.comb), p.comb;
输出:
comb |
count |
[A, B] |
3 |
[A, C] |
1 |
[A, D] |
1 |
[B, C] |
1 |
[B, D] |
1 |
[C, D] |
0 |
[A, B, C] |
1 |
[A, B, D] |
1 |
[A, C, D] |
0 |
[B, C, D] |
0 |
请注意,组合在输出中仅限于100000条记录,因此如果产品组合的实际数量较高,这将不起作用(另外,无论哪种方式,对于任何数量相对较大的产品,这都将是相当昂贵的查询).