Suppose I have a table named `t` with two columns, `foo` and `bar`.
foo | bar |
---|---|
1 | 11 |
1 | 11 |
2 | 11 |
2 | 11 |
2 | 11 |
3 | 11 |
3 | 12 |
3 | 12 |
Now I want to count the occurrences of each distinct value of `foo` and of `bar` separately, and aggregate the counts into an `ARRAY<MAP<BIGINT, BIGINT>>`.
In this example:

- `foo == 1` occurs 2 times;
- `foo == 2` occurs 3 times;
- `foo == 3` occurs 3 times;
- `bar == 11` occurs 6 times;
- `bar == 12` occurs 2 times.
So the result table should look like this:
name | cnt |
---|---|
"foo" | [{1:2}, {2:3}, {3:3}] |
"bar" | [{11:6}, {12:2}] |
My current approach looks something like this:
```sql
WITH t_foo AS (
  SELECT
    "foo" AS name,
    COLLECT_LIST(MAP(val, cnt)) AS cnt
  FROM (
    -- count occurrences of each distinct value of foo
    SELECT
      foo AS val,
      COUNT(*) AS cnt
    FROM
      t
    GROUP BY
      foo
  ) AS tt
),
t_bar AS (
  SELECT
    "bar" AS name,
    COLLECT_LIST(MAP(val, cnt)) AS cnt
  FROM (
    -- same aggregation, repeated for bar
    SELECT
      bar AS val,
      COUNT(*) AS cnt
    FROM
      t
    GROUP BY
      bar
  ) AS tt
)
-- one output row per column name
SELECT * FROM t_foo
UNION ALL
SELECT * FROM t_bar
```
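This works, but it feels repetitive. In fact, I have not only the `foo` and `bar` columns to process but also a dozen or so other columns. Is there a smarter way to do this?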
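One possible way to avoid writing a CTE per column, sketched under the assumption that this runs on Hive (or Spark SQL) and that every column being counted is a `BIGINT`: unpivot each row into `(name, val)` pairs with `LATERAL VIEW EXPLODE(MAP(...))`, then count and collect in a single `GROUP BY` pass. The alias `kv` is a hypothetical name; each additional column is just one more `"name", column` pair inside `MAP(...)`.

```sql
-- Sketch, assuming Hive/Spark SQL and that all listed columns are BIGINT
-- (MAP requires all of its values to share one type).
SELECT
  name,
  COLLECT_LIST(MAP(val, cnt)) AS cnt
FROM (
  SELECT
    kv.name,
    kv.val,
    COUNT(*) AS cnt
  FROM t
  -- EXPLODE on a MAP emits one (key, value) row per entry, so each row of t
  -- becomes ("foo", foo) and ("bar", bar); add more pairs for more columns
  LATERAL VIEW EXPLODE(MAP("foo", foo, "bar", bar)) kv AS name, val
  GROUP BY kv.name, kv.val
) AS tt
GROUP BY name
```

If the columns have different types, wrapping each one in `CAST(... AS BIGINT)` keeps the `MAP` values uniform. Hive's `STACK` UDTF, e.g. `LATERAL VIEW STACK(2, "foo", foo, "bar", bar) kv AS name, val`, is another way to do the same unpivot. As with the original query, `COLLECT_LIST` does not guarantee the order of the maps within each array.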