假设我有一个包含3列的表:ID、DATE_TIME、COLOR.数据如下所示:
id, date_time, color
1, 2023-10-01 12:15, green
1, 2023-10-01 12:16, yellow
1, 2023-10-01 12:17, yellow
1, 2023-10-01 12:18, red
1, 2023-10-01 12:19, yellow
1, 2023-10-01 12:20, yellow
1, 2023-10-01 12:21, red
1, 2023-10-01 12:22, red
1, 2023-10-01 12:23, green
1, 2023-10-01 12:24, yellow
1, 2023-10-01 12:25, yellow
1, 2023-10-01 12:26, red
2, 2023-10-01 12:27, red
2, 2023-10-01 12:28, green
2, 2023-10-01 12:29, green
2, 2023-10-01 12:30, yellow
我需要计算"黄色"值在"COLOR"列中出现的频率,按"id"列分组,按DATE_TIME排序.然而,我有具体的条件:
- 如果"黄色"出现在"绿色"之后,我想把它算作"黄色".
- 只有当它后面跟第一个"red",或者它是由"id"定义的组中的最后一个值时,我才想将"Huang"计算在内.
它看起来像一个组中的子窗口.
我在AWS Athena中使用SQL Presto,我认为我应该使用窗口函数,但我不确定如何指定这些条件.
事先感谢你的提示
因此,预期结果应该是:
For id=1: Count "yellow" = 4
For id=2: Count "yellow" = 1
我试过了,但没有一个在条件下重复黄色的累积计数器.
`with testdata(id, date_time, color) as (
VALUES
(1, cast('2023-10-01 12:15:00' as timestamp), 'green'),
(1, cast('2023-10-01 12:16:00' as timestamp), 'yellow'),
(1, cast('2023-10-01 12:17:00' as timestamp), 'yellow'),
(1, cast('2023-10-01 12:18:00' as timestamp), 'red'),
(1, cast('2023-10-01 12:19:00' as timestamp), 'yellow'),
(1, cast('2023-10-01 12:20:00' as timestamp), 'yellow'),
(1, cast('2023-10-01 12:21:00' as timestamp), 'red'),
(1, cast('2023-10-01 12:22:00' as timestamp), 'red'),
(1, cast('2023-10-01 12:23:00' as timestamp), 'green'),
(1, cast('2023-10-01 12:24:00' as timestamp), 'yellow'),
(1, cast('2023-10-01 12:25:00' as timestamp), 'yellow'),
(1, cast('2023-10-01 12:26:00' as timestamp), 'red'),
(2, cast('2023-10-01 12:27:00' as timestamp), 'red'),
(2, cast('2023-10-01 12:28:00' as timestamp), 'green'),
(2, cast('2023-10-01 12:29:00' as timestamp), 'green'),
(2, cast('2023-10-01 12:30:00' as timestamp), 'yellow')
)
,t1 as (
SELECT id,
date_time,
color,
LAG(color) OVER (
PARTITION BY id
ORDER BY date_time
) AS prev_color,
LEAD(color) OVER (
PARTITION BY id
ORDER BY date_time
) AS next_color
FROM testdata
)
select id,
SUM(
CASE
WHEN color = 'yellow'
AND (
prev_color = 'green'
and (
next_color IS NULL
OR next_color = 'red'
OR next_color = 'yellow'
)
) THEN 1 ELSE 0
END
) AS yellow_count
FROM t1
group by id`
我有自己的价值观
- 对于id=1:计数"黄色"=2(不正确)
- 对于id=2:计数"黄色"=1(正确)