在子窗口SQL Presto中使用特定条件执行值计数

发布于10月24日

假设我有一个包含3列的表:ID、DATE_TIME、COLOR.数据如下所示:

id, date_time, color
1, 2023-10-01 12:15, green
1, 2023-10-01 12:16, yellow
1, 2023-10-01 12:17, yellow
1, 2023-10-01 12:18, red
1, 2023-10-01 12:19, yellow
1, 2023-10-01 12:20, yellow
1, 2023-10-01 12:21, red
1, 2023-10-01 12:22, red
1, 2023-10-01 12:23, green
1, 2023-10-01 12:24, yellow
1, 2023-10-01 12:25, yellow
1, 2023-10-01 12:26, red
2, 2023-10-01 12:27, red
2, 2023-10-01 12:28, green
2, 2023-10-01 12:29, green
2, 2023-10-01 12:30, yellow

我需要计算"黄色"值在"COLOR"列中出现的频率，按"id"列分组，按DATE_TIME排序.然而，我有具体的条件:

如果"黄色"出现在"绿色"之后，我想把它算作"黄色".
只有当它后面跟第一个"red"，或者它是由"id"定义的组中的最后一个值时，我才想将"Huang"计算在内.

它看起来像一个组中的子窗口.

我在AWS Athena中使用SQL Presto，我认为我应该使用窗口函数，但我不确定如何指定这些条件.

事先感谢你的提示

因此，预期结果应该是:

For id=1: Count "yellow" = 4
For id=2: Count "yellow" = 1

我试过了，但没有一个在条件下重复黄色的累积计数器.

`with testdata(id, date_time, color) as (
  VALUES 
  (1, cast('2023-10-01 12:15:00' as timestamp), 'green'),
  (1, cast('2023-10-01 12:16:00' as timestamp), 'yellow'),
  (1, cast('2023-10-01 12:17:00' as timestamp), 'yellow'),
  (1, cast('2023-10-01 12:18:00' as timestamp), 'red'),
  (1, cast('2023-10-01 12:19:00' as timestamp), 'yellow'),
  (1, cast('2023-10-01 12:20:00' as timestamp), 'yellow'),
  (1, cast('2023-10-01 12:21:00' as timestamp), 'red'),
  (1, cast('2023-10-01 12:22:00' as timestamp), 'red'),
  (1, cast('2023-10-01 12:23:00' as timestamp), 'green'),
  (1, cast('2023-10-01 12:24:00' as timestamp), 'yellow'),
  (1, cast('2023-10-01 12:25:00' as timestamp), 'yellow'),
  (1, cast('2023-10-01 12:26:00' as timestamp), 'red'),
  (2, cast('2023-10-01 12:27:00' as timestamp), 'red'),
  (2, cast('2023-10-01 12:28:00' as timestamp), 'green'),
  (2, cast('2023-10-01 12:29:00' as timestamp), 'green'),
  (2, cast('2023-10-01 12:30:00' as timestamp), 'yellow')
)
,t1 as (
  SELECT id,
    date_time,
    color,
    LAG(color) OVER (
      PARTITION BY id
      ORDER BY date_time
    ) AS prev_color,
    LEAD(color) OVER (
      PARTITION BY id
      ORDER BY date_time
    ) AS next_color
  FROM testdata
)
select id,
  SUM(
    CASE
      WHEN color = 'yellow'
      AND (
        prev_color = 'green'
        and (
          next_color IS NULL
          OR next_color = 'red'
          OR next_color = 'yellow'
        )
      ) THEN 1 ELSE 0
    END
  )  AS yellow_count
FROM t1
group by id`

我有自己的价值观

对于id=1:计数"黄色"=2(不正确)
对于id=2:计数"黄色"=1(正确)

with data (id, date_time, color) as ( VALUES (1, cast('2023-10-01 12:15:00' as timestamp), 'green'),(1, cast('2023-10-01 12:16:00' as timestamp), 'yellow'),(1, cast('2023-10-01 12:17:00' as timestamp), 'yellow'),(1, cast('2023-10-01 12:18:00' as timestamp), 'red'),(1, cast('2023-10-01 12:19:00' as timestamp), 'yellow'),(1, cast('2023-10-01 12:20:00' as timestamp), 'yellow'),(1, cast('2023-10-01 12:21:00' as timestamp), 'red'),(1, cast('2023-10-01 12:22:00' as timestamp), 'red'),(1, cast('2023-10-01 12:23:00' as timestamp), 'green'),(1, cast('2023-10-01 12:24:00' as timestamp), 'yellow'),(1, cast('2023-10-01 12:25:00' as timestamp), 'yellow'),(1, cast('2023-10-01 12:26:00' as timestamp), 'red'),(2, cast('2023-10-01 12:27:00' as timestamp), 'red'),(2, cast('2023-10-01 12:28:00' as timestamp), 'green'),(2, cast('2023-10-01 12:29:00' as timestamp), 'green'),(2, cast('2023-10-01 12:30:00' as timestamp), 'yellow') ), cte AS ( SELECT id, date_time, color, ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_time) - ROW_NUMBER() OVER (PARTITION BY id, color ORDER BY date_time) AS rn FROM data ), cte2 AS ( SELECT id, rn, MAX(color) AS color, MAX(date_time) AS date_time, SUM(CASE WHEN color = 'yellow' THEN 1 END) AS total FROM cte c1 GROUP BY id, rn ORDER BY date_time ), cte3 AS ( SELECT *, LAG(color) OVER ( PARTITION BY id ORDER BY date_time) AS prev_color, LEAD(color) OVER ( PARTITION BY id ORDER BY date_time) AS next_color FROM cte2 ) SELECT id, SUM(total) AS total FROM cte3 WHERE color = 'yellow' AND prev_color = 'green' AND ( next_color = 'red' OR next_color is null) GROUP BY id

ID	总
1	4
2	1

总

在子窗口SQL Presto中使用特定条件执行值计数

推荐答案

Sql相关问答推荐

使用自动增量ID插入失败(无法将值空插入列ID)

连接特定行号

Android房间fts4匹配语法AND OR

从日期开始向前填充重复项

如何利用单列历史SQLsnowflake获得合并结果

查询页面推荐

我可以在SQLite3中使用BLOB作为主键吗？

IF NOT EXISTS子查询的性能瓶颈

明细表中没有记录如何更新主表的值为0

在WHERE EXISTS子查询中，列返回是否重要？

从给定数据中查找下一个工作日期

按行值出现的顺序对行值进行分组

SAS proc freq 或 proc sql 获取数据子集和整个数据的频率

按公司和产品查询最近发票的平均价格的SQL查询

使用对 nvarchar 列的多个 LIKE 操作优化 SQL 查询

Select 字段，除非另一个字段包含重复项

如何防止 SQL 中的负收入值并将其重新分配到接下来的月份？

SELECT 用于 Parent、Children 和 ORDER BY [Order] 列

交叉应用 OPENJSON / PIVOT - 错误的顺序

BigQuery 错误：SELECT 列表表达式引用 esthetician.LICENSE_TYPE，它既未在 [49：8] 分组也未聚合