Luckily, you are using PostgreSQL. The set-returning function generate_series() is your friend.
Test case
Given the following test table (which you should have provided):
CREATE TABLE event(event_id serial, ts timestamp);
INSERT INTO event (ts)
SELECT generate_series(timestamp '2018-05-01'
, timestamp '2018-05-08'
, interval '7 min') + random() * interval '7 min';
One event for every 7 minutes (plus 0 to 7 minutes, randomly).
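As a quick sanity check, assuming the INSERT above ran exactly once against an empty table, a query along these lines shows how many events were generated and what time span they cover. The series steps through exactly 7 days in 7-minute increments with both ends included, so 1441 rows are expected:

```sql
-- 7 days = 10080 minutes; 10080 / 7 = 1440 steps, plus the start point = 1441 rows
SELECT count(*) AS events
     , min(ts)  AS first_event
     , max(ts)  AS last_event
FROM   event;
```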
Basic solution
This query counts events for any arbitrary time interval, 17 minutes in the example:
WITH grid AS (
SELECT start_time
, lead(start_time, 1, 'infinity') OVER (ORDER BY start_time) AS end_time
FROM (
SELECT generate_series(min(ts), max(ts), interval '17 min') AS start_time
FROM event
) sub
)
SELECT start_time, count(e.ts) AS events
FROM grid g
LEFT JOIN event e ON e.ts >= g.start_time
AND e.ts < g.end_time
GROUP BY start_time
ORDER BY start_time;
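The query retrieves the minimum and maximum ts from the base table to cover the complete time range. You can use an arbitrary time range instead.
Provide any time interval as needed.
Produces one row for every time slot. If no event happened during that interval, the count is 0.
Be sure to handle the upper and lower bounds correctly: each slot includes its lower bound (>=) and excludes its upper bound (<), so no event is counted twice.
The window function lead() has an often-overlooked feature: it can provide a default value for when no leading row exists. The example provides 'infinity'. Otherwise, the last interval would be cut off with an upper bound of NULL.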
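To see what the default buys you, here is a minimal sketch on a made-up three-slot grid (the literal timestamps are only for illustration and are not taken from the event table). It compares lead() without and with the default:

```sql
-- Three slots: 00:00, 00:17, 00:34
SELECT start_time
     , lead(start_time)                OVER (ORDER BY start_time) AS end_no_default    -- NULL for the last slot
     , lead(start_time, 1, 'infinity') OVER (ORDER BY start_time) AS end_with_default  -- 'infinity' for the last slot
FROM   generate_series(timestamp '2018-05-01 00:00'
                     , timestamp '2018-05-01 00:34'
                     , interval '17 min') AS g(start_time);
```

With a NULL upper bound, the join condition e.ts < end_time evaluates to NULL for the last slot, so no event would ever match it.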
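Minimal equivalent
The above query uses a CTE, lead() and verbose syntax. Elegant and maybe easier to understand, but a bit more expensive. Here is a shorter, faster, minimal version: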
SELECT start_time, count(e.ts) AS events
FROM (SELECT generate_series(min(ts), max(ts), interval '17 min') FROM event) g(start_time)
LEFT JOIN event e ON e.ts >= g.start_time
AND e.ts < g.start_time + interval '17 min'
GROUP BY 1
ORDER BY 1;
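Both queries above join on a half-open interval (>= lower bound, < upper bound). The following sketch, using a single hypothetical event that falls exactly on a slot boundary, illustrates why BETWEEN would be the wrong tool here: BETWEEN includes both bounds, so the event matches two adjacent slots.

```sql
-- One hypothetical event at exactly 00:17, tested against the slots starting at 00:00 and 00:17
SELECT g.start_time
     , (e.ts >= g.start_time
        AND e.ts < g.start_time + interval '17 min')                  AS half_open_match  -- true for 00:17 only
     , (e.ts BETWEEN g.start_time AND g.start_time + interval '17 min') AS between_match    -- true for both slots
FROM   generate_series(timestamp '2018-05-01 00:00'
                     , timestamp '2018-05-01 00:17'
                     , interval '17 min') AS g(start_time)
CROSS  JOIN (VALUES (timestamp '2018-05-01 00:17')) AS e(ts)
ORDER  BY g.start_time;
```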
Example for "every 15 minutes in the past week"
Formatted with to_char().
SELECT to_char(start_time, 'YYYY-MM-DD HH24:MI'), count(e.ts) AS events
FROM generate_series(date_trunc('day', localtimestamp - interval '7 days')
, localtimestamp
, interval '15 min') g(start_time)
LEFT JOIN event e ON e.ts >= g.start_time
AND e.ts < g.start_time + interval '15 min'
GROUP BY start_time
ORDER BY start_time;
Still ORDER BY and GROUP BY on the underlying timestamp value, not on the formatted string. That's faster and more reliable.
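A small sketch of the reliability point, assuming a day-first display format (the 'DD.MM.YYYY' pattern and the two sample dates are made up for illustration; the format used above happens to sort correctly as text):

```sql
-- Sorting by the formatted string puts June 1st before May 31st
SELECT ts, to_char(ts, 'DD.MM.YYYY') AS label
FROM  (VALUES (timestamp '2018-05-31')
            , (timestamp '2018-06-01')) AS v(ts)
ORDER  BY label;   -- text order: '01.06.2018' sorts before '31.05.2018'
```

Replacing ORDER BY label with ORDER BY ts restores chronological order regardless of the display format.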
db<>fiddle here
Related answer producing a running count over the time frame: