问题描述
我有一个表(#tmstmp
),有两列dt
(DATETIME
)和payload
(INT
).最后,我想把每5分钟的间隔加起来payload
.
代码
设置
DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DROP TABLE IF EXISTS #tmstmp
, #numbers;
CREATE TABLE #tmstmp (
dt DATETIME PRIMARY KEY
, payload INT NOT NULL
);
CREATE TABLE #numbers (
n INT PRIMARY KEY
);
WITH numbers (n) AS (
SELECT 0 AS n
UNION ALL
SELECT n + 1 AS n
FROM numbers
WHERE n < 100
)
INSERT
INTO #numbers
SELECT n
FROM numbers;
WITH rnd (mins, secs) AS (
SELECT n2.n AS mins
, CAST(ABS(CHECKSUM(NEWID())) % 60 AS INT) AS mins
FROM #numbers AS n1
, #numbers as n2
WHERE n1.n < 5
AND n2.n < 15
), tmstmp (dt) AS (
SELECT DATEADD(SECOND, secs, DATEADD(MINUTE, mins, @start)) AS dt
FROM rnd
)
INSERT
INTO #tmstmp
SELECT DISTINCT dt
, -1 AS payload
FROM tmstmp
ORDER BY dt;
UPDATE #tmstmp
SET payload = CAST(ABS(CHECKSUM(NEWID())) % 10 AS INT);
GO
不重叠的时隙很容易
DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;
WITH agg (slot, sum_payload) AS (
SELECT DATEDIFF(MINUTE, @start, dt) / @slotDuration AS slot
, SUM(payload) AS sum_payload
FROM #tmstmp
GROUP BY DATEDIFF(MINUTE, @start, dt) / @slotDuration
)
SELECT DATEADD(MINUTE, slot * @slotDuration, @start) AS [from]
, DATEADD(MINUTE, (slot + 1) * @slotDuration, @start) AS [to]
, sum_payload
FROM agg;
from | to | sum_payload |
---|---|---|
2024-01-01 12:00:00 | 2024-01-01 12:05:00 | 124 |
2024-01-01 12:05:00 | 2024-01-01 12:10:00 | 106 |
2024-01-01 12:10:00 | 2024-01-01 12:15:00 | 95 |
终极目标:获得 run 时隙
然而,我希望在范围内有一个each间隔的条目,即从12:00-12:05
,12:01-12:06
,12:02-12:07
等直到最后一个时隙.
我可以在前面的整个范围内构造极限,并在JOIN
中使用它,如下所示:
DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;
DECLARE @intervals INT = (SELECT DATEDIFF(MINUTE, @start, MAX(dt)) FROM #tmstmp);
WITH ranges ([from], [to], slot) AS (
SELECT DATEADD(MINUTE, n, @start) AS [from]
, DATEADD(MINUTE, n + @slotDuration, @start) AS [to]
, n AS slot
FROM #numbers
WHERE n <= @intervals
), tm_mult (slot, [from], [to], dt, payload) AS (
SELECT slot
, [from]
, [to]
, dt
, payload
FROM #tmstmp
INNER JOIN ranges
ON [from] <= dt
AND dt < [to]
)
SELECT MIN([from]) AS [from]
, MAX([to]) AS [to]
, SUM(payload) AS sum_payload
FROM tm_mult
GROUP BY slot
ORDER BY slot;
from | to | sum_payload |
---|---|---|
2024-01-01 12:00:00 | 2024-01-01 12:05:00 | 124 |
2024-01-01 12:01:00 | 2024-01-01 12:06:00 | 120 |
2024-01-01 12:02:00 | 2024-01-01 12:07:00 | 125 |
... | ... | ... |
2024-01-01 12:14:00 | 2024-01-01 12:19:00 | 19 |
虽然这在这个玩具示例中起作用,但我的真实数据中有数十万个时间戳,最糟糕的是,我对指数的影响很小.我的直觉告诉我,我会用我的不等式JOIN
创建相当多的重复,我想知道这是否是无论如何做它的规范方法,或者是否有一个更多的SQL-onic
方法做它?(就像pythonistas
喜欢调用某些代码pythonic
,如果它使用语言固有的概念,而不是试图用通用工具解决它).