Problem description

I have a table (#tmstmp) with two columns, dt (DATETIME) and payload (INT). Ultimately, I want to sum payload over each 5-minute interval.

Code

Setup

DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DROP TABLE IF EXISTS #tmstmp
                     , #numbers;
CREATE TABLE #tmstmp (
  dt DATETIME PRIMARY KEY
  , payload INT NOT NULL
);

CREATE TABLE #numbers (
  n INT PRIMARY KEY
);
WITH numbers (n) AS (
  SELECT 0 AS n
  UNION ALL
  SELECT n + 1 AS n
    FROM numbers
   WHERE n < 100
)
INSERT
  INTO #numbers
SELECT n
  FROM numbers;

WITH rnd (mins, secs) AS (
  SELECT n2.n AS mins
         , CAST(ABS(CHECKSUM(NEWID())) % 60 AS INT) AS secs
   FROM #numbers AS n1
        , #numbers as n2
  WHERE n1.n < 5
    AND n2.n < 15
), tmstmp (dt) AS (
  SELECT DATEADD(SECOND, secs, DATEADD(MINUTE, mins, @start)) AS dt
    FROM rnd
) 
INSERT  
  INTO #tmstmp
SELECT DISTINCT dt
       , -1 AS payload
  FROM tmstmp
 ORDER BY dt;

UPDATE #tmstmp
   SET payload = CAST(ABS(CHECKSUM(NEWID())) % 10 AS INT);
GO

Non-overlapping slots are easy

DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;

WITH agg (slot, sum_payload) AS (
  SELECT DATEDIFF(MINUTE, @start, dt) / @slotDuration AS slot
         , SUM(payload) AS sum_payload
    FROM #tmstmp
   GROUP BY DATEDIFF(MINUTE, @start, dt) / @slotDuration
)
SELECT DATEADD(MINUTE, slot * @slotDuration, @start) AS [from]
       , DATEADD(MINUTE, (slot + 1) * @slotDuration, @start) AS [to]
       , sum_payload
  FROM agg;

from                 to                   sum_payload
2024-01-01 12:00:00  2024-01-01 12:05:00  124
2024-01-01 12:05:00  2024-01-01 12:10:00  106
2024-01-01 12:10:00  2024-01-01 12:15:00  95
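
The grouping above relies on T-SQL integer division: dividing the minute offset by the slot duration discards the remainder, so minutes 0 to 4 land in slot 0, minutes 5 to 9 in slot 1, and so on. A minimal illustration, using made-up minute offsets rather than the real table (the alias v and the column name mins are only for this sketch):

```sql
-- Hypothetical minute offsets; INT / INT truncates in T-SQL.
SELECT v.mins
     , v.mins / 5 AS slot
  FROM (VALUES (0), (4), (5), (9), (10), (14)) AS v (mins);
-- slot comes out as 0, 0, 1, 1, 2, 2
```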

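Ultimate goal: get running slots

However, I would like one entry for each interval in the range, i.e. 12:00-12:05, 12:01-12:06, 12:02-12:07 and so on, up to the last slot.

I can construct the limits for the whole range up front and use them in a JOIN, like this: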

DECLARE @start DATETIME = N'2024-1-1 12:00:00';
DECLARE @slotDuration INT = 5;
DECLARE @intervals INT = (SELECT DATEDIFF(MINUTE, @start, MAX(dt)) FROM #tmstmp);

WITH ranges ([from], [to], slot) AS (
  SELECT DATEADD(MINUTE, n, @start) AS [from]
         , DATEADD(MINUTE, n + @slotDuration, @start) AS [to]
         , n AS slot
    FROM #numbers
   WHERE n <= @intervals
), tm_mult (slot, [from], [to], dt, payload) AS (
  SELECT slot
         , [from]
         , [to]
         , dt
         , payload
    FROM #tmstmp
   INNER JOIN ranges
      ON [from] <= dt
     AND dt < [to]
)
SELECT MIN([from]) AS [from]
       , MAX([to]) AS [to]
       , SUM(payload) AS sum_payload
  FROM tm_mult
 GROUP BY slot
 ORDER BY slot;

from                 to                   sum_payload
2024-01-01 12:00:00  2024-01-01 12:05:00  124
2024-01-01 12:01:00  2024-01-01 12:06:00  120
2024-01-01 12:02:00  2024-01-01 12:07:00  125
...                  ...                  ...
2024-01-01 12:14:00  2024-01-01 12:19:00  19

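While this works in this toy example, my real data has hundreds of thousands of timestamps and, worst of all, I have little influence over the indexes. My gut tells me that the inequality JOIN will create quite a lot of duplication, and I wonder whether this is nevertheless the canonical way to do it, or whether there is a more SQL-onic way? (In the same sense that Pythonistas like to call code pythonic when it uses the language's own idioms rather than solving the problem with generic tools.)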
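
To put a rough number on that suspicion, a check along the following lines could compare the number of source rows with the number of rows the inequality JOIN produces. This is only a sketch: it assumes the #tmstmp and #numbers tables from the setup above, and the names source_rows, joined_rows and matches are made up for illustration.

```sql
DECLARE @start DATETIME = '2024-01-01T12:00:00';
DECLARE @slotDuration INT = 5;

-- For every timestamp, count how many one-minute-shifted ranges contain it.
SELECT COUNT(*) AS source_rows
     , SUM(x.matches) AS joined_rows
  FROM #tmstmp AS t
 CROSS APPLY (
         SELECT COUNT(*) AS matches
           FROM #numbers AS n
          WHERE DATEADD(MINUTE, n.n, @start) <= t.dt
            AND t.dt < DATEADD(MINUTE, n.n + @slotDuration, @start)
       ) AS x;
```

Because the ranges start one minute apart and are @slotDuration minutes long, each timestamp can fall into at most @slotDuration ranges, so joined_rows should be at most @slotDuration times source_rows.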

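Recommended answer

Window functions in SQL (see the WINDOW and OVER clause documentation on microsoft.com) are a great asset to add to your SQL tool belt. They are also fairly canonical: window functions have existed since SQL Server 2005, although the ROWS BETWEEN framing used below requires SQL Server 2012 or later.

Here is an example: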

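SELECT
    [From],
    DATEADD(MINUTE, 1, [To]) [To],  -- make the upper bound exclusive
    payload
FROM (
    SELECT
        dt,
        -- frame = current one-minute row plus the 4 rows before it
        MIN(dt) OVER(ORDER BY dt ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) [From],
        dt [To],
        SUM(payload) OVER(ORDER BY dt ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) payload
    FROM (
        -- truncate dt to the minute and pre-aggregate to one row per minute
        SELECT
            DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0) dt,
            SUM(payload) payload
        FROM #tmstmp
        GROUP BY DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0)
    ) q
) q
-- keep frames whose first and last row are at least 4 minutes apart;
-- with no missing minutes this drops the partial frames at the start
WHERE DATEDIFF(MINUTE, [From], [To]) > 3;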

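I would like to draw your attention to 4 PRECEDING and DATEADD(MINUTE, DATEDIFF(MINUTE, 0, dt), 0). The former makes each frame cover the current one-minute row plus the four rows before it, i.e. five rows in total. The latter truncates the datetime to the minute, so everything from 2024-01-01 12:04:00.000 up to, but not including, 2024-01-01 12:05:00.000 falls into the 12:04 row. Hopefully that is the behaviour you are looking for.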
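
The truncation can be checked in isolation. The sketch below applies the same expression to three made-up DATETIME values (the alias v and the column name minute_bucket are only for illustration); .997 is used because it is the last fraction of a second that DATETIME can store before rounding up to the next second.

```sql
-- Hypothetical sample values; only the truncation expression comes from the answer.
SELECT v.dt
     , DATEADD(MINUTE, DATEDIFF(MINUTE, 0, v.dt), 0) AS minute_bucket
  FROM (VALUES (CAST('2024-01-01T12:04:00.000' AS DATETIME))
             , (CAST('2024-01-01T12:04:59.997' AS DATETIME))
             , (CAST('2024-01-01T12:05:00.000' AS DATETIME))) AS v (dt);
-- the first two rows map to 12:04:00.000, the third to 12:05:00.000
```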

Here is a fiddle.
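
One caveat: a ROWS frame counts rows, not minutes, so if some minutes have no rows at all the frame can stretch over more than five minutes. A possible variation, sketched below and not part of the original answer, first fills in every minute from #numbers and then looks forward from each minute, which also yields the trailing partial slots shown in the question. It assumes the #tmstmp and #numbers tables from the setup, that no timestamp is earlier than @start, and that the range fits into the 0..100 minutes that #numbers provides. The CTE names per_minute and dense are made up for illustration. The frame size has to be written as a literal, so the 4 must be kept in sync with @slotDuration - 1 by hand.

```sql
DECLARE @start DATETIME = '2024-01-01T12:00:00';
DECLARE @slotDuration INT = 5;
DECLARE @intervals INT = (SELECT DATEDIFF(MINUTE, @start, MAX(dt)) FROM #tmstmp);

WITH per_minute (n, payload) AS (
  -- one row per populated minute, numbered from @start
  SELECT DATEDIFF(MINUTE, @start, dt) AS n
         , SUM(payload) AS payload
    FROM #tmstmp
   GROUP BY DATEDIFF(MINUTE, @start, dt)
), dense (n, payload) AS (
  -- one row per minute in the range; empty minutes contribute 0
  SELECT num.n
         , COALESCE(pm.payload, 0) AS payload
    FROM #numbers AS num
    LEFT JOIN per_minute AS pm
      ON pm.n = num.n
   WHERE num.n <= @intervals
)
SELECT DATEADD(MINUTE, n, @start) AS [from]
       , DATEADD(MINUTE, n + @slotDuration, @start) AS [to]
       -- current minute plus the next 4, i.e. @slotDuration minutes
       , SUM(payload) OVER (ORDER BY n
                            ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING) AS sum_payload
  FROM dense
 ORDER BY n;
```

Unlike the JOIN version, slots that contain no timestamps show up with a sum of 0 instead of being absent.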