在Postgres中,给定如下日期范围的集合:

meter_id device_id start_at end_at
meter1 device1 2020-01-02 10:30 2025-01-02 14:00
meter2 device1 2020-01-02 10:30 2020-01-02 11:30
meter3 device1 2020-01-02 10:30 2020-01-03 11:30

我想把这些范围分成:

  • 全天的范围
  • 整天前的小时/分钟范围
  • 全天后的小时/分钟范围
  • 如果这个范围已经可以放在一整天的桶里,那么就别管它了

因此,上面的结果将是:

meter_id device_id start_at end_at remark
meter1 device1 2020-01-02 10:30 2020-01-03 00:00 hours at start of 1st row
meter1 device1 2020-01-03 00:00 2025-01-02 00:00 whole days from 1st row
meter1 device1 2025-01-02 00:00 2025-01-02 14:00 hours at end of 1st row
meter2 device1 2020-01-02 10:30 2020-01-02 11:30 2nd row left alone
meter3 device1 2020-01-02 10:30 2020-01-03 00:00 hours at start of 3rd row
meter3 device1 2020-01-03 00:00 2020-01-03 11:30 hours at end of 3rd row

我已经写了一些有用的东西,但它相当复杂和丑陋.

Is there a simpler way of doing this?

表和数据:

CREATE TABLE IF NOT EXISTS metering_ranges (
    metering_point_id text NOT NULL,
    device_id text NOT NULL,
    start_at timestamp with time zone NOT NULL,
    end_at timestamp with time zone NOT NULL
);

INSERT INTO metering_ranges( metering_point_id, device_id,  start_at, end_at)
VALUES 
    ('meter1', 'device1', '2020-01-02 10:30:00', '2025-01-02 14:00:00'),
    ('meter2', 'device1', '2020-01-02 10:30:00', '2020-01-02 11:30:00'),
    ('meter3', 'device1', '2020-01-02 10:30:00', '2020-01-03 11:30:00');

现有(复杂)解决方案

with 

ranges_with_whole_days as (
    SELECT
        metering_point_id,
        device_id,
        start_at,
        date_trunc('day', start_at) + interval '1 d' as start_at_next_whole_day,
        date_trunc('day', end_at) as end_at_whole_day,
        end_at
    FROM
        metering_ranges
),

ranges as (
    SELECT
        metering_point_id,
        device_id,
        start_at,
        CASE
            WHEN start_at_next_whole_day <= end_at_whole_day THEN start_at_next_whole_day ELSE NULL
        END as start_at_next_day,
        CASE 
            WHEN end_at_whole_day >= start_at_next_whole_day THEN end_at_whole_day ELSE NULL
        END as end_at_prev_day,
        end_at
    FROM
        ranges_with_whole_days
),

ranges_bucketed AS (
    -- get hours before whole day
    SELECT metering_point_id, device_id, start_at, start_at_next_day as end_at
    FROM ranges m
    WHERE start_at_next_day IS NOT NULL
    
    UNION
    
    -- get whole day period
    SELECT metering_point_id, device_id, start_at_next_day as start_at, end_at_prev_day as end_at
    FROM  ranges m
    WHERE start_at_next_day IS NOT NULL AND end_at_prev_day IS NOT NULL AND start_at_next_day != end_at_prev_day
    
    UNION
    
    -- get hours after whole day
    SELECT metering_point_id, device_id, end_at_prev_day as start_at, end_at
    FROM ranges m
    WHERE end_at_prev_day IS NOT NULL
    
    UNION
    
    -- get existing record if it fits within a day
    SELECT  metering_point_id,  device_id, start_at, end_at
    FROM ranges m
    WHERE start_at_next_day IS NULL AND end_at_prev_day IS NULL
) 

SELECT *
FROM ranges_bucketed
ORDER BY metering_point_id, device_id, start_at

推荐答案

假设数据类型为timestamp(使用样例数据,而不是显示timestamptz的相互矛盾的表定义).否则,您必须执行更多操作.请参见:

1.在午夜拆分所有范围,保持合并全天

正如你的 comments 所表明的那样.(此外,与您的‘meter3’样本行一致.)

WITH cte AS (
   SELECT *
        , CASE WHEN start_at::time = '0:0' THEN start_at
               ELSE date_trunc('day', start_at) + interval '1 day' END AS start_day
        , date_trunc('day', end_at) AS end_day
   FROM   metering_ranges
   )
-- nothing to split
SELECT metering_point_id, device_id, start_at, end_at
FROM   cte
WHERE  end_day < start_day

-- split core days
UNION ALL
SELECT metering_point_id, device_id, start_day, end_day
FROM   cte
WHERE  end_day > start_day
-- add leading range
UNION ALL
SELECT metering_point_id, device_id, start_at, start_day
FROM   cte
WHERE  end_day >= start_day
AND    start_at::time > '0:0'
-- add trailing range
UNION ALL
SELECT metering_point_id, device_id, end_day, end_at
FROM   cte
WHERE  end_day >= start_day
AND    end_at::time > '0:0'
ORDER  BY 1,2,3;

fiddle个个

Use UNION ALL, not UNION. Faster.
Also, you can append ORDER BY to a UNION query once at the end. That applies to the whole result set.

2.只列出全天的范围

我最初的解释是.(也许是更常见的用例?)

WITH cte AS (
   SELECT *
        , CASE WHEN start_at::time = '0:0' THEN start_at
               ELSE date_trunc('day', start_at) + interval '1 day' END AS start_day
        , date_trunc('day', end_at) AS end_day
   FROM   metering_ranges
   )
-- nothing to split
SELECT metering_point_id, device_id, start_at, end_at
FROM   cte
WHERE  end_day <= start_day
-- split core days
UNION ALL
SELECT metering_point_id, device_id, start_day, end_day
FROM   cte
WHERE  end_day > start_day
-- add leading range
UNION ALL
SELECT metering_point_id, device_id, start_at, start_day
FROM   cte
WHERE  end_day > start_day
AND    start_at::time <> '0:0'
-- add trailing range
UNION ALL
SELECT metering_point_id, device_id, end_day, end_at
FROM   cte
WHERE  end_day > start_day
AND    end_at::time <> '0:0'
ORDER  BY 1,2,3;

没有嵌套完整天数的only difference:范围仍在午夜拆分.即从00:00开始到第二天24:00之前结束的范围.小提琴证明了这一点.

相关:

Variants for case 2. with multirange type

对于Case 1.,这些方法不会立即起作用,因为只有在提取全天的情况下,范围才会被分割.不适合‘meter3’的样本行.

需要Postgres 14或更高版本.我们可以使用新的anymultirange - anymultirangeanymultirange操作员.The manual:

计算多范围的差值.

然后我们可以对得到的多范围进行unnest(),以得到剩余的分数.

WITH cte AS (
   SELECT *
        , CASE WHEN start_at::time = '0:0' THEN start_at
               ELSE date_trunc('day', start_at) + interval '1 day' END AS start_day
        , date_trunc('day', end_at) AS end_day
   FROM   metering_ranges
   )
-- nothing to split
SELECT metering_point_id, device_id, start_at, end_at
FROM   cte
WHERE  end_day <= start_day

-- nested full days to split
UNION ALL
SELECT metering_point_id, device_id, x.*
FROM   cte
CROSS  JOIN LATERAL (
   VALUES (start_day, end_day)
   UNION ALL
   SELECT lower(rest), upper(rest)
   FROM   unnest(tsrange(start_at, end_at)::tsmultirange
               - tsrange(start_day, end_day)::tsmultirange) rest
   ) x
WHERE  end_day > start_day
ORDER  BY 1,2,3;

或者更短,没有CTE,有LEFT JOIN LATERAL分:

SELECT c.metering_point_id, c.device_id
     , COALESCE(x.start_at, c.start_at) AS start_at
     , COALESCE(x.end_at, c.end_at) AS end_at
FROM  (
   SELECT *
        , CASE WHEN start_at::time = '0:0' THEN start_at
               ELSE date_trunc('day', start_at) + interval '1 day' END AS start_day
        , date_trunc('day', end_at) AS end_day
   FROM   metering_ranges
   ) c
LEFT   JOIN  LATERAL (
   VALUES (start_day, end_day)
   UNION ALL
   SELECT lower(rest), upper(rest)
   FROM   unnest(tsrange(start_at, end_at)::tsmultirange
               - tsrange(start_day, end_day)::tsmultirange) rest
   WHERE  c.end_day > c.start_day
   ) x(start_at, end_at) ON c.end_day > c.start_day
ORDER  BY 1,2,3;

fiddle个个

但是,构建、计算和解嵌多范围会增加相当大的开销.我不确定这种复杂程度是否能与更"暴力"的查询相抗衡.

Sql相关问答推荐

提取Snowflake SQL中的嵌套键

将SEMI JOIN、ANTI JOIN转换为非连接SQL

在SQL中将相同且紧挨着的元素进行分组

Postgres:对包含数字的字符串列表进行排序

使用WHERE子句进行筛选时,SQL SELECT查询返回总计数

从自定义日期和时间开始,每月具有给定状态的公司数量

我怎样才能得到列值对应的最大值在另一个?

我需要一个regexp_like来只验证字母D或T、数字和管道

更改重复值的情况

将 jsonb 数组中的对象取消嵌套到单独的行中

试图找到两个身份列表的交集(列表的长度不同),但获取列 id 不明确?

SQL 搜索 - 获取最大值日期的奇怪行为

如何使用 Google BigQuery 中的条件根据特定列值连接列的 N 行?

MariaDB非常简单的MATCHAGAINST查询不使用FULLTEXT索引吗?

SQL 函数 DIFFERENCE 返回有趣的分数

编写查询以根据级别 (p2) 返回父位置

面对来自以下两个代码的不同输出

有条件求和

如何优化sql请求?

在 AWS athena 的视图之上创建视图时,如何消除此错误:列别名列表有 1 个条目但t有 4 列可用?