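I have a subscription table with 4 fields: id, customer_id, start_date and end_date. It lists all of my customers' subscriptions. No subscription has an empty end_date, and a customer can hold several subscriptions at the same time. For example, customer ID 37 may have the following subscriptions: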

id     customer_id  start_at    end_at
44     37           2019-03-21  2019-03-21
17819  37           2020-03-23  2020-03-23
22302  37           2020-04-24  2021-07-25
42213  37           2021-04-25  2023-04-26
92013  37           2023-04-26  2024-04-26

These records mean that customer 37 was a subscriber on 2019-03-21, then on 2020-03-23, and then from 2020-04-24 to 2024-04-26, for a total of 1463 days.
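
To experiment with the sample rows, a minimal setup could look like the sketch below. The table name subscription is taken from the queries later in the question; the column types and the NOT NULL constraints are assumptions, not something the question states.

```sql
-- Hypothetical schema: the column types and constraints are assumed.
CREATE TABLE subscription (
    id          INT  NOT NULL PRIMARY KEY,
    customer_id INT  NOT NULL,
    start_at    DATE NOT NULL,
    end_at      DATE NOT NULL
);

-- The five sample rows for customer 37, copied from the table above.
INSERT INTO subscription (id, customer_id, start_at, end_at) VALUES
    (44,    37, '2019-03-21', '2019-03-21'),
    (17819, 37, '2020-03-23', '2020-03-23'),
    (22302, 37, '2020-04-24', '2021-07-25'),
    (42213, 37, '2021-04-25', '2023-04-26'),
    (92013, 37, '2023-04-26', '2024-04-26');
```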

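I am trying to write a query that returns the number of days each customer was subscribed during a given period. For example, customer 37 was a subscriber for 365 days in 2023. Subscriptions can overlap, because a subscriber can hold several subscriptions at the same time.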

The result of the query should look like this:

customer_id  total_subscription_days
37           1463
38           526
39           426
40           365
41           325

My database runs on MySQL 8.2.12.

I have tried LAG, LEAD, CTEs, MIN and MAX, to no avail. I also tried ChatGPT and Stack Overflow.

Edit: here is what I have tried so far:

First attempt:

SELECT 
    customer_id,
    SUM(DATEDIFF(
        LEAST(end_at, '2023-12-31'), 
        GREATEST(start_at, '2023-01-01')
    ) + 1) AS total_subscription_days
FROM (
    SELECT 
        customer_id,
        start_at,
        end_at
    FROM 
        subscription
    WHERE 
        start_at <= '2023-12-31' AND end_at >= '2023-01-01'
    UNION ALL
    SELECT 
        s1.customer_id,
        LEAD(s1.end_at) OVER (PARTITION BY s1.customer_id ORDER BY s1.end_at),
        '2023-12-31'
    FROM 
        subscription s1
    LEFT JOIN 
        subscription s2 ON s1.customer_id = s2.customer_id 
                       AND s1.end_at < s2.start_at
    WHERE 
        s2.start_at IS NOT NULL
) AS merged_subscriptions
GROUP BY 
    customer_id;

Although I only want the number of subscription days in 2023, I get results above 365 days. It seems the join causes duplicates to be counted.

Second attempt:

WITH subscription_periods AS (
    SELECT 
        customer_id,
        start_at,
        end_at,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY start_at) AS period_number
    FROM 
        subscription
    WHERE 
        start_at <= '2023-12-31' AND end_at >= '2023-01-01' AND customer_id < 100
),
subscription_days AS (
    SELECT 
        customer_id,
        SUM(
            DATEDIFF(
                LEAST(end_at, '2023-12-31'), 
                GREATEST(start_at, '2023-01-01')
            ) + 1
        ) AS days
    FROM 
        (
            SELECT 
                customer_id,
                start_at,
                LEAD(end_at) OVER (PARTITION BY customer_id ORDER BY start_at) AS end_at
            FROM 
                subscription_periods
        ) AS overlapping_periods
    WHERE 
        end_at >= '2023-01-01'
    GROUP BY 
        customer_id
)
SELECT 
    customer_id,
    SUM(days) AS total_subscription_days
FROM 
    subscription_days
GROUP BY 
    customer_id;

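I limited this to the first 100 customers, otherwise I get a 504 error. This query does not seem to account for gaps between subscriptions. For a customer subscribed from 2023-01-01 to 2023-04-01 and then from 2023-05-01 to 2023-08-01, it appears to count the days between 2023-01-01 and 2023-08-01.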
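
The pairing that produces this result can be seen in isolation. The sketch below uses two made-up rows that match the example above (customer_id 1 is hypothetical) and shows what LEAD(end_at) returns next to each row's own end_at.

```sql
-- Two hypothetical subscriptions with a one-month gap between them.
WITH sample AS (
    SELECT 1 AS customer_id, DATE '2023-01-01' AS start_at, DATE '2023-04-01' AS end_at
    UNION ALL
    SELECT 1, DATE '2023-05-01', DATE '2023-08-01'
)
SELECT
    customer_id,
    start_at,
    end_at,
    -- end_at of the NEXT row for the same customer, not of the current row
    LEAD(end_at) OVER (PARTITION BY customer_id ORDER BY start_at) AS lead_end_at
FROM sample
ORDER BY start_at;
```

The first row is paired with the second row's end_at (2023-08-01), and the last row gets NULL. Using lead_end_at as the interval end therefore stretches the first subscription across the gap, and the last row is dropped by the filter on end_at.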

Recommended answer

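This is a variation of the merge-overlapping-intervals problem:

  • Find all rows that intersect the given range, e.g. 2023-01-01 to 2023-12-31
  • Process rows ordered by start date and assign group numbers as follows:
    if there is a gap between the previous group and the current row, start a new group
  • Group the result by key (customer_id) and group number, clamp dates that fall outside the range, and compute the difference between each group's minimum and maximum date
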
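-- reporting range, inclusive on both ends
set @date1 = '2023-01-01';
set @date2 = '2023-12-31';

with cte1 as (
    -- keep only rows that intersect the range; t stands in for the subscription table
    select customer_id, start_at, end_at
    from t
    where start_at <= @date2 and end_at >= @date1
), cte2 as (
    -- flag a row with 1 when it starts after every earlier row of the customer has ended
    select *,
        case
            when start_at <= max(end_at) over (
                partition by customer_id
                order by start_at
                rows between unbounded preceding and 1 preceding
            ) then 0
            else 1
        end as newgrp
    from cte1
), cte3 as (
    -- the running total of the flags numbers each run of overlapping rows
    select *, sum(newgrp) over (partition by customer_id order by start_at) as grpnum
    from cte2
)
-- one row per customer and merged interval, clamped to the range
select
    customer_id,
    greatest(min(start_at), @date1) as date1,
    least(max(end_at), @date2) as date2,
    datediff(
        least(max(end_at), @date2),
        greatest(min(start_at), @date1)
    ) + 1 as diff
from cte3
group by customer_id, grpnum;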
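
The query above returns one row per customer and merged interval. To get the single total per customer that the question asks for, the per-interval day counts can be summed in an outer step. The following is a sketch, not part of the original answer: it assumes the table is named subscription as in the question (the answer uses t), adds a CTE named merged for illustration, and casts the range bounds to DATE explicitly.

```sql
SET @date1 = '2023-01-01';
SET @date2 = '2023-12-31';

WITH cte1 AS (
    -- rows that intersect the reporting range
    SELECT customer_id, start_at, end_at
    FROM subscription
    WHERE start_at <= @date2 AND end_at >= @date1
), cte2 AS (
    -- 1 marks the start of a new run of overlapping rows
    SELECT *,
        CASE
            WHEN start_at <= MAX(end_at) OVER (
                PARTITION BY customer_id
                ORDER BY start_at
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
            ) THEN 0
            ELSE 1
        END AS newgrp
    FROM cte1
), cte3 AS (
    SELECT *, SUM(newgrp) OVER (PARTITION BY customer_id ORDER BY start_at) AS grpnum
    FROM cte2
), merged AS (
    -- one merged interval per customer and group, clamped to the range
    SELECT
        customer_id,
        GREATEST(MIN(start_at), CAST(@date1 AS DATE)) AS date1,
        LEAST(MAX(end_at), CAST(@date2 AS DATE)) AS date2
    FROM cte3
    GROUP BY customer_id, grpnum
)
-- merged intervals never overlap, so their inclusive day counts can be added up
SELECT
    customer_id,
    SUM(DATEDIFF(date2, date1) + 1) AS total_subscription_days
FROM merged
GROUP BY customer_id;
```

With the sample rows for customer 37, only subscriptions 42213 and 92013 intersect 2023. They merge into one interval that is clamped to 2023-01-01 through 2023-12-31, which should give the 365 days mentioned in the question.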

Demo on DB<>Fiddle
