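I'm using MariaDB. I'm building a statistics feature for my website that needs each user's historical data summarized on a per-day basis. For this I created a history table that only contains a row when something changed. I also have my "main" table, which holds today's current data.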

So for each user_id and organisation_id combination, my query needs to retrieve the status_id from the last row, provided that value is not NULL.

To illustrate, this is the data considered "current" (table name = organisation_user_link):

id  user_id  organisation_id  status_id  stopped_reason_id  dossier_created
1   3        73               2          NULL               2021-10-29 07:50:21
2   9        1199             4          5                  2021-05-19 17:44:07

Next is my history data, which looks very similar (table name = organisation_user_link_status_history):

timestamp            user_id  organisation_id  status_id  stopped_reason_id
2024-03-11 12:05:30  3        73               1          NULL
2024-03-08 11:15:35  3        73               3          NULL
2024-03-05 13:25:40  3        73               4          3
2024-03-13 02:07:10  9        1199             1          NULL
2024-03-11 02:07:10  9        1199             2          NULL

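I want my result to contain every day from today back to a specific date. Any day without a row of its own should take the values from the preceding row. The rows are sorted in DESC order, so the "current" data always comes first, because that is today's data.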

This is the result I want:

date        user_id  organisation_id  status_id  stopped_reason_id  dossier_created
2024-03-14  3        73               2          NULL               2021-10-29
2024-03-14  9        1199             4          5                  2021-05-19
2024-03-13  3        73               2          NULL               2021-10-29
2024-03-13  9        1199             1          NULL               2021-05-19
2024-03-12  3        73               2          NULL               2021-10-29
2024-03-12  9        1199             1          NULL               2021-05-19
2024-03-11  3        73               1          NULL               2021-10-29
2024-03-11  9        1199             2          NULL               2021-05-19
2024-03-10  3        73               1          NULL               2021-10-29
2024-03-10  9        1199             2          NULL               2021-05-19
2024-03-09  3        73               1          NULL               2021-10-29
2024-03-09  9        1199             2          NULL               2021-05-19
2024-03-08  3        73               3          NULL               2021-10-29
2024-03-08  9        1199             2          NULL               2021-05-19
2024-03-07  3        73               3          NULL               2021-10-29
2024-03-07  9        1199             2          NULL               2021-05-19
2024-03-06  3        73               3          NULL               2021-10-29
2024-03-06  9        1199             2          NULL               2021-05-19
2024-03-05  3        73               4          3                  2021-10-29
2024-03-05  9        1199             2          NULL               2021-05-19
2024-03-04  3        73               4          3                  2021-10-29
2024-03-04  9        1199             2          NULL               2021-05-19
2024-03-03  3        73               4          3                  2021-10-29
2024-03-03  9        1199             2          NULL               2021-05-19
2024-03-02  3        73               4          3                  2021-10-29
2024-03-02  9        1199             2          NULL               2021-05-19
2024-03-01  3        73               4          3                  2021-10-29
2024-03-01  9        1199             2          NULL               2021-05-19

This is my current query:

WITH RECURSIVE dates (
    DATE
) AS (
    -- SELECT MIN(DATE(created))
    -- FROM organisation
    SELECT DATE('2024-03-01')
    UNION ALL
    SELECT DATE(date) + INTERVAL 1 DAY
    FROM dates
    WHERE DATE(DATE) < (NOW() - INTERVAL 1 DAY)
),
current_history_data_query AS (
    SELECT 
        current_history_data.*
    FROM (
        SELECT
           DATE(timestamp) AS date,
           user_id,
           organisation_id,
           status_id,
           stopped_reason_id,
           dossier_created,
           'history-data' AS src
         FROM (
           SELECT
               oulsh.user_id,
               oulsh.organisation_id,
               oulsh.timestamp,
               oulsh.status_id,
               oulsh.stopped_reason_id,
               oul.dossier_created,
               ROW_NUMBER() OVER (PARTITION BY oulsh.user_id, oulsh.organisation_id, DATE(oulsh.timestamp) ORDER BY oulsh.timestamp DESC) AS row_num
           FROM organisation_user_link_status_history AS oulsh
           INNER JOIN organisation_user_link AS oul ON oulsh.user_id = oul.user_id AND oulsh.organisation_id = oul.organisation_id
         ) AS numbered_rows
         WHERE row_num = 1 AND DATE(timestamp) != DATE(NOW())
        
         UNION ALL
        
         SELECT DATE(NOW()) AS date, oul.user_id, oul.organisation_id, oul.status_id, oul.stopped_reason_id, oul.dossier_created, 'current-data' AS src
         FROM organisation_user_link AS oul
    ) AS current_history_data
    ORDER BY DATE DESC
)
SELECT
    dates.date AS dates_date,
    COALESCE(user_id, LAG(user_id) OVER (ORDER BY dates_date DESC)) AS user_id,
    COALESCE(organisation_id, LAG(organisation_id) OVER (ORDER BY dates_date DESC)) AS organisation_id,
    COALESCE(status_id, LAG(status_id) OVER (ORDER BY dates_date DESC)) AS status_id,
    COALESCE(stopped_reason_id, LAG(stopped_reason_id) OVER (ORDER BY dates_date DESC)) AS stopped_reason_id,
    COALESCE(dossier_created, LAG(dossier_created) OVER (ORDER BY dates_date DESC)) AS dossier_created
FROM dates
LEFT JOIN current_history_data_query AS chdq ON dates.date = chdq.date
GROUP BY DATE(dates.date)
ORDER BY dates.date DESC;

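This query has several problems:

  • For the first row after an original non-NULL row, the data is copied correctly. After that, though, the third row for example is still NULL, even though the second row was just filled by the LAG() window function.
  • The code above doesn't take into account that I also need to partition by user_id and organisation_id. If I add PARTITION BY user_id, organisation_id to the LAG() calls, the whole LAG approach stops working and not even my second row gets filled.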
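
The first problem can be reproduced in isolation. In the sketch below, the CTE t and its four rows are made up for illustration. LAG() reads the previous row's original status_id from t, not the value COALESCE just produced for that row, so only the first row of a run of NULLs gets filled:

```sql
-- Hypothetical inline rows: one known value, two gaps, one older known value.
WITH t (d, status_id) AS (
    SELECT DATE('2024-03-14'), 2    UNION ALL
    SELECT DATE('2024-03-13'), NULL UNION ALL
    SELECT DATE('2024-03-12'), NULL UNION ALL
    SELECT DATE('2024-03-11'), 1
)
SELECT
    d,
    status_id,
    -- LAG() looks at the previous row of t, so it never sees an already "filled" value
    COALESCE(status_id, LAG(status_id) OVER (ORDER BY d DESC)) AS filled
FROM t
ORDER BY d DESC;
```

Here 2024-03-13 is filled with 2, but 2024-03-12 stays NULL because the row before it was NULL in t.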

What am I missing here, and how can I fix it?

Recommended answer

Since MariaDB does not support LATERAL or CROSS APPLY, you can use three separate correlated subqueries in the select list:

-- Final SELECT only: it reuses the dates and current_history_data_query CTEs from the question.
SELECT d.date, u.user_id, u.organisation_id,
  (
    SELECT status_id
    FROM current_history_data_query
    WHERE user_id = u.user_id
    AND organisation_id = u.organisation_id
    AND date >= d.date
    ORDER BY date ASC
    LIMIT 1
  ) AS status_id,
  (
    SELECT stopped_reason_id
    FROM current_history_data_query
    WHERE user_id = u.user_id
    AND organisation_id = u.organisation_id
    AND date >= d.date
    ORDER BY date ASC
    LIMIT 1
  ) AS stopped_reason_id,
  (
    SELECT dossier_created
    FROM current_history_data_query
    WHERE user_id = u.user_id
    AND organisation_id = u.organisation_id
    AND date >= d.date
    ORDER BY date ASC
    LIMIT 1
  ) AS dossier_created
FROM dates d
-- No ON clause: every date is paired with every (user_id, organisation_id).
JOIN (SELECT DISTINCT user_id, organisation_id FROM organisation_user_link) u
ORDER BY d.date DESC, u.user_id;

Here's a db<>fiddle.
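
The final SELECT above depends on the CTEs from the question. As a minimal self-contained sketch, the version below inlines the sample rows instead. The name snapshots is a hypothetical stand-in for current_history_data_query (history rows plus the current rows dated today), "today" is pinned to 2024-03-14 so the result is reproducible, and dossier_created is left out for brevity:

```sql
WITH RECURSIVE dates (date) AS (
    -- One row per day, 2024-03-01 .. 2024-03-14 ("today" is pinned: an assumption)
    SELECT DATE('2024-03-01')
    UNION ALL
    SELECT date + INTERVAL 1 DAY
    FROM dates
    WHERE date < DATE('2024-03-14')
),
snapshots (date, user_id, organisation_id, status_id, stopped_reason_id) AS (
    -- Current rows, dated "today"
    SELECT DATE('2024-03-14'), 3, 73,   2, NULL UNION ALL
    SELECT DATE('2024-03-14'), 9, 1199, 4, 5    UNION ALL
    -- History rows, timestamps truncated to the day
    SELECT DATE('2024-03-11'), 3, 73,   1, NULL UNION ALL
    SELECT DATE('2024-03-08'), 3, 73,   3, NULL UNION ALL
    SELECT DATE('2024-03-05'), 3, 73,   4, 3    UNION ALL
    SELECT DATE('2024-03-13'), 9, 1199, 1, NULL UNION ALL
    SELECT DATE('2024-03-11'), 9, 1199, 2, NULL
)
SELECT d.date, u.user_id, u.organisation_id,
  (
    -- Nearest snapshot on or after this day for this pair
    SELECT s.status_id
    FROM snapshots s
    WHERE s.user_id = u.user_id
      AND s.organisation_id = u.organisation_id
      AND s.date >= d.date
    ORDER BY s.date ASC
    LIMIT 1
  ) AS status_id,
  (
    SELECT s.stopped_reason_id
    FROM snapshots s
    WHERE s.user_id = u.user_id
      AND s.organisation_id = u.organisation_id
      AND s.date >= d.date
    ORDER BY s.date ASC
    LIMIT 1
  ) AS stopped_reason_id
FROM dates d
CROSS JOIN (SELECT DISTINCT user_id, organisation_id FROM snapshots) u
ORDER BY d.date DESC, u.user_id;
```

With these rows, user 3 on 2024-03-12 picks up status_id 2 from the current row dated 2024-03-14, and on 2024-03-09 picks up status_id 1 from the 2024-03-11 history row, which matches the expected result in the question.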


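There may be a better solution, but one way to get the desired output is to join to a lateral derived table, which requires a server that supports LATERAL (MySQL 8.0.14 or later, for example). The modified final query is: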

-- Final SELECT only: it reuses the dates and current_history_data_query CTEs from the question.
SELECT d.date, u.user_id, u.organisation_id, status_id, stopped_reason_id, DATE(dossier_created) AS dossier_created
FROM dates d
JOIN (SELECT DISTINCT user_id, organisation_id FROM organisation_user_link) u
JOIN LATERAL (
    SELECT *
    FROM current_history_data_query
    WHERE user_id = u.user_id
    AND organisation_id = u.organisation_id
    AND date >= d.date
    ORDER BY date ASC
    LIMIT 1
) h
ORDER BY d.date DESC, u.user_id;
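
Where LATERAL is not available, one possible alternative to repeating the subquery per column is to look up the matching snapshot date once with MIN() and join back on it. This is a swapped-in technique, not part of either answer, and it assumes at most one snapshot row per pair per day. In the sketch, snapshots is a made-up inline stand-in for current_history_data_query, holding only user 3's rows over a shortened date range:

```sql
WITH RECURSIVE dates (date) AS (
    -- Shortened range for illustration: 2024-03-08 .. 2024-03-14
    SELECT DATE('2024-03-08')
    UNION ALL
    SELECT date + INTERVAL 1 DAY
    FROM dates
    WHERE date < DATE('2024-03-14')
),
snapshots (date, user_id, organisation_id, status_id) AS (
    SELECT DATE('2024-03-14'), 3, 73, 2 UNION ALL  -- current row
    SELECT DATE('2024-03-11'), 3, 73, 1 UNION ALL  -- history rows
    SELECT DATE('2024-03-08'), 3, 73, 3
)
SELECT d.date, u.user_id, u.organisation_id, h.status_id
FROM dates d
CROSS JOIN (SELECT DISTINCT user_id, organisation_id FROM snapshots) u
JOIN snapshots h
  ON  h.user_id = u.user_id
  AND h.organisation_id = u.organisation_id
  -- Earliest snapshot date on or after this day for this pair
  AND h.date = (
        SELECT MIN(s.date)
        FROM snapshots s
        WHERE s.user_id = u.user_id
          AND s.organisation_id = u.organisation_id
          AND s.date >= d.date
      )
ORDER BY d.date DESC, u.user_id;
```

Every column of the matched row is then available through h, so adding stopped_reason_id or dossier_created does not need another subquery.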

