我使用的是MariaDB SQL.我在我的网站上创建一个统计功能,需要获取用户的历史数据汇总在每天的基础上. 为此,我创建了一个历史表,其中只包含发生更改时的行.我还有我的"主"表,其中包含今天的当前数据.
因此,如果这个值不为空,我需要查询从最后一行检索每个user_id和organization_id组合的状态id.
为了举例说明这个数据,这被认为是"当前"数据 (表名=ORGIZATION_USER_LINK):
id | user_id | organisation_id | status_id | stopped_reason_id | dossier_created |
---|---|---|---|---|---|
1 | 3 | 73 | 2 | NULL | 2021-10-29 07:50:21 |
2 | 9 | 1199 | 4 | 5 | 2021-05-19 17:44:07 |
接下来是我的历史数据,看起来非常相似 (表名= organization_user_link_status_history):
timestamp | user_id | organisation_id | status_id | stopped_reason_id |
---|---|---|---|---|
2024-03-11 12:05:30 | 3 | 73 | 1 | NULL |
2024-03-08 11:15:35 | 3 | 73 | 3 | NULL |
2024-03-05 13:25:40 | 3 | 73 | 4 | 3 |
2024-03-13 02:07:10 | 9 | 1199 | 1 | NULL |
2024-03-11 02:07:10 | 9 | 1199 | 2 | NULL |
我希望我的结果包括从今天开始到特定日期的每一天.其中,每一天都具有前一行的值,以防该天没有值.这些值是按DESC排序的,因此"当前"数据始终排在第一位,因为这是今天的数据.
这就是我想要成为的结果:
date | user_id | organisation_id | status_id | stopped_reason_id | dossier_created |
---|---|---|---|---|---|
2024-03-14 | 3 | 73 | 2 | NULL | 2021-10-29 |
2024-03-14 | 9 | 1199 | 4 | 5 | 2021-05-19 |
2024-03-13 | 3 | 73 | 2 | NULL | 2021-10-29 |
2024-03-13 | 9 | 1199 | 1 | NULL | 2021-05-19 |
2024-03-12 | 3 | 73 | 2 | NULL | 2021-10-29 |
2024-03-12 | 9 | 1199 | 1 | NULL | 2021-05-19 |
2024-03-11 | 3 | 73 | 1 | NULL | 2021-10-29 |
2024-03-11 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-10 | 3 | 73 | 1 | NULL | 2021-10-29 |
2024-03-10 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-09 | 3 | 73 | 1 | NULL | 2021-10-29 |
2024-03-09 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-08 | 3 | 73 | 3 | NULL | 2021-10-29 |
2024-03-08 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-07 | 3 | 73 | 3 | NULL | 2021-10-29 |
2024-03-07 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-06 | 3 | 73 | 3 | NULL | 2021-10-29 |
2024-03-06 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-05 | 3 | 73 | 4 | 3 | 2021-10-29 |
2024-03-05 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-04 | 3 | 73 | 4 | 3 | 2021-10-29 |
2024-03-04 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-03 | 3 | 73 | 4 | 3 | 2021-10-29 |
2024-03-03 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-02 | 3 | 73 | 4 | 3 | 2021-10-29 |
2024-03-02 | 9 | 1199 | 2 | NULL | 2021-05-19 |
2024-03-01 | 3 | 73 | 4 | 3 | 2021-10-29 |
2024-03-01 | 9 | 1199 | 2 | NULL | 2021-05-19 |
这就是我现在的疑问:
WITH RECURSIVE dates (
DATE
) AS (
-- SELECT MIN(DATE(created))
-- FROM organisation
SELECT DATE('2024-03-01')
UNION ALL
SELECT DATE(date) + INTERVAL 1 DAY
FROM dates
WHERE DATE(DATE) < (NOW() - INTERVAL 1 DAY)
),
current_history_data_query AS (
SELECT
current_history_data.*
FROM (
SELECT
DATE(timestamp) AS date,
user_id,
organisation_id,
status_id,
stopped_reason_id,
dossier_created,
'history-data' AS src
FROM (
SELECT
oulsh.user_id,
oulsh.organisation_id,
oulsh.timestamp,
oulsh.status_id,
oulsh.stopped_reason_id,
oul.dossier_created,
ROW_NUMBER() OVER (PARTITION BY oulsh.user_id, oulsh.organisation_id, DATE(oulsh.timestamp) ORDER BY oulsh.timestamp DESC) AS row_num
FROM organisation_user_link_status_history AS oulsh
INNER JOIN organisation_user_link AS oul ON oulsh.user_id = oul.user_id AND oulsh.organisation_id = oul.organisation_id
) AS numbered_rows
WHERE row_num = 1 AND DATE(timestamp) != DATE(NOW())
UNION ALL
SELECT DATE(NOW()) AS date, oul.user_id, oul.organisation_id, oul.status_id, oul.stopped_reason_id, oul.dossier_created, 'current-data' AS src
FROM organisation_user_link AS oul
) AS current_history_data
ORDER BY DATE DESC
)
SELECT
dates.date AS dates_date,
COALESCE(user_id, LAG(user_id) OVER (ORDER BY dates_date DESC)) AS user_id,
COALESCE(organisation_id, LAG(organisation_id) OVER (ORDER BY dates_date DESC)) AS organisation_id,
COALESCE(status_id, LAG(status_id) OVER (ORDER BY dates_date DESC)) AS status_id,
COALESCE(stopped_reason_id, LAG(stopped_reason_id) OVER (ORDER BY dates_date DESC)) AS stopped_reason_id,
COALESCE(dossier_created, LAG(dossier_created) OVER (ORDER BY dates_date DESC)) AS dossier_created
FROM dates
LEFT JOIN current_history_data_query AS chdq ON dates.date = chdq.date
GROUP BY DATE(dates.date)
ORDER BY dates.date DESC;
使用此查询,会出现多个问题:
- 对于原始非空值行之后的第一行:数据正在正确复制.但是,之后,例如第三行仍然是NULL.即使第二行刚刚被GAP()窗口函数填充.
- 上面提供的代码没有考虑到我还需要按user_id和organization_id分区.如果我在LAG()函数中添加:PARTION BY user_id,organization_id.然后整个LAG功能不再工作,甚至我的第二行都没有得到填充的数据.
我在这里遗漏了什么,我如何解决这个问题?