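I'm at a fairly early stage of learning SQL. I've made up some data in which records are missing for certain dates. As output, I'd like to capture the records that show each missing date together with the department it affects.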

The expected output for the missing dates is as follows:

Date        DepartmentID
2023-11-03  3001
2023-11-03  4001
2023-11-06  1001
2023-11-06  2001
2023-11-07  1001
2023-11-07  2001
2023-11-07  4001
2023-11-09  4001

My table setup:

Create table Departments (date date,DepartmentID int, Name text);

insert into Departments values
(to_date('1.11.23','DD.MM.YY'),1001,'SRO'),
(to_date('1.11.23','DD.MM.YY'),2001,'Drs'),
(to_date('1.11.23','DD.MM.YY'),3001,'Accounting'),
(to_date('1.11.23','DD.MM.YY'),4001,'ds'),
(to_date('2.11.23','DD.MM.YY'),1001,'SRO'),
(to_date('2.11.23','DD.MM.YY'),2001,'Drs'),
(to_date('2.11.23','DD.MM.YY'),3001,'Accounting'),
(to_date('2.11.23','DD.MM.YY'),4001,'ds'),
(to_date('3.11.23','DD.MM.YY'),1001,'SRO'),
(to_date('3.11.23','DD.MM.YY'),2001,'Drs'),
(to_date('4.11.23','DD.MM.YY'),1001,'SRO'),
(to_date('4.11.23','DD.MM.YY'),2001,'Drs'),
(to_date('4.11.23','DD.MM.YY'),3001,'Accounting'),
(to_date('4.11.23','DD.MM.YY'),4001,'ds'),
(to_date('5.11.23','DD.MM.YY'),1001,'SRO'),
(to_date('5.11.23','DD.MM.YY'),2001,'Drs'),
(to_date('5.11.23','DD.MM.YY'),3001,'Accounting'),
(to_date('5.11.23','DD.MM.YY'),4001,'ds'),
(to_date('6.11.23','DD.MM.YY'),3001,'Accounting'),
(to_date('6.11.23','DD.MM.YY'),4001,'ds'),
(to_date('7.11.23','DD.MM.YY'),3001,'Accounting'),
(to_date('8.11.23','DD.MM.YY'),1001,'SRO'),
(to_date('8.11.23','DD.MM.YY'),2001,'Drs'),
(to_date('8.11.23','DD.MM.YY'),3001,'Accounting'),
(to_date('8.11.23','DD.MM.YY'),4001,'ds'),
(to_date('9.11.23','DD.MM.YY'),1001,'SRO'),
(to_date('9.11.23','DD.MM.YY'),2001,'Drs'),
(to_date('9.11.23','DD.MM.YY'),3001,'Accounting');

My attempt so far is the SQL below, written while following the instructions from here:

with lastdate as
(select max(date) as Maxdate from Departments)

  select date, 
  --lead(date),
  select maxdate from lastdate OVER(partition by date ORDER BY date) as Next_Date 
  from Departments

But this produces the following error:

ERROR: syntax error at or near "select" LINE 6: select maxdate from lastdate OVER(partition by date ORDER... ^

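This is in Postgres. The things that confuse me:

  1. How to run a CTE and apply an OVER clause against that CTE at the same time.
  2. The best way to approach this problem.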

Recommended answer

In PostgreSQL, anti-joins are typically faster than except (demo at db<>fiddle):

with required_dept_date as (
  select distinct DepartmentID, 
         generate_series('2023.11.01'::date,'2023.11.10','1 day')::date AS date
  from Departments)
select required_dept_date.*
from required_dept_date natural left join Departments
where Departments.DepartmentID is null order by 1,2;

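It takes 0.14 seconds. You can choose the period to zoom in on by setting the start and stop dates passed to generate_series(). When used in the select list, it generates every required date for every department; you can then join those rows to your table and pick out the date and department combinations that have no match in it (Departments is null).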
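To make the select-list behaviour concrete, here is a minimal sketch that does not touch the Departments table at all. The two department IDs are supplied inline through a values list aliased as d, which is only an illustration.

```sql
-- A set-returning function in the select list is expanded once per input row,
-- so every department is paired with every generated day.
select d.DepartmentID,
       generate_series('2023-11-01'::date, '2023-11-03', interval '1 day')::date as date
from (values (1001), (2001)) as d(DepartmentID)
order by 1, 2;
```

Two input rows and three generated days give six rows: each department paired with each of November 1, 2 and 3.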

The same thing written with except is a few times slower:

select distinct DepartmentID, 
       generate_series('2023.11.01'::date,'2023.11.10','1 day')::date AS date
from Departments
except
select DepartmentID, date
from Departments;

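This is an implementation detail, so it may change in the future; after all, the two forms express the same idea. Alternative forms of an anti-join include not exists, <>all() and not in, and the PostgreSQL planner has had this "preference" since at least version 8.4 (it is still there in the current 16.1). Other RDBMSs won't necessarily share this bias.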
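For comparison, here is a sketch of the same anti-join written with not exists, one of the alternative forms listed above. It assumes the Departments table from the question and the same fixed date range; the aliases r and d are only illustrative.

```sql
with required_dept_date as (
  -- every department paired with every day of the period
  select distinct DepartmentID,
         generate_series('2023-11-01'::date, '2023-11-10', interval '1 day')::date as date
  from Departments)
select r.*
from required_dept_date as r
where not exists (
  -- keep only the pairs that have no matching row in the table
  select 1
  from Departments as d
  where d.DepartmentID = r.DepartmentID
    and d.date = r.date)
order by r.date, r.DepartmentID;
```

Prefixing each variant with explain analyze is one way to check which plan your PostgreSQL version actually picks for it.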

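To cover all dates rather than a specific period, you can generate them from the min() and max() of the date column, either computed separately or by plugging uncorrelated scalar subqueries directly in where the values are needed: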

with required_dept_date as (
  select distinct DepartmentID, 
         generate_series((select min(date) from Departments),
                         (select max(date) from Departments),
                         '1 day')::date AS date
  from Departments)
select required_dept_date.*
from required_dept_date left join Departments using (DepartmentID,date)
where Departments.DepartmentID is null;

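These subqueries are flexible and intuitive, but it's worth making sure you don't query the table more often than necessary, so it may be better to try to get both values in a single scan.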
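One possible way to do that, as a sketch: compute both bounds in a single aggregate query and feed them to generate_series() through a lateral join. The CTE name bounds, the columns first_date and last_date and the aliases d, b, g and r are made up for this example.

```sql
with bounds as (
  -- min() and max() are computed together in one aggregate query
  select min(date) as first_date, max(date) as last_date
  from Departments)
,required_dept_date as (
  select d.DepartmentID, g.day::date as date
  from (select distinct DepartmentID from Departments) as d
  cross join bounds as b
  cross join lateral generate_series(b.first_date, b.last_date, interval '1 day') as g(day))
select r.*
from required_dept_date as r
left join Departments using (DepartmentID, date)
where Departments.DepartmentID is null
order by r.date, r.DepartmentID;
```

The table is still read again for the distinct department list and for the join, so this only saves the separate min() and max() lookups.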

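If you want to ignore dates on which nobody reported at all (November 3 and 7 in your fiddle): instead of generating a full calendar, extract all the distinct dates you do have and combine them with all the distinct departments: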

with required_dept as (
  select distinct DepartmentID from Departments)
,required_dept_date as (
  select distinct date,rd.DepartmentID from Departments, required_dept AS rd)
select required_dept_date.*
from Departments natural right join required_dept_date
where Departments is null;

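Note the a natural join b and a join b using (c1,c2) syntax: both are alternative forms of a join b on a.c1=b.c1 and a.c2=b.c2. Natural automatically detects all matching column names and joins on them, while using joins on the columns listed in the parentheses. Make sure the columns are prepared and named for that purpose (through CTEs, subqueries, aliases or schema design); otherwise you may accidentally join on columns you did not intend to use but that happen to have matching names.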
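A small sketch of that pitfall, using two hypothetical inline tables a and b that share a third column, note, in addition to c1 and c2:

```sql
with a(c1, c2, note) as (values (1, 10, 'x'), (2, 20, 'y'))
    ,b(c1, c2, note) as (values (1, 10, 'x'), (2, 20, 'z'))
-- explicit condition on c1 and c2 only
select 'on' as form, count(*) from a join b on a.c1 = b.c1 and a.c2 = b.c2
union all
-- same condition, written with using
select 'using', count(*) from a join b using (c1, c2)
union all
-- natural also picks up the shared note column
select 'natural', count(*) from a natural join b;
```

The on and using forms each match both rows, while natural matches only one: it silently adds note to the join condition, and the second pair of rows differs there.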

In either case, you might want to get, for each date, an array of all the departments that are missing:

with required_dept_date as (
  select distinct DepartmentID, 
         generate_series((select min(date) from Departments),
                         (select max(date) from Departments),
                         '1 day')::date AS date
  from Departments)
select required_dept_date.date,
       array_agg(required_dept_date.departmentid 
                 order by required_dept_date.departmentid) AS missing
from required_dept_date left join Departments using (DepartmentID,date)
where Departments.DepartmentID is null
group by 1 order by 1;

date        missing
2023-11-01  {700,2001,3001,4001,5001,6001}
2023-11-02  {700,1001,3001,4001,5001,6001}
2023-11-03  {700,1001,2001,3001,4001,5001,6001}
2023-11-04  {700,1001,2001,4001,5001,6001}
2023-11-05  {700,1001,2001,3001,5001,6001}
2023-11-06  {700,1001,2001,3001,4001,6001}
2023-11-07  {700,1001,2001,3001,4001,5001,6001}
2023-11-08  {700,1001,2001,3001,4001,5001}
2023-11-09  {1001,2001,3001,4001,5001,6001}
