Postgresql 如何使用 WITH RECURSIVE 子句进行 Select

发布于09月06日

我在谷歌上搜索并通读了一些文章，比如

有人能给出非常清晰的解释来演示这种查询语义和执行过程吗，

要进行with recursive个好的查询，应该知道哪些基本指导原则和典型错误？

推荐答案

首先，让我们try 简化和澄清manual page上给出的算法描述.为了简化它，现在只考虑with recursive个子句中的union all个(以后union个):

WITH RECURSIVE pseudo-entity-name(column-names) AS (
    Initial-SELECT
UNION ALL
    Recursive-SELECT using pseudo-entity-name
)
Outer-SELECT using pseudo-entity-name

为了澄清这一点，让我们用伪代码描述查询执行过程:

working-recordset = result of Initial-SELECT

append working-recordset to empty outer-recordset

while( working-recordset is not empty ) begin

    new working-recordset = result of Recursive-SELECT 
        taking previous working-recordset as pseudo-entity-name

    append working-recordset to outer-recordset

end

overall-result = result of Outer-SELECT 
    taking outer-recordset as pseudo-entity-name

甚至更短——数据库引擎执行初始 Select ，将其结果行作为工作集.然后，它对工作集重复执行递归 Select ，每次都用获得的查询结果替换工作集的内容.当递归 Select 返回空集时，此过程结束.然后将初始 Select 和递归 Select 给出的所有结果行收集并反馈给外部 Select ，外部 Select 的结果成为整体查询结果.

此查询正在计算3项中的factorial项:

WITH RECURSIVE factorial(F,n) AS (
    SELECT 1 F, 3 n
UNION ALL
    SELECT F*n F, n-1 n from factorial where n>1
)
SELECT F from factorial where n=1

初始 Select SELECT 1 F, 3 n为我们提供初始值:3为参数，1为函数值.

首先，它执行initail select，它给出了工作记录集的初始状态:

F | n
--+--
1 | 3

然后使用递归查询转换工作记录集，并获得其第二个状态:

F | n
--+--
3 | 2

第三个州:

F | n
--+--
6 | 1

在第三种状态下，递归 Select 中没有符合n>1条件的行，因此工作集是循环出口.

外部记录集现在保存由初始和递归 Select 返回的所有行:

F | n
--+--
1 | 3
3 | 2
6 | 1

外部 Select 从外部记录集中筛选出所有中间结果，只显示最终的阶乘值，该阶乘值将成为整体查询结果:

F 
--
6

现在让我们考虑表forest(id,parent_id,name):

id | parent_id | name
---+-----------+-----------------
1  |           | item 1
2  | 1         | subitem 1.1
3  | 1         | subitem 1.2
4  | 1         | subitem 1.3
5  | 3         | subsubitem 1.2.1
6  |           | item 2
7  | 6         | subitem 2.1
8  |           | item 3

这里的"Expanding full tree"表示在计算树项目的级别和(可能)路径时，按人类可读的深度优先顺序对树项目进行排序.如果不使用WITH RECURSIVE子句(或Oracle CONNECT BY子句，PostgreSQL不支持)，这两个任务(正确的排序和计算级别或路径)都无法在一个(甚至任何常量)SELECT中解决.但是这个递归查询可以完成这项工作(好吧，几乎可以，请参见下面的注释):

WITH RECURSIVE fulltree(id,parent_id,level,name,path) AS (
    SELECT id, parent_id, 1 as level, name, name||'' as path from forest where parent_id is null
UNION ALL
    SELECT t.id, t.parent_id, ft.level+1 as level, t.name, ft.path||' / '||t.name as path
    from forest t, fulltree ft where t.parent_id = ft.id
)
SELECT * from fulltree order by path

数据库引擎的执行方式如下:

首先，它执行initail select，它给出forest个表中所有最高级别的项(根):

id | parent_id | level | name             | path
---+-----------+-------+------------------+----------------------------------------
1  |           | 1     | item 1           | item 1
8  |           | 1     | item 3           | item 3
6  |           | 1     | item 2           | item 2

然后，它执行递归 Select ，给出forest个表中的所有二级项:

id | parent_id | level | name             | path
---+-----------+-------+------------------+----------------------------------------
2  | 1         | 2     | subitem 1.1      | item 1 / subitem 1.1
3  | 1         | 2     | subitem 1.2      | item 1 / subitem 1.2
4  | 1         | 2     | subitem 1.3      | item 1 / subitem 1.3
7  | 6         | 2     | subitem 2.1      | item 2 / subitem 2.1

然后，它再次执行递归 Select ，检索3d级别的项目:

id | parent_id | level | name             | path
---+-----------+-------+------------------+----------------------------------------
5  | 3         | 3     | subsubitem 1.2.1 | item 1 / subitem 1.2 / subsubitem 1.2.1

现在，它再次执行递归 Select ，试图检索第四级项目，但它们都没有，所以循环退出.

外部 Select 设置正确的人类可读的行顺序，按路径列排序:

id | parent_id | level | name             | path
---+-----------+-------+------------------+----------------------------------------
1  |           | 1     | item 1           | item 1
2  | 1         | 2     | subitem 1.1      | item 1 / subitem 1.1
3  | 1         | 2     | subitem 1.2      | item 1 / subitem 1.2
5  | 3         | 3     | subsubitem 1.2.1 | item 1 / subitem 1.2 / subsubitem 1.2.1
4  | 1         | 2     | subitem 1.3      | item 1 / subitem 1.3
6  |           | 1     | item 2           | item 2
7  | 6         | 2     | subitem 2.1      | item 2 / subitem 2.1
8  |           | 1     | item 3           | item 3

NOTE:只有在项目名称中没有标点字符排序规则在/之前时，结果行顺序才会保持正确.如果我们在Item 1 *中重命名Item 2，它将打破行顺序，位于Item 1及其子代之间.

修改最后一个查询以扩展任意子树非常简单——只需将条件parent_id is null替换为perent_id=1(例如).请注意，此查询变量将返回所有级别和路径relative to Item 1.

现在大约有typical mistakes人.递归查询最显著的典型错误是在递归 Select 中定义病态停止条件，这会导致无限循环.

例如，如果我们在上面的阶乘样本中省略了where n>1个条件，递归 Select 的执行将永远不会给出一个空集(因为我们没有条件过滤掉单个行)，循环将无限继续.

这就是某些查询挂起的最可能原因(另一个非特定但仍然可能的原因是非常无效的select，它在有限但非常长的时间内执行).

据我所知，没有太多递归特定查询需要提及.但我想建议(相当明显)一步一步地递归查询构建过程.

分别构建和调试初始 Select .
用递归构造的脚手架将其包装起来

建议的脚手架 struct 如下所示:

WITH RECURSIVE rec( <Your column names> ) AS (
    <Your ready and working initial SELECT>
UNION ALL
    <Recursive SELECT that you are debugging now>
)
SELECT * from rec limit 1000

这个最简单的外部 Select 将输出整个外部记录集，正如我们所知，它包含初始 Select 的所有输出行，以及循环中按原始输出顺序执行RecurSrive select的每个输出行，就像上面的示例一样！limit 1000部件将防止挂起，替换为超大输出，您将能够看到错过的停止点.

调试初始和递归后， Select build并调试外部 Select .

最后要提的是，在with recursive条款中使用100而不是union all的区别.它引入了行唯一性约束，这会在我们的执行伪代码中增加两行:

working-recordset = result of Initial-SELECT

discard duplicate rows from working-recordset /*union-specific*/

append working-recordset to empty outer-recordset

while( working-recordset is not empty ) begin

    new working-recordset = result of Recursive-SELECT 
        taking previous working-recordset as pseudo-entity-name

    discard duplicate rows and rows that have duplicates in outer-recordset 
        from working-recordset /*union-specific*/

    append working-recordset to outer-recordset

end

overall-result = result of Outer-SELECT 
    taking outer-recordset as pseudo-entity-name

Postgresql 如何使用 WITH RECURSIVE 子句进行 Select

推荐答案

Postgresql相关问答推荐

Gorm 中的更新将created_at、updated_at 作为默认时间

使用来自 docker 图像的 postgres 设置 Keycloak 21

如何重新格式化已获取的 psql 查询输出？

如何在postgresql中按时间查询

Postgres 查询计划器不会使用更便宜的复合索引

我应该 Select 哪种数据类型？

Postgres 低估了导致错误查询计划的行数

是否可以使用 pgAdmin4 自动格式化/美化 SQL 查询？

在 PostgreSQL 中返回插入的行

如何将 DELETE 的返回值插入到 postgresql 中的 INSERT 中？

在PostgreSQL中 Select 数组列的总和

在 Windows 10 中执行时，Docker 容器关闭并给出data directory has wrong ownership错误

SQL - 提示引用列

null 计算结果为 false 的情况

在 postgresql 中获取上个月的数据

PostgreSQL：从转储中恢复数据库 - 语法错误

SQL：子查询的列太多

PL/pgSQL 执行与执行

错误：permission denied for language c

PostgreSQL - 如何将数字字段中的秒数转换为 HH：MM：SS