SQL: Why is a lateral join faster than a correlated subquery in Postgres?

Posted on February 19

I rewrote the following query from a correlated subquery to a LATERAL JOIN. It is now faster, and I would like to understand why.

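This SO answer says there is no golden rule and that it really depends on the situation. Still, I wonder whether there is some intuition behind it?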

In case it is relevant, this query is used as a VIEW.

The old query, with a correlated subquery (strictly speaking, two):

SELECT
  ts.id,
  ts.val1,
  tv.val2
FROM table_static AS ts
JOIN table_version AS tv ON ts.id = tv.id
  AND tv.effective_time = (
    SELECT MAX(tv1.effective_time)
    FROM table_version AS tv1
    WHERE
      tv1.id = ts.id
      AND
      tv1.effective_time <= CLOCK_TIMESTAMP()
  )
  AND tv.create_time = (
    SELECT MAX(tv2.create_time)
    FROM table_version AS tv2
    WHERE
      tv2.id = tv.id
      AND
      tv2.effective_time = tv.effective_time
      AND
      tv2.create_time <= CLOCK_TIMESTAMP()
  )
JOIN table_status AS t_status ON tv.status_id = t_status.id
WHERE t_status.status != 'deleted'
LIMIT 1000;

Query plan (correlated subquery):

Limit  (cost=4.96..13876200.85 rows=141 width=64) (actual time=0.078..10.788 rows=1000 loops=1)
  ->  Nested Loop  (cost=4.96..13876200.85 rows=141 width=64) (actual time=0.077..10.641 rows=1000 loops=1)
        Join Filter: (tv.status_id = t_status.id)
        ->  Nested Loop  (cost=4.96..13876190.47 rows=176 width=64) (actual time=0.065..10.169 rows=1000 loops=1)
              ->  Seq Scan on table_static ts  (cost=0.00..17353.01 rows=1000001 width=32) (actual time=0.010..0.176 rows=1000 loops=1)
              ->  Index Scan using table_version_pkey on table_version tv  (cost=4.96..13.85 rows=1 width=40) (actual time=0.005..0.006 rows=1 loops=1000)
                    Index Cond: ((id = ts.id) AND (effective_time = (SubPlan 2)))
                    Filter: (create_time = (SubPlan 4))
                    SubPlan 4
                      ->  Result  (cost=8.46..8.47 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1000)
                            InitPlan 3 (returns $4)
                              ->  Limit  (cost=0.43..8.46 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
                                    ->  Index Only Scan Backward using table_version_pkey on table_version tv2  (cost=0.43..8.46 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
                                          Index Cond: ((id = tv.id) AND (effective_time = tv.effective_time) AND (create_time IS NOT NULL))
                                          Filter: (create_time <= clock_timestamp())
                                          Heap Fetches: 0
                    SubPlan 2
                      ->  Result  (cost=4.52..4.53 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1000)
                            InitPlan 1 (returns $1)
                              ->  Limit  (cost=0.43..4.52 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1000)
                                    ->  Index Only Scan Backward using table_version_pkey on table_version tv1  (cost=0.43..8.61 rows=2 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
                                          Index Cond: ((id = ts.id) AND (effective_time IS NOT NULL))
                                          Filter: (effective_time <= clock_timestamp())
                                          Heap Fetches: 0
        ->  Materialize  (cost=0.00..1.08 rows=4 width=8) (actual time=0.000..0.000 rows=1 loops=1000)
              ->  Seq Scan on table_status t_status  (cost=0.00..1.06 rows=4 width=8) (actual time=0.006..0.006 rows=1 loops=1)
                    Filter: (status <> 'deleted'::text)
Planning Time: 0.827 ms
Execution Time: 10.936 ms

The new query, with LATERAL JOIN:

SELECT
  ts.id,
  ts.val1,
  tv.val2
FROM table_static AS ts
JOIN LATERAL (
  SELECT *
  FROM table_version AS tv
  WHERE
    ts.id = tv.id
    AND
    tv.effective_time <= CLOCK_TIMESTAMP()
    AND
    tv.create_time <= CLOCK_TIMESTAMP()
  ORDER BY
    tv.effective_time DESC,
    tv.create_time DESC
  LIMIT 1
) AS tv ON TRUE
JOIN table_status AS t_status ON tv.status_id = t_status.id
WHERE t_status.status != 'deleted'
LIMIT 1000;

Query plan (LATERAL JOIN):

Limit  (cost=0.43..40694.36 rows=1000 width=64) (actual time=0.218..4.431 rows=1000 loops=1)
  ->  Nested Loop  (cost=0.43..32555183.83 rows=800001 width=64) (actual time=0.217..4.280 rows=1000 loops=1)
        Join Filter: (tv.status_id = t_status.id)
        ->  Nested Loop  (cost=0.43..32502382.70 rows=1000001 width=64) (actual time=0.189..3.815 rows=1000 loops=1)
              ->  Seq Scan on table_static ts  (cost=0.00..17353.01 rows=1000001 width=32) (actual time=0.059..0.297 rows=1000 loops=1)
              ->  Limit  (cost=0.43..32.46 rows=1 width=48) (actual time=0.003..0.003 rows=1 loops=1000)
                    ->  Index Scan Backward using table_version_pkey on table_version tv  (cost=0.43..32.46 rows=1 width=48) (actual time=0.003..0.003 rows=1 loops=1000)
                          Index Cond: (id = ts.id)
                          Filter: ((effective_time <= clock_timestamp()) AND (create_time <= clock_timestamp()))
        ->  Materialize  (cost=0.00..1.08 rows=4 width=8) (actual time=0.000..0.000 rows=1 loops=1000)
              ->  Seq Scan on table_status t_status  (cost=0.00..1.06 rows=4 width=8) (actual time=0.021..0.021 rows=1 loops=1)
                    Filter: (status <> 'deleted'::text)
Planning Time: 1.315 ms
Execution Time: 4.746 ms
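
To compare the plans on more than wall-clock time, it may help to rerun a single probe with buffer counts. The following is a minimal sketch, assuming the same tables as above. The literal id 42 is a hypothetical placeholder (and assumes an integer key), and now() is swapped in for CLOCK_TIMESTAMP() purely as an experiment: clock_timestamp() is volatile, and Postgres does not use volatile expressions as index conditions, which is consistent with the time predicates appearing under Filter rather than Index Cond in both plans.

```sql
-- Probe one id, as the lateral subquery does once per outer row.
-- 42 is a placeholder id; now() replaces CLOCK_TIMESTAMP() as an experiment.
EXPLAIN (ANALYZE, BUFFERS)
SELECT tv.val2, tv.status_id
FROM table_version AS tv
WHERE tv.id = 42
  AND tv.effective_time <= now()
  AND tv.create_time <= now()
ORDER BY tv.effective_time DESC, tv.create_time DESC
LIMIT 1;
```

Note that now() is fixed at transaction start, so inside a long-running transaction it can return different rows than CLOCK_TIMESTAMP() would.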

In both cases, the following indexes (primary keys) exist:

ALTER TABLE ONLY table_static
    ADD CONSTRAINT table_static_pkey PRIMARY KEY (id);
ALTER TABLE ONLY table_version
    ADD CONSTRAINT table_version_pkey PRIMARY KEY (id, effective_time, create_time);
ALTER TABLE ONLY table_status
    ADD CONSTRAINT table_status_pkey PRIMARY KEY (id);

Maybe the answer is simply that there is one fewer "subquery"? As far as I can tell, both queries can use the index.
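
One way to test the "one fewer subquery" idea might be to count index probes in the plans: the correlated plan shows three scans of table_version_pkey, each with loops=1000 (the outer Index Scan plus SubPlan 2 and SubPlan 4), while the lateral plan shows one. Below is a sketch of an intermediate form that keeps the join but merges the two subqueries into a single row-valued one. It follows the lateral query's semantics (both time filters are applied before the top row is picked), which can differ from the old query when the newest effective_time only has create_time values in the future.

```sql
-- Intermediate form: one correlated, row-valued subquery instead of two.
SELECT
  ts.id,
  ts.val1,
  tv.val2
FROM table_static AS ts
JOIN table_version AS tv
  ON tv.id = ts.id
 AND (tv.effective_time, tv.create_time) = (
       -- newest (effective_time, create_time) pair for this id
       SELECT tv1.effective_time, tv1.create_time
       FROM table_version AS tv1
       WHERE tv1.id = ts.id
         AND tv1.effective_time <= CLOCK_TIMESTAMP()
         AND tv1.create_time <= CLOCK_TIMESTAMP()
       ORDER BY tv1.effective_time DESC, tv1.create_time DESC
       LIMIT 1
     )
JOIN table_status AS t_status ON tv.status_id = t_status.id
WHERE t_status.status != 'deleted'
LIMIT 1000;
```

If this lands between the two timings above, that would suggest the number of per-row index probes accounts for at least part of the difference.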

If there are any other ways to optimize this query, I would be happy to hear them.
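
One alternative sometimes worth benchmarking for "latest row per id" is DISTINCT ON. The following is a sketch under the lateral query's semantics, not a recommendation. The alias latest is made up for illustration, and now() is used here in place of CLOCK_TIMESTAMP() as an assumption. Unless the planner can stop early, it reads every qualifying version before the outer LIMIT applies, so it may well lose to the lateral form for LIMIT 1000 over a large table and is more likely to help when most ids are needed.

```sql
-- DISTINCT ON keeps the first row per id in ORDER BY order,
-- i.e. the newest (effective_time, create_time) pair.
SELECT
  latest.id,
  ts.val1,
  latest.val2
FROM (
  SELECT DISTINCT ON (tv.id)
    tv.id,
    tv.val2,
    tv.status_id
  FROM table_version AS tv
  WHERE tv.effective_time <= now()
    AND tv.create_time <= now()
  ORDER BY tv.id, tv.effective_time DESC, tv.create_time DESC
) AS latest
JOIN table_static AS ts ON ts.id = latest.id
JOIN table_status AS t_status ON t_status.id = latest.status_id
WHERE t_status.status <> 'deleted'
LIMIT 1000;
```

Comparing its EXPLAIN ANALYZE output against the lateral plan on real data is the only reliable way to tell which one wins.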

-- correlated (scalar subquery in the SELECT list)
-- Note: tv is visible only inside the scalar subquery, so the join on
-- tv.status_id further down is out of scope as written.
SELECT
  ts.id,
  ts.val1,
  (
    SELECT tv.val2
    FROM table_version tv
    WHERE ts.id = tv.id
      AND tv.effective_time <= now()
      AND tv.create_time <= now()
    ORDER BY tv.effective_time DESC, tv.create_time DESC
    LIMIT 1
  ) AS val2
FROM table_static ts
JOIN table_status t_status ON tv.status_id = t_status.id
WHERE t_status.status <> 'deleted'
LIMIT 1000;

-- lateral
SELECT
  ts.id,
  ts.val1,
  tv.val2
FROM table_static ts
LEFT JOIN LATERAL (
  -- status_id is selected so the status join can follow the subquery
  SELECT tv.val2, tv.status_id
  FROM table_version tv
  WHERE ts.id = tv.id
    AND tv.effective_time <= now()
    AND tv.create_time <= now()
  ORDER BY tv.effective_time DESC, tv.create_time DESC
  LIMIT 1
) tv ON true
JOIN table_status t_status ON tv.status_id = t_status.id
WHERE t_status.status <> 'deleted'
LIMIT 1000;


