I rewrote the following query from using a correlated subquery to using a LATERAL JOIN. It is now faster, and I would like to understand why.
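This SO answer says there is no golden rule and that it really depends on the situation. Still, I wonder whether there is some intuition behind the difference.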
In case it is relevant, this query is used as a VIEW.
The old query, with a correlated subquery (strictly speaking, two of them):
SELECT
    ts.id,
    ts.val1,
    tv.val2
FROM table_static AS ts
JOIN table_version AS tv ON ts.id = tv.id
    AND tv.effective_time = (
        SELECT MAX(tv1.effective_time)
        FROM table_version AS tv1
        WHERE
            tv1.id = ts.id
            AND
            tv1.effective_time <= CLOCK_TIMESTAMP()
    )
    AND tv.create_time = (
        SELECT MAX(tv2.create_time)
        FROM table_version AS tv2
        WHERE
            tv2.id = tv.id
            AND
            tv2.effective_time = tv.effective_time
            AND
            tv2.create_time <= CLOCK_TIMESTAMP()
    )
JOIN table_status AS t_status ON tv.status_id = t_status.id
WHERE t_status.status != 'deleted'
LIMIT 1000;
Query plan (correlated subquery):
Limit (cost=4.96..13876200.85 rows=141 width=64) (actual time=0.078..10.788 rows=1000 loops=1)
  -> Nested Loop (cost=4.96..13876200.85 rows=141 width=64) (actual time=0.077..10.641 rows=1000 loops=1)
        Join Filter: (tv.status_id = t_status.id)
        -> Nested Loop (cost=4.96..13876190.47 rows=176 width=64) (actual time=0.065..10.169 rows=1000 loops=1)
              -> Seq Scan on table_static ts (cost=0.00..17353.01 rows=1000001 width=32) (actual time=0.010..0.176 rows=1000 loops=1)
              -> Index Scan using table_version_pkey on table_version tv (cost=4.96..13.85 rows=1 width=40) (actual time=0.005..0.006 rows=1 loops=1000)
                    Index Cond: ((id = ts.id) AND (effective_time = (SubPlan 2)))
                    Filter: (create_time = (SubPlan 4))
                    SubPlan 4
                      -> Result (cost=8.46..8.47 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1000)
                            InitPlan 3 (returns $4)
                              -> Limit (cost=0.43..8.46 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
                                    -> Index Only Scan Backward using table_version_pkey on table_version tv2 (cost=0.43..8.46 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
                                          Index Cond: ((id = tv.id) AND (effective_time = tv.effective_time) AND (create_time IS NOT NULL))
                                          Filter: (create_time <= clock_timestamp())
                                          Heap Fetches: 0
                    SubPlan 2
                      -> Result (cost=4.52..4.53 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1000)
                            InitPlan 1 (returns $1)
                              -> Limit (cost=0.43..4.52 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=1000)
                                    -> Index Only Scan Backward using table_version_pkey on table_version tv1 (cost=0.43..8.61 rows=2 width=8) (actual time=0.002..0.002 rows=1 loops=1000)
                                          Index Cond: ((id = ts.id) AND (effective_time IS NOT NULL))
                                          Filter: (effective_time <= clock_timestamp())
                                          Heap Fetches: 0
        -> Materialize (cost=0.00..1.08 rows=4 width=8) (actual time=0.000..0.000 rows=1 loops=1000)
              -> Seq Scan on table_status t_status (cost=0.00..1.06 rows=4 width=8) (actual time=0.006..0.006 rows=1 loops=1)
                    Filter: (status <> 'deleted'::text)
Planning Time: 0.827 ms
Execution Time: 10.936 ms
The new query, with a LATERAL JOIN:
SELECT
    ts.id,
    ts.val1,
    tv.val2
FROM table_static AS ts
JOIN LATERAL (
    SELECT *
    FROM table_version AS tv
    WHERE
        ts.id = tv.id
        AND
        tv.effective_time <= CLOCK_TIMESTAMP()
        AND
        tv.create_time <= CLOCK_TIMESTAMP()
    ORDER BY
        tv.effective_time DESC,
        tv.create_time DESC
    LIMIT 1
) AS tv ON TRUE
JOIN table_status AS t_status ON tv.status_id = t_status.id
WHERE t_status.status != 'deleted'
LIMIT 1000;
Query plan (LATERAL JOIN):
Limit (cost=0.43..40694.36 rows=1000 width=64) (actual time=0.218..4.431 rows=1000 loops=1)
  -> Nested Loop (cost=0.43..32555183.83 rows=800001 width=64) (actual time=0.217..4.280 rows=1000 loops=1)
        Join Filter: (tv.status_id = t_status.id)
        -> Nested Loop (cost=0.43..32502382.70 rows=1000001 width=64) (actual time=0.189..3.815 rows=1000 loops=1)
              -> Seq Scan on table_static ts (cost=0.00..17353.01 rows=1000001 width=32) (actual time=0.059..0.297 rows=1000 loops=1)
              -> Limit (cost=0.43..32.46 rows=1 width=48) (actual time=0.003..0.003 rows=1 loops=1000)
                    -> Index Scan Backward using table_version_pkey on table_version tv (cost=0.43..32.46 rows=1 width=48) (actual time=0.003..0.003 rows=1 loops=1000)
                          Index Cond: (id = ts.id)
                          Filter: ((effective_time <= clock_timestamp()) AND (create_time <= clock_timestamp()))
        -> Materialize (cost=0.00..1.08 rows=4 width=8) (actual time=0.000..0.000 rows=1 loops=1000)
              -> Seq Scan on table_status t_status (cost=0.00..1.06 rows=4 width=8) (actual time=0.021..0.021 rows=1 loops=1)
                    Filter: (status <> 'deleted'::text)
Planning Time: 1.315 ms
Execution Time: 4.746 ms
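Both timings above come from single runs of a few milliseconds, so part of the gap could be caching noise. A minimal sketch of how the comparison could be repeated with buffer statistics, assuming PostgreSQL's standard EXPLAIN options ANALYZE and BUFFERS (I have not included that output here); the same wrapper would be applied to the correlated-subquery version:

```sql
-- Sketch only: ANALYZE actually executes the query; BUFFERS adds per-node
-- shared-buffer hit/read counts, which helps separate plan cost from cache effects.
EXPLAIN (ANALYZE, BUFFERS)
SELECT
    ts.id,
    ts.val1,
    tv.val2
FROM table_static AS ts
JOIN LATERAL (
    SELECT *
    FROM table_version AS tv
    WHERE
        ts.id = tv.id
        AND tv.effective_time <= CLOCK_TIMESTAMP()
        AND tv.create_time <= CLOCK_TIMESTAMP()
    ORDER BY
        tv.effective_time DESC,
        tv.create_time DESC
    LIMIT 1
) AS tv ON TRUE
JOIN table_status AS t_status ON tv.status_id = t_status.id
WHERE t_status.status != 'deleted'
LIMIT 1000;
```

Running each variant several times and comparing the buffer counts per node would show whether the three index probes per row in the old plan (the outer Index Scan plus SubPlan 2 and SubPlan 4) really account for the difference against the single Index Scan Backward in the new one.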
In both cases, the following indexes (primary keys) exist:
ALTER TABLE ONLY table_static
ADD CONSTRAINT table_static_pkey PRIMARY KEY (id);
ALTER TABLE ONLY table_version
ADD CONSTRAINT table_version_pkey PRIMARY KEY (id, effective_time, create_time);
ALTER TABLE ONLY table_status
ADD CONSTRAINT table_status_pkey PRIMARY KEY (id);
Maybe the answer is simply that there is one fewer "subquery"? As far as I can tell, both queries can use the index.
If there are any other ways to optimize this query, I would be glad to hear about them.
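One further variant I am aware of but have not measured is PostgreSQL's DISTINCT ON, which keeps the first row per id according to the ORDER BY. The sketch below is an untested assumption on my part, not a confirmed improvement; it is meant to return the same rows as the LATERAL version, and it uses only the columns already shown above:

```sql
-- Sketch only (not benchmarked): pick the latest version per id with DISTINCT ON.
-- The leading ORDER BY expression must match the DISTINCT ON expression.
SELECT
    ts.id,
    ts.val1,
    tv.val2
FROM table_static AS ts
JOIN (
    SELECT DISTINCT ON (v.id)
        v.id,
        v.val2,
        v.status_id
    FROM table_version AS v
    WHERE
        v.effective_time <= CLOCK_TIMESTAMP()
        AND v.create_time <= CLOCK_TIMESTAMP()
    ORDER BY
        v.id,
        v.effective_time DESC,
        v.create_time DESC
) AS tv ON tv.id = ts.id
JOIN table_status AS t_status ON tv.status_id = t_status.id
WHERE t_status.status != 'deleted'
LIMIT 1000;
```

My suspicion is that with the outer LIMIT 1000 this could be slower than the LATERAL form, because the inner DISTINCT ON may have to process all of table_version before any row is returned, whereas the LATERAL plan stops after 1000 outer rows.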