Postgresql 如何使 Postgres 优化引擎具有确定性

发布于03月30日

在不改变结果的参数上，postgres的行为完全不同.

-- This query is fast (~ 40 ms)
WITH filtered_scans AS (
  SELECT id FROM scans
  WHERE scans."createdAt" BETWEEN NOW() - INTERVAL '26 day' AND NOW() AND
    scans."applicationId" = '2ce67bbf-d740-4f4e-aaf8-33552d54e482'
)
SELECT "scanId", percentile_disc(0.5) WITHIN GROUP (ORDER BY duration) AS p50
FROM scan_exchanges
WHERE "scanId" IN (SELECT id FROM filtered_scans)
GROUP BY "scanId";

-- This one is slow (~ 2 s)
EXPLAIN ANALYZE WITH filtered_scans AS (
  SELECT id FROM scans
  WHERE scans."createdAt" BETWEEN NOW() - INTERVAL '27 day' AND NOW() AND
                                                 -- ⬆️ the only difference is here 
    scans."applicationId" = '2ce67bbf-d740-4f4e-aaf8-33552d54e482'
)
SELECT "scanId", percentile_disc(0.5) WITHIN GROUP (ORDER BY duration) AS p50
FROM scan_exchanges
WHERE "scanId" IN (SELECT id FROM filtered_scans)
GROUP BY "scanId";

在这两种情况下，filtered_scans表是相同的:

id
2a823963-d019-4066-aab5-fed94b5f05fd
4684e3c7-bbba-4fb4-909c-fc6d512f5555
abf56697-2400-452c-a68a-e243d53bafb0

生成的表格也是相同的:

scanId	p50
2a823963-d019-4066-aab5-fed94b5f05fd	0,09799999999813735
4684e3c7-bbba-4fb4-909c-fc6d512f5555	0,07500000001164153
abf56697-2400-452c-a68a-e243d53bafb0	0,1159999999217689

表scan_exchanges在scanId上有一个名为scan_exchanges_scanId_idx的索引.表列:

CREATE TABLE "scans" (
    "id" UUID NOT NULL,
    "createdAt" TIMESTAMP NOT NULL DEFAULT 'CURRENT_TIMESTAMP',
    "applicationId" UUID NOT NULL,
    PRIMARY KEY ("id"),
    INDEX "scans_applicationId_idx" ("applicationId")
)

CREATE TABLE "scan_exchanges" (
    "id" UUID NOT NULL,
    "duration" DOUBLE PRECISION NOT NULL,
    "scanId" UUID NOT NULL,
    PRIMARY KEY ("id"),
    INDEX "scan_exchanges_scanId_idx" ("scanId"),
    CONSTRAINT "scan_exchanges_scanId_fkey" FOREIGN KEY ("scanId") REFERENCES "scans" ("id") ON UPDATE CASCADE ON DELETE CASCADE
)
-- I removed some fields irrelevant to the question, tell me if I removed too much

我try 在查询上运行ANALYZE EXPLAIN，Postgres 13.9.0返回了截然不同的结果:

发帖主题:Re:Kolibrios

QUERY PLAN
GroupAggregate  (cost=26.03..308595.21 rows=818 width=24) (actual time=24.336..26.808 rows=3 loops=1)
  Group Key: scan_exchanges."scanId"
  ->  Nested Loop  (cost=26.03..308330.41 rows=50915 width=24) (actual time=23.299..26.374 rows=1414 loops=1)
        ->  Index Scan using scans_pkey on scans  (cost=0.28..1488.47 rows=71 width=16) (actual time=23.211..24.283 rows=3 loops=1)
              Filter: (("applicationId" = '2ce67bbf-d740-4f4e-aaf8-33552d54e482'::uuid) AND ("createdAt" <= now()) AND ("createdAt" >= (now() - '26 days'::interval)))
              Rows Removed by Filter: 1482
        ->  Bitmap Heap Scan on scan_exchanges  (cost=25.75..4308.74 rows=1298 width=24) (actual time=0.054..0.623 rows=471 loops=3)
              Recheck Cond: ("scanId" = scans.id)
              Heap Blocks: exact=492
              ->  Bitmap Index Scan on "scan_exchanges_scanId_idx"  (cost=0.00..25.43 rows=1298 width=0) (actual time=0.034..0.034 rows=509 loops=3)
                    Index Cond: ("scanId" = scans.id)
Planning Time: 0.393 ms
JIT:
  Functions: 14
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 2.845 ms, Inlining 0.000 ms, Optimization 1.138 ms, Emission 21.245 ms, Total 25.228 ms
Execution Time: 29.853 ms

查询速度慢:

QUERY PLAN
GroupAggregate  (cost=306783.39..307261.53 rows=818 width=24) (actual time=2028.021..2028.437 rows=3 loops=1)
  Group Key: scan_exchanges."scanId"
  ->  Sort  (cost=306783.39..306939.36 rows=62389 width=24) (actual time=2027.787..2027.933 rows=1414 loops=1)
        Sort Key: scan_exchanges."scanId"
        Sort Method: quicksort  Memory: 159kB
        ->  Hash Join  (cost=370.11..301814.42 rows=62389 width=24) (actual time=179.534..2027.085 rows=1414 loops=1)
              Hash Cond: (scan_exchanges."scanId" = scans.id)
              ->  Seq Scan on scan_exchanges  (cost=0.00..298648.46 rows=1062046 width=24) (actual time=25.341..1843.927 rows=1067166 loops=1)
              ->  Hash  (cost=369.02..369.02 rows=87 width=16) (actual time=0.538..0.540 rows=3 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    ->  Bitmap Heap Scan on scans  (cost=10.71..369.02 rows=87 width=16) (actual time=0.100..0.534 rows=3 loops=1)
                          Recheck Cond: ("applicationId" = '2ce67bbf-d740-4f4e-aaf8-33552d54e482'::uuid)
                          Filter: (("createdAt" <= now()) AND ("createdAt" >= (now() - '27 days'::interval)))
                          Rows Removed by Filter: 319
                          Heap Blocks: exact=156
                          ->  Bitmap Index Scan on "scans_applicationId_idx"  (cost=0.00..10.69 rows=322 width=0) (actual time=0.042..0.042 rows=322 loops=1)
                                Index Cond: ("applicationId" = '2ce67bbf-d740-4f4e-aaf8-33552d54e482'::uuid)
Planning Time: 0.689 ms
JIT:
  Functions: 19
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 6.831 ms, Inlining 0.000 ms, Optimization 1.793 ms, Emission 23.113 ms, Total 31.737 ms
Execution Time: 2035.574 ms

我还没有进行足够的实验，无法理解是什么让Postgres创建了两个不同的查询计划.

我可以做些什么来让Postgres使用FAST查询计划，而不管scans."createdAt"上的过滤器是什么？

编辑(29/03/2023):

我try 添加一个超过("applicationId", "createdAt", "id")的索引，但不幸的是，这并没有使查询更具可预测性:

查询速度慢:

QUERY PLAN
GroupAggregate  (cost=299278.22..305644.21 rows=818 width=24) (actual time=1122.330..1238.101 rows=3 loops=1)
  Group Key: scan_exchanges."scanId"
  ->  Gather Merge  (cost=299278.22..305366.87 rows=53423 width=24) (actual time=1121.841..1237.594 rows=1414 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Sort  (cost=298278.21..298356.78 rows=31425 width=24) (actual time=1078.875..1078.947 rows=707 loops=2)
              Sort Key: scan_exchanges."scanId"
              Sort Method: quicksort  Memory: 101kB
              Worker 0:  Sort Method: quicksort  Memory: 59kB
              ->  Hash Join  (cost=10.91..295930.82 rows=31425 width=24) (actual time=132.823..1078.481 rows=707 loops=2)
                    Hash Cond: (scan_exchanges."scanId" = scans.id)
                    ->  Parallel Seq Scan on scan_exchanges  (cost=0.00..294275.33 rows=624733 width=24) (actual time=17.585..986.391 rows=535626 loops=2)
                    ->  Hash  (cost=9.97..9.97 rows=75 width=16) (actual time=0.093..0.095 rows=3 loops=2)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Index Only Scan using app_created_id on scans  (cost=0.29..9.97 rows=75 width=16) (actual time=0.087..0.089 rows=3 loops=2)
                                Index Cond: (("applicationId" = '2ce67bbf-d740-4f4e-aaf8-33552d54e482'::uuid) AND ("createdAt" >= (now() - '27 days'::interval)) AND ("createdAt" <= now()))
                                Heap Fetches: 0
Planning Time: 0.436 ms
JIT:
  Functions: 26
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 4.929 ms, Inlining 0.000 ms, Optimization 2.230 ms, Emission 32.063 ms, Total 39.222 ms
Execution Time: 1241.326 ms

(快速查询不会改变，也不依赖于新创建的索引)

Postgresql 如何使 Postgres 优化引擎具有确定性

推荐答案

Postgresql相关问答推荐

我无法在过程中提交

使用PGx在围棋中执行多条SQL语句

GORM Golang SQL查询执行不区分大小写的搜索不起作用

在Postgres中如何连接具有相同名称列的表并更改列名称？

安装age-installation时出错

postgres hierarchy - 用祖先的值填充缺失值

为什么这里的默认格式不同？ (ISO 8601 的两种不同实现)

将 postgres 从属提升为主 node

为什么 SQLAlchemy 不创建串行列？

'active' 标志与否？

Rails 4 迁移： has_and_belongs_to_many table name

在不存在的行上有select for update块

如何创建 postgres 数据库模式的 sql 脚本？

manage.py 迁移时必须是关系 django_site 的所有者

在 postgresql 中将 bool 转换为 int

PostgreSQL：如何在用户级别设置 search_path？

Postgres 中有多少表分区太多了？

学习 PL/pgSQL 的好资源？

与 iexact 一起使用时，Django get_or_create 无法设置字段

调用 getNextException 来查看原因：如何让 Hibernate / JPA 显示异常的数据库服务器消息