提前感谢任何可能对此有所帮助的人.

我有一个Postgres 15.2数据库,它使用分区表通过区域ID访问外部数据工作者表.每个区域进一步按时段细分每个区域数据库中的数据.

我正在开发的数据库接近10亿条记录,但会变得更大.如果我直接从Lead数据库查询FDW表,它的速度非常快,大约为12ms.如果我查询指向相同位置的Lead数据库分区表,总是需要大约一分钟的时间.如果我在WHERE子句中指定区域,这是分区的判断约束,它仍然很慢.

我在这里和其他地方花了几天时间寻找解决方案,但到目前为止一直在寻找解决方案.我还try 简化相关表的模式,以确定它是否与外键或约束相关,但情况似乎并非如此.

在新创建的数据库或我一直使用的长期运行的测试系统上,似乎也会发生同样的情况.

分析分区表似乎没有任何效果.

以下是我用于最小测试设置的Linux bash脚本,它在一台机器上以小得多的规模重现了该行为.它创建一个 bootstrap 者数据库和任意数量的区域数据库,并连接所有外部数据工作者.销售线索数据库位于127.0.0.100,每个区域数据库位于127.0.0.100+区域编号.我必须从postgres用户有权访问的根级目录执行此操作.注意:它会启动所有数据库,我不建议运行它,除非是在带有固态硬盘的虚拟机和空文件夹中.我给出了下面的结果.

cat BUILD_LEAD_AND_ZONES.sh 
#!/usr/bin/env bash
#ARG 1 DBNAME
#ARG 2 a directory that postgres user has permissions to so DBs can be started there
#ARG 3 NUMBER OF ZONES TO CREATE  ALL DBs will be created in the ARG 2 directory with a 127.0.0.X addres where Leader is at 100 and each zone is at 100 + zone number. Can move them somewhere else afterwards

set -e #exit on any non-zero return-code
set -x #echo commands

set -euo pipefail
IFS=$'\n\t'


DBNAME=$1
ROOT_TRGT_DIR=$2
NUMBER_OF_ZONES=$3
BASE_IP4="127.0.0." #with a trailing dot x.x.x.
SCHEMA_NAME="wha"
DBFOLDER="DB_SCHEMA"
FDWPASSWORD="Wh00p33"

createDbAndBaseTables () {
    echo "Creating Database $1 $2"
    psql -h $1 -U postgres -c "CREATE DATABASE \"$2\";"
    psql -h $1 -U postgres -d $2 -c "CREATE SCHEMA $SCHEMA_NAME;"
    if [[ $3 = "LEAD" ]]
    then
        echo "LEAD Testing.sql"
        PGOPTIONS="--search_path=$SCHEMA_NAME" psql -h $1 -U postgres -d $2 <<SCRIPT
CREATE TABLE "Testing" (
    "TestingId" BIGSERIAL
    ,"Period" INT
    ,"Zone" INT
    ,"UTCStart" timestamp NOT NULL
    ,"UTCEnd" timestamp NOT NULL
    ,"Stuff" TEXT
) PARTITION BY LIST ("Zone");
SCRIPT
    else
        echo "ZONE Testing.sql"
        PGOPTIONS="--search_path=$SCHEMA_NAME" psql -h $1 -U postgres -d $2 <<SCRIPT
CREATE TABLE "Testing" (
    "TestingId" BIGSERIAL
    ,"Period" INT
    ,"Zone" INT
    ,"UTCStart" timestamp NOT NULL
    ,"UTCEnd" timestamp NOT NULL
    ,"Stuff" text
    ,PRIMARY KEY ("TestingId","Period","Zone")
);
SCRIPT
    fi
}


#Create each Lead and Zone and customize their pg_hba.conf and postgresql.conf to run and allow access from the correct addresses
for ((i=0; i <= $NUMBER_OF_ZONES; i++ )) 
do
    ((ip=$i+100))
    if [[ $i = 0 ]]; then
        echo "Creating LEADER DB $BASE_IP4$ip"
        DIR="$ROOT_TRGT_DIR/PSQL_LEAD"
    else
        echo "Creating ZONE DB $BASE_IP4$ip"
        DIR="$ROOT_TRGT_DIR/PSQL_$i"
    fi
    mkdir -p "$DIR"
    sudo chown postgres:postgres "$DIR"
    sudo -u postgres initdb -D "$DIR"
    sudo -u postgres mkdir -p "$DIR/sockets"
    if [[ $i = 0 ]]; then
        #BUILD UP pg_hba.conf access list
        ACCESS_LIST="host    all             fdwuser         $BASE_IP4$ip\/32        password"
        ACCESS_LIST="$ACCESS_LIST\nhost    all             postgres        $BASE_IP4$ip\/32        trust"
        ACCESS_LIST="$ACCESS_LIST\nhost    all             postgres        ${BASE_IP4}1\/32        trust"
        ACCESS_LIST="$ACCESS_LIST\nhost    all             postgres        127.0.0.1\/32        trust"
        ACCESS_LIST="$ACCESS_LIST\nhost    all             fdwuser         127.0.0.1\/32        password"
        ACCESS_LIST="$ACCESS_LIST\nhost    all             fdwuser        ${BASE_IP4}1\/32      password"
        for ((k=1; k<= $NUMBER_OF_ZONES; k++ ))
        do
            ((zip=$k+100))
            ACCESS_LIST="$ACCESS_LIST\nhost    all             fdwuser         $BASE_IP4$zip\/32        password"
        done
        sed -i "s/host    all             all             127.0.0.1\/32            trust/$ACCESS_LIST\n/" "$DIR/pg_hba.conf"
        echo "sudo -u postgres pg_ctl -D \"$DIR\" -l \"$DIR/psql.log\" start" > "start_PSQL_LEAD"
        chmod +x "start_PSQL_LEAD"
    else
        ACCESS_LIST="host    all            fdwuser          $BASE_IP4$ip\/32        password"
        ACCESS_LIST="$ACCESS_LIST\nhost    all            postgres         $BASE_IP4$ip\/32        trust"
        ACCESS_LIST="$ACCESS_LIST\nhost    all            postgres         127.0.0.1\/32        trust"
        ACCESS_LIST="$ACCESS_LIST\nhost    all            fdwuser          ${BASE_IP4}1\/32        password"
        ACCESS_LIST="$ACCESS_LIST\nhost    all            fdwuser          127.0.0.1\/32        password"
        ACCESS_LIST="$ACCESS_LIST\nhost    all            postgres         ${BASE_IP4}1\/32        trust"
        sed -i "s/host    all             all             127.0.0.1\/32            trust/$ACCESS_LIST\n/" "$DIR/pg_hba.conf"
        echo "sudo -u postgres pg_ctl -D \"$DIR\" -l \"$DIR/psql.log\" start" > "start_PSQL_$i"
        chmod +x "start_PSQL_$i"
    fi
    sed -i "s/#listen_addresses = 'localhost'/listen_addresses = '$BASE_IP4$ip'/" "$DIR/postgresql.conf"
    sed -i "s/#wal_buffers = -1/wal_buffers = 16MB/" "$DIR/postgresql.conf"
    #wacky ${DIR....} is escaping the front slashes so sed doesn't blow up
    sed -i "s/#unix_socket_directories = '\/run\/postgresql'/unix_socket_directories = '${DIR//\//\\/}\/sockets'/" "$DIR/postgresql.conf"
done

#start servers and build db instances
for ((i=0; i <= $NUMBER_OF_ZONES; i++))
do
    ((ip=$i+100))
    if [[ $i = 0 ]]
    then
        ./start_PSQL_LEAD
        createDbAndBaseTables "$BASE_IP4$ip" "$DBNAME" "LEAD"
    else
        ./start_PSQL_$i
        createDbAndBaseTables "$BASE_IP4$ip" "$DBNAME" "ZONE"
    fi
done

#set up user for lead to access each zone with
for ((i=0; i <= $NUMBER_OF_ZONES; i++))
do
    ((ip=$i+100))
psql -h ${BASE_IP4}$ip -U postgres -d $DBNAME <<SCRIPT
 CREATE EXTENSION postgres_fdw;
 CREATE USER fdwuser WITH ENCRYPTED PASSWORD '$FDWPASSWORD';
 GRANT ALL PRIVILEGES ON SCHEMA $SCHEMA_NAME to fdwuser;
 GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA $SCHEMA_NAME to fdwuser;
 ALTER TABLE "Testing" ALTER COLUMN "Zone" SET DEFAULT $k;
SCRIPT
done

for ((i=0; i <= $NUMBER_OF_ZONES; i++))
do
    ((ip=$i+100))
    if [[ $i = 0 ]] #LEAD LEAD LEAD LEAD LEAD
    then
#       PGOPTIONS="--search_path=$SCHEMA_NAME" psql -U postgres -d $DBNAME -h $BASE_IP4$ip -f "$DBFOLDER/TBL/" ... ...Create lead schema here
#       PGOPTIONS="--search_path=$SCHEMA_NAME" psql -U postgres -d $DBNAME -h $BASE_IP4$ip -f "$DBFOLDER/PROC/" ... ...Create lead schema here
#       PGOPTIONS="--search_path=$SCHEMA_NAME" psql -U postgres -d $DBNAME -h $BASE_IP4$ip -f "$DBFOLDER/VIEW/" ... ...Create lead schema here
        
        for ((k=1; k<= $NUMBER_OF_ZONES; k++ ))
        do
            ((zip=$k+100))
            psql -h ${BASE_IP4}100 -U postgres -d $DBNAME <<SCRIPT
 CREATE SERVER "Zone$k" FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname '$DBNAME', host '$BASE_IP4$zip', port '5432',use_remote_estimate 'true', fetch_size '50000');
 GRANT USAGE ON FOREIGN SERVER "Zone$k" TO fdwuser;
 CREATE USER MAPPING for fdwuser SERVER "Zone$k" OPTIONS (user 'fdwuser', password '$FDWPASSWORD');
 CREATE USER MAPPING for postgres SERVER "Zone$k" OPTIONS (user 'fdwuser', password '$FDWPASSWORD');
 CREATE SCHEMA "Zone$k";
 GRANT ALL PRIVILEGES ON SCHEMA "Zone$k" to fdwuser;
 GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA "Zone$k" to fdwuser;
 GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA $SCHEMA_NAME TO fdwuser;
SCRIPT
            PGOPTIONS="--search_path=$SCHEMA_NAME" PGPASSWORD=$FDWPASSWORD psql -h "${BASE_IP4}100" -U postgres -d $DBNAME -c "IMPORT FOREIGN SCHEMA $SCHEMA_NAME FROM SERVER \"Zone$k\" INTO \"Zone$k\";"
            PGOPTIONS="--search_path=$SCHEMA_NAME" PGPASSWORD=$FDWPASSWORD psql -h "${BASE_IP4}100" -U postgres -d $DBNAME -c "ALTER TABLE \"Testing\" ATTACH PARTITION \"Zone$k\".\"Testing\" FOR VALUES IN ($k);"

        done

    else #ZONE ZONE ZONE ZONE ZONE

    #create a mockup of how zones are setup
    for ((k=1; k<= 12; k++ ))
    do
        printf -v MM "%02d" $k
        PGOPTIONS="--search_path=$SCHEMA_NAME" psql -h $BASE_IP4$ip -U postgres -d $DBNAME <<SCRIPT
CREATE TABLE "Testing2022$k" (CHECK ("Period"='2022$k')) INHERITS("Testing");
ALTER TABLE "Testing2022$k" ALTER COLUMN "Zone" SET DEFAULT $i;

INSERT INTO "Testing2022$k"("Period","UTCStart","UTCEnd","Stuff")
SELECT '2022$k','2022-${MM}-01 00:00:00'::timestamp + (seq || ' Minute')::INTERVAL, '2022-${MM}-01 00:00:00'::timestamp + (seq || ' Minute')::INTERVAL + ('50 Second')::INTERVAL, seq::text from generate_series(0,100000) seq;

CREATE INDEX idx_t2022${k}_utce_utcs ON "Testing2022${k}"("UTCEnd","UTCStart");
CREATE INDEX idx_t2022${k}_utcs_utce ON "Testing2022${k}"("UTCStart","UTCEnd");
CREATE INDEX idx_t2022${k}_a ON "Testing2022${k}"("Stuff");
SCRIPT


    done
#

    fi
done

下面是Lead数据库的结果大小,其中有8个区域,每个区域有12个周期 PGOPTIONS="--search-path=wha" psql -U postgres -d TestDB -h 127.0.0.100

select count(*) from "Testing";
  count  
---------
 9600096
(1 row)

直接查询速度相当快,约为14ms

select * from "Zone6"."Testing" where "Zone" = 6 AND "UTCStart" > '2022-03-01' ORDER BY "UTCStart" DESC limit 100;
...
   1199913 | 202212 |    6 | 2023-02-08 09:01:00 | 2023-02-08 09:01:50 | 99901
(100 rows)

Time: 13.564 ms

查询分区的速度要慢得多,大约为87ms(但仍然很快,因为数据集很小)

select * from wha."Testing" where "Zone" = 6 AND "UTCStart" > '2022-03-01' ORDER BY "UTCStart" DESC limit 100;

...
   1199913 | 202212 |    6 | 2023-02-08 09:01:00 | 2023-02-08 09:01:50 | 99901
(100 rows)

Time: 87.156 ms

以下是解释:

来自Lead分区表

explain (ANALYZE, TIMING, COSTS, BUFFERS, VERBOSE) select * from wha."Testing" where "Zone" = 6 AND "UTCStart" > '2022-03-01' ORDER BY "UTCStart" DESC limit 100;
                                                                                                   QUERY PLAN                                                                                                    
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=105.26..115.26 rows=100 width=64) (actual time=75.312..75.321 rows=100 loops=1)
   Output: "Testing"."TestingId", "Testing"."Period", "Testing"."Zone", "Testing"."UTCStart", "Testing"."UTCEnd", "Testing"."Stuff"
   ->  Foreign Scan on "Zone6"."Testing"  (cost=105.26..107582.12 rows=1074326 width=37) (actual time=75.309..75.314 rows=100 loops=1)
         Output: "Testing"."TestingId", "Testing"."Period", "Testing"."Zone", "Testing"."UTCStart", "Testing"."UTCEnd", "Testing"."Stuff"
         Remote SQL: SELECT "TestingId", "Period", "Zone", "UTCStart", "UTCEnd", "Stuff" FROM wha."Testing" WHERE (("UTCStart" > '2022-03-01 00:00:00')) AND (("Zone" = 6)) ORDER BY "UTCStart" DESC NULLS FIRST
 Planning Time: 5.638 ms
 Execution Time: 76.320 ms
(7 rows)

Time: 82.503 ms

和区域直接

explain (ANALYZE, TIMING, COSTS, BUFFERS, VERBOSE) select * from "Zone6"."Testing" where "Zone" = 6 AND "UTCStart" > '2022-03-01' ORDER BY "UTCStart" DESC limit 100;
                                                                                                         QUERY PLAN                                                                                                          
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan on "Zone6"."Testing"  (cost=105.26..115.26 rows=100 width=64) (actual time=3.935..3.952 rows=100 loops=1)
   Output: "TestingId", "Period", "Zone", "UTCStart", "UTCEnd", "Stuff"
   Remote SQL: SELECT "TestingId", "Period", "Zone", "UTCStart", "UTCEnd", "Stuff" FROM wha."Testing" WHERE (("UTCStart" > '2022-03-01 00:00:00')) AND (("Zone" = 6)) ORDER BY "UTCStart" DESC NULLS FIRST LIMIT 100::bigint
 Planning Time: 8.193 ms
 Execution Time: 4.277 ms
(5 rows)

Time: 13.373 ms

我不太擅长解析解释输出,但为什么在上面的两个解释上,"Zone6"上的外来扫描有这么大的不同?

我知道我可以编写一个函数来考虑区域并避免分区表,但逻辑将变得足够复杂,我正在试图避免它.为什么分区 Select 要慢得多,我是不是漏掉了什么?我还可以try 其他什么方法来加快分区的查询速度吗?

谢谢!

推荐答案

这两个查询的不同之处在于,在第二个查询中,LIMIT 100被下推到远程数据库.这使得远程数据库 Select 能够快速提供前LIMIT 100个结果的执行计划.否则,PostgreSQL会针对返回complete结果集的最快计划进行优化.对远程数据库上的远程查询运行EXPLAIN,看看是否能检测到差异.

如果您查询分区表,PostgreSQL似乎不会向下推LIMIT.我没有仔细阅读代码来找出原因,但原则上这应该是可能的.也许它只是没有得到实施.

Postgresql相关问答推荐

DBT-DBT依赖于未找到的源

JPA将没有时区的Postgres时间戳调整为服务器时区

PostgreSQL 中的 Datum 数据类型是什么以及它的用途是什么?

如何准确确定边界附近的点和地理的 ST_Intersects(ST_Intersects geography vs. Geometry diffrepancy)

未更改表上的 Postgres 环绕预防

在 postgresql 中查找表的所有依赖项

从网址获取第一个字符串

如何在 Postgres 中的列上删除唯一约束?

在 Postgresql 中获取星期几

gem install pg 无法绑定到 libpq

psql:致命:connection requires a valid client certificate

你如何在postgresql中做mysqldump?

仅在存在时重命名列

解释结果中的Recheck Cond是什么意思?

在 Postgresql 中按窗口函数结果过滤

具有 DEFAULT NULL 的 MySQL 列 ?

如果 PostgreSQL count(*) 总是很慢,如何对复杂查询进行分页?

使用 Postgresql 数据库(OSX Lion)创建 Rails 3.1 应用程序

如何将 Heroku PG 转储导入本地机器

Rails ActiveRecord:PG::Error:错误:column reference "created_at" is ambiguous