提前感谢任何可能对此有所帮助的人.
我有一个Postgres 15.2数据库,它使用分区表通过区域ID访问外部数据工作者表.每个区域进一步按时段细分每个区域数据库中的数据.
我正在开发的数据库接近10亿条记录,但会变得更大.如果我直接从Lead数据库查询FDW表,它的速度非常快,大约为12ms.如果我查询指向相同位置的Lead数据库分区表,总是需要大约一分钟的时间.如果我在WHERE子句中指定区域,这是分区的判断约束,它仍然很慢.
我在这里和其他地方花了几天时间寻找解决方案,但到目前为止一直在寻找解决方案.我还try 简化相关表的模式,以确定它是否与外键或约束相关,但情况似乎并非如此.
在新创建的数据库或我一直使用的长期运行的测试系统上,似乎也会发生同样的情况.
分析分区表似乎没有任何效果.
以下是我用于最小测试设置的Linux bash脚本,它在一台机器上以小得多的规模重现了该行为.它创建一个 bootstrap 者数据库和任意数量的区域数据库,并连接所有外部数据工作者.销售线索数据库位于127.0.0.100,每个区域数据库位于127.0.0.100+区域编号.我必须从postgres用户有权访问的根级目录执行此操作.注意:它会启动所有数据库,我不建议运行它,除非是在带有固态硬盘的虚拟机和空文件夹中.我给出了下面的结果.
cat BUILD_LEAD_AND_ZONES.sh
#!/usr/bin/env bash
#ARG 1 DBNAME
#ARG 2 a directory that postgres user has permissions to so DBs can be started there
#ARG 3 NUMBER OF ZONES TO CREATE ALL DBs will be created in the ARG 2 directory with a 127.0.0.X addres where Leader is at 100 and each zone is at 100 + zone number. Can move them somewhere else afterwards
set -e #exit on any non-zero return-code
set -x #echo commands
set -euo pipefail
IFS=$'\n\t'
DBNAME=$1
ROOT_TRGT_DIR=$2
NUMBER_OF_ZONES=$3
BASE_IP4="127.0.0." #with a trailing dot x.x.x.
SCHEMA_NAME="wha"
DBFOLDER="DB_SCHEMA"
FDWPASSWORD="Wh00p33"
createDbAndBaseTables () {
echo "Creating Database $1 $2"
psql -h $1 -U postgres -c "CREATE DATABASE \"$2\";"
psql -h $1 -U postgres -d $2 -c "CREATE SCHEMA $SCHEMA_NAME;"
if [[ $3 = "LEAD" ]]
then
echo "LEAD Testing.sql"
PGOPTIONS="--search_path=$SCHEMA_NAME" psql -h $1 -U postgres -d $2 <<SCRIPT
CREATE TABLE "Testing" (
"TestingId" BIGSERIAL
,"Period" INT
,"Zone" INT
,"UTCStart" timestamp NOT NULL
,"UTCEnd" timestamp NOT NULL
,"Stuff" TEXT
) PARTITION BY LIST ("Zone");
SCRIPT
else
echo "ZONE Testing.sql"
PGOPTIONS="--search_path=$SCHEMA_NAME" psql -h $1 -U postgres -d $2 <<SCRIPT
CREATE TABLE "Testing" (
"TestingId" BIGSERIAL
,"Period" INT
,"Zone" INT
,"UTCStart" timestamp NOT NULL
,"UTCEnd" timestamp NOT NULL
,"Stuff" text
,PRIMARY KEY ("TestingId","Period","Zone")
);
SCRIPT
fi
}
#Create each Lead and Zone and customize their pg_hba.conf and postgresql.conf to run and allow access from the correct addresses
for ((i=0; i <= $NUMBER_OF_ZONES; i++ ))
do
((ip=$i+100))
if [[ $i = 0 ]]; then
echo "Creating LEADER DB $BASE_IP4$ip"
DIR="$ROOT_TRGT_DIR/PSQL_LEAD"
else
echo "Creating ZONE DB $BASE_IP4$ip"
DIR="$ROOT_TRGT_DIR/PSQL_$i"
fi
mkdir -p "$DIR"
sudo chown postgres:postgres "$DIR"
sudo -u postgres initdb -D "$DIR"
sudo -u postgres mkdir -p "$DIR/sockets"
if [[ $i = 0 ]]; then
#BUILD UP pg_hba.conf access list
ACCESS_LIST="host all fdwuser $BASE_IP4$ip\/32 password"
ACCESS_LIST="$ACCESS_LIST\nhost all postgres $BASE_IP4$ip\/32 trust"
ACCESS_LIST="$ACCESS_LIST\nhost all postgres ${BASE_IP4}1\/32 trust"
ACCESS_LIST="$ACCESS_LIST\nhost all postgres 127.0.0.1\/32 trust"
ACCESS_LIST="$ACCESS_LIST\nhost all fdwuser 127.0.0.1\/32 password"
ACCESS_LIST="$ACCESS_LIST\nhost all fdwuser ${BASE_IP4}1\/32 password"
for ((k=1; k<= $NUMBER_OF_ZONES; k++ ))
do
((zip=$k+100))
ACCESS_LIST="$ACCESS_LIST\nhost all fdwuser $BASE_IP4$zip\/32 password"
done
sed -i "s/host all all 127.0.0.1\/32 trust/$ACCESS_LIST\n/" "$DIR/pg_hba.conf"
echo "sudo -u postgres pg_ctl -D \"$DIR\" -l \"$DIR/psql.log\" start" > "start_PSQL_LEAD"
chmod +x "start_PSQL_LEAD"
else
ACCESS_LIST="host all fdwuser $BASE_IP4$ip\/32 password"
ACCESS_LIST="$ACCESS_LIST\nhost all postgres $BASE_IP4$ip\/32 trust"
ACCESS_LIST="$ACCESS_LIST\nhost all postgres 127.0.0.1\/32 trust"
ACCESS_LIST="$ACCESS_LIST\nhost all fdwuser ${BASE_IP4}1\/32 password"
ACCESS_LIST="$ACCESS_LIST\nhost all fdwuser 127.0.0.1\/32 password"
ACCESS_LIST="$ACCESS_LIST\nhost all postgres ${BASE_IP4}1\/32 trust"
sed -i "s/host all all 127.0.0.1\/32 trust/$ACCESS_LIST\n/" "$DIR/pg_hba.conf"
echo "sudo -u postgres pg_ctl -D \"$DIR\" -l \"$DIR/psql.log\" start" > "start_PSQL_$i"
chmod +x "start_PSQL_$i"
fi
sed -i "s/#listen_addresses = 'localhost'/listen_addresses = '$BASE_IP4$ip'/" "$DIR/postgresql.conf"
sed -i "s/#wal_buffers = -1/wal_buffers = 16MB/" "$DIR/postgresql.conf"
#wacky ${DIR....} is escaping the front slashes so sed doesn't blow up
sed -i "s/#unix_socket_directories = '\/run\/postgresql'/unix_socket_directories = '${DIR//\//\\/}\/sockets'/" "$DIR/postgresql.conf"
done
#start servers and build db instances
for ((i=0; i <= $NUMBER_OF_ZONES; i++))
do
((ip=$i+100))
if [[ $i = 0 ]]
then
./start_PSQL_LEAD
createDbAndBaseTables "$BASE_IP4$ip" "$DBNAME" "LEAD"
else
./start_PSQL_$i
createDbAndBaseTables "$BASE_IP4$ip" "$DBNAME" "ZONE"
fi
done
#set up user for lead to access each zone with
for ((i=0; i <= $NUMBER_OF_ZONES; i++))
do
((ip=$i+100))
psql -h ${BASE_IP4}$ip -U postgres -d $DBNAME <<SCRIPT
CREATE EXTENSION postgres_fdw;
CREATE USER fdwuser WITH ENCRYPTED PASSWORD '$FDWPASSWORD';
GRANT ALL PRIVILEGES ON SCHEMA $SCHEMA_NAME to fdwuser;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA $SCHEMA_NAME to fdwuser;
ALTER TABLE "Testing" ALTER COLUMN "Zone" SET DEFAULT $k;
SCRIPT
done
for ((i=0; i <= $NUMBER_OF_ZONES; i++))
do
((ip=$i+100))
if [[ $i = 0 ]] #LEAD LEAD LEAD LEAD LEAD
then
# PGOPTIONS="--search_path=$SCHEMA_NAME" psql -U postgres -d $DBNAME -h $BASE_IP4$ip -f "$DBFOLDER/TBL/" ... ...Create lead schema here
# PGOPTIONS="--search_path=$SCHEMA_NAME" psql -U postgres -d $DBNAME -h $BASE_IP4$ip -f "$DBFOLDER/PROC/" ... ...Create lead schema here
# PGOPTIONS="--search_path=$SCHEMA_NAME" psql -U postgres -d $DBNAME -h $BASE_IP4$ip -f "$DBFOLDER/VIEW/" ... ...Create lead schema here
for ((k=1; k<= $NUMBER_OF_ZONES; k++ ))
do
((zip=$k+100))
psql -h ${BASE_IP4}100 -U postgres -d $DBNAME <<SCRIPT
CREATE SERVER "Zone$k" FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname '$DBNAME', host '$BASE_IP4$zip', port '5432',use_remote_estimate 'true', fetch_size '50000');
GRANT USAGE ON FOREIGN SERVER "Zone$k" TO fdwuser;
CREATE USER MAPPING for fdwuser SERVER "Zone$k" OPTIONS (user 'fdwuser', password '$FDWPASSWORD');
CREATE USER MAPPING for postgres SERVER "Zone$k" OPTIONS (user 'fdwuser', password '$FDWPASSWORD');
CREATE SCHEMA "Zone$k";
GRANT ALL PRIVILEGES ON SCHEMA "Zone$k" to fdwuser;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA "Zone$k" to fdwuser;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA $SCHEMA_NAME TO fdwuser;
SCRIPT
PGOPTIONS="--search_path=$SCHEMA_NAME" PGPASSWORD=$FDWPASSWORD psql -h "${BASE_IP4}100" -U postgres -d $DBNAME -c "IMPORT FOREIGN SCHEMA $SCHEMA_NAME FROM SERVER \"Zone$k\" INTO \"Zone$k\";"
PGOPTIONS="--search_path=$SCHEMA_NAME" PGPASSWORD=$FDWPASSWORD psql -h "${BASE_IP4}100" -U postgres -d $DBNAME -c "ALTER TABLE \"Testing\" ATTACH PARTITION \"Zone$k\".\"Testing\" FOR VALUES IN ($k);"
done
else #ZONE ZONE ZONE ZONE ZONE
#create a mockup of how zones are setup
for ((k=1; k<= 12; k++ ))
do
printf -v MM "%02d" $k
PGOPTIONS="--search_path=$SCHEMA_NAME" psql -h $BASE_IP4$ip -U postgres -d $DBNAME <<SCRIPT
CREATE TABLE "Testing2022$k" (CHECK ("Period"='2022$k')) INHERITS("Testing");
ALTER TABLE "Testing2022$k" ALTER COLUMN "Zone" SET DEFAULT $i;
INSERT INTO "Testing2022$k"("Period","UTCStart","UTCEnd","Stuff")
SELECT '2022$k','2022-${MM}-01 00:00:00'::timestamp + (seq || ' Minute')::INTERVAL, '2022-${MM}-01 00:00:00'::timestamp + (seq || ' Minute')::INTERVAL + ('50 Second')::INTERVAL, seq::text from generate_series(0,100000) seq;
CREATE INDEX idx_t2022${k}_utce_utcs ON "Testing2022${k}"("UTCEnd","UTCStart");
CREATE INDEX idx_t2022${k}_utcs_utce ON "Testing2022${k}"("UTCStart","UTCEnd");
CREATE INDEX idx_t2022${k}_a ON "Testing2022${k}"("Stuff");
SCRIPT
done
#
fi
done
下面是Lead数据库的结果大小,其中有8个区域,每个区域有12个周期
PGOPTIONS="--search-path=wha" psql -U postgres -d TestDB -h 127.0.0.100
个
select count(*) from "Testing";
count
---------
9600096
(1 row)
直接查询速度相当快,约为14ms
select * from "Zone6"."Testing" where "Zone" = 6 AND "UTCStart" > '2022-03-01' ORDER BY "UTCStart" DESC limit 100;
...
1199913 | 202212 | 6 | 2023-02-08 09:01:00 | 2023-02-08 09:01:50 | 99901
(100 rows)
Time: 13.564 ms
查询分区的速度要慢得多,大约为87ms(但仍然很快,因为数据集很小)
select * from wha."Testing" where "Zone" = 6 AND "UTCStart" > '2022-03-01' ORDER BY "UTCStart" DESC limit 100;
...
1199913 | 202212 | 6 | 2023-02-08 09:01:00 | 2023-02-08 09:01:50 | 99901
(100 rows)
Time: 87.156 ms
以下是解释:
来自Lead分区表
explain (ANALYZE, TIMING, COSTS, BUFFERS, VERBOSE) select * from wha."Testing" where "Zone" = 6 AND "UTCStart" > '2022-03-01' ORDER BY "UTCStart" DESC limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=105.26..115.26 rows=100 width=64) (actual time=75.312..75.321 rows=100 loops=1)
Output: "Testing"."TestingId", "Testing"."Period", "Testing"."Zone", "Testing"."UTCStart", "Testing"."UTCEnd", "Testing"."Stuff"
-> Foreign Scan on "Zone6"."Testing" (cost=105.26..107582.12 rows=1074326 width=37) (actual time=75.309..75.314 rows=100 loops=1)
Output: "Testing"."TestingId", "Testing"."Period", "Testing"."Zone", "Testing"."UTCStart", "Testing"."UTCEnd", "Testing"."Stuff"
Remote SQL: SELECT "TestingId", "Period", "Zone", "UTCStart", "UTCEnd", "Stuff" FROM wha."Testing" WHERE (("UTCStart" > '2022-03-01 00:00:00')) AND (("Zone" = 6)) ORDER BY "UTCStart" DESC NULLS FIRST
Planning Time: 5.638 ms
Execution Time: 76.320 ms
(7 rows)
Time: 82.503 ms
和区域直接
explain (ANALYZE, TIMING, COSTS, BUFFERS, VERBOSE) select * from "Zone6"."Testing" where "Zone" = 6 AND "UTCStart" > '2022-03-01' ORDER BY "UTCStart" DESC limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Foreign Scan on "Zone6"."Testing" (cost=105.26..115.26 rows=100 width=64) (actual time=3.935..3.952 rows=100 loops=1)
Output: "TestingId", "Period", "Zone", "UTCStart", "UTCEnd", "Stuff"
Remote SQL: SELECT "TestingId", "Period", "Zone", "UTCStart", "UTCEnd", "Stuff" FROM wha."Testing" WHERE (("UTCStart" > '2022-03-01 00:00:00')) AND (("Zone" = 6)) ORDER BY "UTCStart" DESC NULLS FIRST LIMIT 100::bigint
Planning Time: 8.193 ms
Execution Time: 4.277 ms
(5 rows)
Time: 13.373 ms
我不太擅长解析解释输出,但为什么在上面的两个解释上,"Zone6"上的外来扫描有这么大的不同?
我知道我可以编写一个函数来考虑区域并避免分区表,但逻辑将变得足够复杂,我正在试图避免它.为什么分区 Select 要慢得多,我是不是漏掉了什么?我还可以try 其他什么方法来加快分区的查询速度吗?
谢谢!