We insert 30K rows/s into the following table:
CREATE UNLOGGED TABLE some_data
(
    ts TIMESTAMP NOT NULL DEFAULT NOW(),
    a_column VARCHAR,
    b_column VARCHAR,
    c_column BOOLEAN
) PARTITION BY RANGE (ts);
CREATE INDEX ON some_data (ts);
CREATE INDEX ON some_data (a_column);
CREATE INDEX ON some_data (b_column);
CREATE INDEX ON some_data (c_column);
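For illustration, a typical insert looks something like this (the values here are placeholders, and ts is left to its NOW() default):

INSERT INTO some_data (a_column, b_column, c_column)
VALUES ('a_value1', 'b_value5', false);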
The writer creates a new partition whenever a row cannot be inserted because no existing partition covers its timestamp. Partitions are 15 minutes long. The data is short-lived and gets deleted regularly: we partition because dropping a partition guarantees the data is physically removed, and at our insert rate vacuum would never get a chance to run.
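The partition lifecycle amounts to statements like these (the partition name matches the one visible in the plan below; the exact bounds are an assumption based on the 15-minute window):

-- create the 15-minute partition when an insert first needs it
CREATE UNLOGGED TABLE some_data_2024_02_13_14_15
    PARTITION OF some_data
    FOR VALUES FROM ('2024-02-13 14:15:00') TO ('2024-02-13 14:30:00');

-- later, expire the data by dropping the whole partition;
-- this frees the storage immediately, with no DELETE and no vacuum
DROP TABLE some_data_2024_02_13_14_15;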
It also does not matter if not all data gets written. Our database is mounted on a tmpfs partition, so it lives in RAM. We are happy with insert performance.
We have another process that continuously polls the table above with the query below, looking for the top b_column values over the last 30 seconds:
SELECT b_column, count(*) as c FROM some_data
WHERE ts >= NOW() - INTERVAL '30 SECONDS'
AND a_column = 'a_value1'
AND c_column IS NOT True
AND b_column != 'b_value4' AND b_column != ''
AND b_column != 'b_value1' AND b_column != 'b_value2'
AND b_column != 'b_value3'
GROUP BY b_column
HAVING count(*) > 15000
ORDER BY c DESC
LIMIT 100;
The values in this query's WHERE clause are always the same.
Because of the amount of data we query, the query takes 1.5 to 2 seconds to complete. Is there anything we can do, in terms of restructuring the table or the query, to improve its performance?
Here is the EXPLAIN (ANALYZE, VERBOSE) output:
Limit  (cost=26488.73..26488.90 rows=67 width=30) (actual time=1210.177..1216.418 rows=4 loops=1)
  Output: some_data.b_column, (count(*))
  ->  Sort  (cost=26488.73..26488.90 rows=67 width=30) (actual time=1210.175..1216.415 rows=4 loops=1)
        Output: some_data.b_column, (count(*))
        Sort Key: (count(*)) DESC
        Sort Method: quicksort  Memory: 25kB
        ->  Finalize GroupAggregate  (cost=26435.53..26486.70 rows=67 width=30) (actual time=1040.285..1216.384 rows=4 loops=1)
              Output: some_data.b_column, count(*)
              Group Key: some_data.b_column
              Filter: (count(*) > 15000)
              Rows Removed by Filter: 180403
              ->  Gather Merge  (cost=26435.53..26482.20 rows=400 width=30) (actual time=1003.876..1136.791 rows=236999 loops=1)
                    Output: some_data.b_column, (PARTIAL count(*))
                    Workers Planned: 2
                    Workers Launched: 1
                    ->  Sort  (cost=25435.51..25436.01 rows=200 width=30) (actual time=977.874..1037.332 rows=118500 loops=2)
                          Output: some_data.b_column, (PARTIAL count(*))
                          Sort Key: some_data.b_column
                          Sort Method: external merge  Disk: 5144kB
                          Worker 0:  actual time=994.460..1056.444 rows=115752 loops=1
                            Sort Method: external merge  Disk: 4912kB
                          ->  Partial HashAggregate  (cost=25425.86..25427.86 rows=200 width=30) (actual time=641.936..727.838 rows=118500 loops=2)
                                Output: some_data.b_column, PARTIAL count(*)
                                Group Key: some_data.b_column
                                Batches: 5  Memory Usage: 8257kB  Disk Usage: 7184kB
                                Worker 0:  actual time=637.586..726.780 rows=115752 loops=1
                                  Batches: 5  Memory Usage: 8257kB  Disk Usage: 7064kB
                                ->  Parallel Index Scan using some_data_2024_02_13_14_15_ts_idx on my_db.some_data_2024_02_13_14_15 some_data  (cost=0.43..24392.95 rows=206582 width=22) (actual time=0.074..463.918 rows=382730 loops=2)
                                      Output: some_data.b_column
                                      Index Cond: (some_data.ts >= (now() - '00:00:30'::interval))
                                      Filter: ((some_data.c_column IS NOT TRUE) AND ((some_data.b_column)::text <> 'b_value4'::text) AND ((some_data.b_column)::text <> ''::text) AND ((some_data.b_column)::text <> 'b_value1'::text) AND ((some_data.b_column)::text <> 'b_value2'::text) AND ((some_data.b_column)::text <> 'b_value3'::text) AND (some_data.a_column = 'a_value1'::a_column))
                                      Rows Removed by Filter: 8091
                                      Worker 0:  actual time=0.097..466.999 rows=369674 loops=1
Planning Time: 1.563 ms
Execution Time: 1219.166 ms