Even with a GIN index in place, I cannot get queries of the form column_name COLLATE "default" ILIKE 'abc%'
to run efficiently (in under a second) on my Postgres 12 table.
The problem and how to reproduce it:
My table:
CREATE TABLE test_policy(
id SERIAL not null PRIMARY KEY,
policy_number varchar(255) not null COLLATE "english_ci");
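The CREATE TABLE above presupposes a collation named english_ci, whose definition is not shown in this post. A minimal sketch of what it might look like, assuming a case-insensitive ICU collation (the provider, the locale string, and deterministic = false are all assumptions, not taken from the question). It would have to be run before creating the table:

```sql
-- Hypothetical definition of "english_ci"; the real one is not shown in the question.
-- 'ks-level2' makes ICU comparisons ignore case, and deterministic = false
-- (available from Postgres 12) lets strings that differ only in case compare equal.
CREATE COLLATION IF NOT EXISTS english_ci (
    provider = icu,
    locale = 'en-US-u-ks-level2',
    deterministic = false
);
```

If the collation is indeed nondeterministic, that would explain the COLLATE "default" in the queries: Postgres 12 rejects ILIKE on a nondeterministic collation.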
Insert 10,000,000 records:
insert into test_policy(policy_number)
select (select prefix || '/' || suffix from
(select prefix from (select string_agg(x, '') from (select
start_arr[ 1 + ( (random() * 100)::int) % 13] from (select
'{AB,CD,EF,GH,IJ,KL,MN,OP,QR,ST,UV,WX,YZ}'::text[] as start_arr)
sy, generate_series(1, 3 + (0*generator)) as g)as str2(x)) as
s(prefix)) as pre(prefix), (select suffix from (select n[1 +
(random() * 100)::int % 10] from (select '{00, 01, 02, 03, 04,
05, 06, 07, 08, 09, 10}'::text[]) as num(n)) as s(suffix)) as
suf(suffix)
) FROM generate_series (1,10000000) as generator
on conflict do nothing;
Create the GIN index:
CREATE INDEX trgm_idx_policy_number ON test_policy USING gin (policy_number gin_trgm_ops);
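The gin_trgm_ops operator class used above comes from the pg_trgm extension, so anyone reproducing this needs the extension installed in the database before the CREATE INDEX will succeed. A one-line sketch, assuming sufficient privileges:

```sql
-- pg_trgm provides the trigram operator classes (gin_trgm_ops, gist_trgm_ops).
-- It must exist before the CREATE INDEX above can run.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
```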
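If you analyze the query below, you will see that the index is not selected and the run time is about 2.5 seconds. I would like to know why, and how I can get this query to use the index. I have many tables defined in the above style, and the queries against them use COLLATE "default" ILIKE.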
explain analyse select count(id) from test_policy where policy_number COLLATE "default" ilike 'EFWXMN%';
Output:
Finalize Aggregate  (cost=107139.59..107139.60 rows=1 width=8) (actual time=2512.638..2517.028 rows=1 loops=1)
  ->  Gather  (cost=107139.38..107139.59 rows=2 width=8) (actual time=2512.318..2517.007 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=106139.38..106139.39 rows=1 width=8) (actual time=2484.685..2484.686 rows=1 loops=3)
              ->  Parallel Seq Scan on test_policy  (cost=0.00..106138.33 rows=417 width=4) (actual time=11.753..2483.982 rows=1489 loops=3)
                    Filter: ((policy_number)::text ~~* 'EFWXMN%'::text)
                    Rows Removed by Filter: 3331844
Planning Time: 0.424 ms
JIT:
  Functions: 17
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 5.328 ms, Inlining 0.000 ms, Optimization 3.073 ms, Emission 25.693 ms, Total 34.094 ms
Execution Time: 2520.419 ms
My approach to optimizing the query:
However, if I change the table's policy_number column as follows:
ALTER TABLE test_policy
ALTER COLUMN policy_number TYPE varchar ;
and run a similar query again, you will notice that trgm_idx_policy_number is now used and the run time is about 100 ms.
explain analyse select count(id) from test_policy where policy_number ilike 'EFWXMN%';
Output:
Aggregate  (cost=10371.96..10371.97 rows=1 width=8) (actual time=49.658..49.660 rows=1 loops=1)
  ->  Bitmap Heap Scan on test_policy  (cost=124.80..10363.96 rows=3200 width=4) (actual time=34.038..49.203 rows=4395 loops=1)
        Recheck Cond: ((policy_number)::text ~~* 'EFWXMN%'::text)
        Heap Blocks: exact=4209
        ->  Bitmap Index Scan on trgm_idx_policy_number  (cost=0.00..124.00 rows=3200 width=0) (actual time=33.598..33.598 rows=4395 loops=1)
              Index Cond: ((policy_number)::text ~~* 'EFWXMN%'::text)
Planning Time: 0.134 ms
Execution Time: 49.814 ms
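To see what the ALTER TABLE above actually changed, the column's type and collation can be read from the system catalogs. A small sketch using pg_attribute and pg_collation. Run before and after the ALTER, I would expect it to report english_ci first and default afterwards, since ALTER COLUMN ... TYPE without a COLLATE clause falls back to the type's default collation:

```sql
-- Show the declared type and collation of test_policy.policy_number.
SELECT a.attname,
       format_type(a.atttypid, a.atttypmod) AS data_type,
       c.collname AS collation_name
FROM pg_attribute AS a
JOIN pg_collation AS c ON c.oid = a.attcollation
WHERE a.attrelid = 'test_policy'::regclass
  AND a.attname = 'policy_number';
```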
Can you explain why the GIN index is used this time and the run time is sub-second?
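One direction that might be worth testing, sketched here as an untested idea rather than a confirmed fix: the Postgres documentation says an index supports only one collation per indexed column and, by default, takes the collation of the underlying column. Under that reading, an index built with an explicit COLLATE "default" could match the collation used in the original query without altering the column. The index name trgm_idx_policy_number_default is made up for illustration:

```sql
-- Hypothetical second trigram index that stores policy_number under the
-- "default" collation, i.e. the collation the ILIKE query asks for.
CREATE INDEX trgm_idx_policy_number_default
    ON test_policy
    USING gin (policy_number COLLATE "default" gin_trgm_ops);

-- Re-check the plan of the original query against the unaltered column.
EXPLAIN ANALYZE
SELECT count(id)
FROM test_policy
WHERE policy_number COLLATE "default" ILIKE 'EFWXMN%';
```

Whether the planner then picks this index, and how fast the query runs, would still need to be confirmed on the real table.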