近期有人提出想查看Postgresql的执行计划,下面分析下PG执行计划中的cost等相关值是怎么计算出来的:
PG的版本是9.1.2
1.终端工具PGADMIN,对执行的语句按F7即可,然后看数据输出和解释
2.命令行分析:explain select * from table_name;
一般我们会比较关注消耗值cost和扫描的方式,如走索引或者full scan全表扫描.当COST值消耗比较大时需要注意是否有优化的可能。
与执行计划相关的几个参数,参看下面的示例:
kenyon=# select count(1) from dba.website ; --普通堆栈表,无任何索引约束
count
-------
20
(1 row)
kenyon=# explain select * from dba.website ;
QUERY PLAN
--------------------------------------------------------
Seq Scan on website (cost=0.00..1.20 rows=20 width=4)
(1 row)
--relpages磁盘页,reltuples是行数(与实际不一定相符,一般略小)
kenyon=# select relpages,reltuples from pg_class where relname = 'website';
relpages | reltuples
----------+-----------
1 | 20
(1 row)
kenyon=# select 1*1+20*0.01;
--cost = relpages * seq_page_cost + reltuples * cpu_tuple_cost
?column?
----------
1.20
(1 row)
kenyon=# show cpu_tuple_cost ;
cpu_tuple_cost
----------------
0.01
(1 row)
kenyon=# show seq_page_cost;
seq_page_cost
---------------
1
(1 row)
--加限制条件的执行计划
kenyon=# select count(1) from dba.website where hits >15;
count
-------
5
(1 row)
kenyon=# explain select * from dba.website where hits >15;
QUERY PLAN
-------------------------------------------------------
Seq Scan on website (cost=0.00..1.25 rows=5 width=4)
Filter: (hits > 15)
(2 rows)
kenyon=# show cpu_operator_cost ;
cpu_operator_cost
-------------------
0.0025
(1 row)
因为扫描的总数是20行,不变的,所以COST不会下降,相反反而增加了0.05,这是因为额外消耗了CPU的时间去检查符合约束条件数据,即cost 在原来的基础上再增加 20 * 0.0025 = 0.05 (reltuples * cpu_operator_cost)
--加索引的执行计划
kenyon=# select count(1) from dba.website_2 ;
count
-------
8000
(1 row)
kenyon=# explain select * from dba.website_2 ;
QUERY PLAN
--------------------------------------------------------------
Seq Scan on website_2 (cost=0.00..112.00 rows=8000 width=4)
(1 row)
kenyon=# select relpages,reltuples from pg_class where relname = 'website_2';
relpages | reltuples
----------+-----------
32 | 8000
(1 row)
kenyon=# explain select * from dba.website_2 where hits >7900; --走的索引
QUERY PLAN
----------------------------------------------------------------------------------
Index Scan using ind_website_2 on website_2 (cost=0.00..10.00 rows=100 width=4)
Index Cond: (hits > 7900)
(2 rows)
()
kenyon=# explain select * from dba.website_2 where hits >10; --未走索引(不满足索引条件,full scan)
QUERY PLAN
--------------------------------------------------------------
Seq Scan on website_2 (cost=0.00..132.00 rows=7991 width=4) -- 132 = 112+8000*0.0025
Filter: (hits > 10)
(2 rows)
虽然读取的COST更大,但是因为索引的缘故,访问的数据量变小了,所以总体COST是下降的。
--多表JOIN的执行计划 示例: 若想看实际的一个执行时间,可以加上 analyze 参数
kenyon=# explain analyze select * from dba.website a ,dba.website_2 b where a.hits = b.hits and a.hits >18;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=1.26..1.90 rows=2 width=8) (actual time=0.070..0.075 rows=2 loops=1)
Merge Cond: (b.hits = a.hits)
-> Index Scan using ind_website_2 on website_2 b (cost=0.00..235.25 rows=8000 width=4) (actual time=0.013..0.020 rows=21 loops=1)
-> Sort (cost=1.26..1.26 rows=2 width=4) (actual time=0.035..0.037 rows=2 loops=1)
Sort Key: a.hits
Sort Method: quicksort Memory: 17kB
-> Seq Scan on website a (cost=0.00..1.25 rows=2 width=4) (actual time=0.009..0.011 rows=2 loops=1)
Filter: (hits > 18)
Total runtime : 0.120 ms
(9 rows)
total runtime 是执行器启动和关闭的时间,但不包括解析,重写和规划的时间
注意: pg_class中的relpages,reltuples数据不是实时更新的,一般在vacuum analyze和少部分DDL(如建立索引)后更新。
示例1:
kenyon=# insert into dba.website select generate_series(8000,9000);
INSERT 0 1001
kenyon=# select relpages,reltuples,relname,relkind from pg_class where relname like '%website%';
relpages | reltuples | relname | relkind
----------+-----------+---------------+---------
1 | 20 | website | r
32 | 8000 | website_2 | r
20 | 8000 | ind_website_2 | i
(3 rows)
kenyon=# vacuum analyze dba.website;
VACUUM
kenyon=# vacuum analyze dba.website;
VACUUM
kenyon=# select relpages,reltuples,relname,relkind from pg_class where relname like '%website%';
relpages | reltuples | relname | relkind
----------+-----------+---------------+---------
5 | 1021 | website | r
36 | 8999 | website_2 | r
22 | 8999 | ind_website_2 | i
(3 rows)
示例2:
kenyon=# insert into dba.website select generate_series(8000,9000);
INSERT 0 1001
kenyon=# select relpages,reltuples,relname,relkind from pg_class where relname like '%website%';
relpages | reltuples | relname | relkind
----------+-----------+---------------+---------
1 | 21 | website | r
36 | 8999 | website_2 | r
22 | 8999 | ind_website_2 | i
(3 rows)
kenyon=# create index ind_website on dba.website(hits);
CREATE INDEX
kenyon=# select relpages,reltuples,relname,relkind from pg_class where relname like '%website%';
relpages | reltuples | relname | relkind
----------+-----------+---------------+---------
5 | 1022 | website | r
36 | 8999 | website_2 | r
22 | 8999 | ind_website_2 | i
5 | 1022 | ind_website | i
(4 rows)
所涉及的系统表:
pg_stats
pg_statistic
pg_class
pg_stat是任何人都可以看的,而且可读性高,比较直观,pg_statistic只有superuser才能读,并且可读性差,普通人员建议看pg_stats,pg_stats是pg_statistic的视图。 这两个表也不是实时更新的,需要vacuum analyze时会更新
所涉及的系统变量:
default_statistics_target
geqo_threshold
join_collapse_limit
from_collapse_limit
kenyon=# show default_statistics_target ;
default_statistics_target
---------------------------
100
(1 row)
kenyon=# show geqo_threshold ; --这个参数的大小会设置执行计划从穷举搜索到概率选择性搜索的临界值
geqo_threshold
----------------
12
(1 row)
kenyon=# show join_collapse_limit ; --join连接走执行计划上限
join_collapse_limit
---------------------
8
(1 row)
kenyon=# show from_collapse_limit ;
from_collapse_limit
---------------------
8
(1 row)
EXPLAIN
Name
EXPLAIN— show the execution plan of a statement
Synopsis
EXPLAIN [ ( option [, ...] ) ] statement
EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
where option can be one of:
ANALYZE [ boolean ]
VERBOSE [ boolean ]
COSTS [ boolean ]
BUFFERS [ boolean ]
FORMAT { TEXT | XML | JSON | YAML }
例子:
kenyon=# explain (analyze,verbose,costs,buffers) select id from dba.test222 order by id desc limit 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1807.80..1807.80 rows=1 width=4) (actual time=87.167..87.168 rows=1 loops=1)
Output: id
Buffers: shared hit=393
-> Sort (cost=1807.80..2043.60 rows=94320 width=4) (actual time=87.165..87.165 rows=1 loops=1)
Output: id
Sort Key: test222.id
Sort Method: top-N heapsort Memory: 17kB
Buffers: shared hit=393
-> Seq Scan on dba.test222 (cost=0.00..1336.20 rows=94320 width=4) (actual time=0.036..42.847 rows=100000 loops=1)
Output: id
Buffers: shared hit=393
Total runtime: 87.183 ms
(12 rows)
kenyon=# explain (analyze,verbose,costs,buffers) select max(id) from dba.test222;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1572.00..1572.01 rows=1 width=4) (actual time=77.679..77.680 rows=1 loops=1)
Output: max(id)
Buffers: shared hit=393
-> Seq Scan on dba.test222 (cost=0.00..1336.20 rows=94320 width=4) (actual time=0.012..36.908 rows=100000 loops=1)
Output: id
Buffers: shared hit=393
Total runtime: 77.701 ms
(7 rows)
explain参数解释:
ANALYZE :执行命令并显示执行事件,默认false
VERBOSE :对执行计划提供额外的信息,如查询字段信息等,默认false
COSTS :显示执行计划的,默认true
BUFFERS :默认false,前置条件是analyze
FORMAT :默认格式是text