Updated on 2024-06-07 GMT+08:00

Collecting Partition-Level Statistics

Collecting Statistics on a Specified Partition

Collecting statistics on a single partition of the current partitioned table is supported. The partitions whose statistics have been collected will be automatically updated and maintained when the statistics are collected again. This function applies to list partitioning, hash partitioning, and range partitioning.

gaussdb=# CREATE TABLE only_first_part(id int,name varchar)PARTITION BY RANGE (id) 
 (PARTITION id11 VALUES LESS THAN (1000000),
PARTITION id22 VALUES LESS THAN (2000000),
PARTITION max_id1 VALUES LESS THAN (MAXVALUE));

gaussdb=# INSERT INTO only_first_part SELECT generate_series(1,5000),'test';

gaussdb=# ANALYZE only_first_part PARTITION (id11);
gaussdb=# ANALYZE only_first_part PARTITION (id22);
gaussdb=# ANALYZE only_first_part PARTITION (max_id1);

gaussdb=# SELECT relname, relpages, reltuples FROM pg_partition WHERE relname IN ('id11', 'id22', 'max_id1');
 relname | relpages | reltuples 
---------+----------+-----------
 id11    |       20 |      5000
 id22    |        0 |         0
 max_id1 |        0 |         0
(3 rows)

gaussdb=# \x
gaussdb=# SELECT * FROM pg_stats WHERE tablename ='only_first_part' AND partitionname ='id11';
-[ RECORD 1 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | public
tablename              | only_first_part
attname                | name
inherited              | f
null_frac              | 0
avg_width              | 5
n_distinct             | 1
n_dndistinct           | 0
most_common_vals       | {test}
most_common_freqs      | {1}
histogram_bounds       | 
correlation            | 1
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
partitionname          | id11
subpartitionname       | 
-[ RECORD 2 ]----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
schemaname             | public
tablename              | only_first_part
attname                | id
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | -1
n_dndistinct           | 0
most_common_vals       | 
most_common_freqs      | 
histogram_bounds       | {1,50,100,150,200,250,300,350,400,450,500,550,600,650,700,750,800,850,900,950,1000,1050,1100,1150,1200,1250,1300,1350,1400,1450,1500,1550,1600,1650,1700,1750,1800,1850,1900,1950,2000,2050,2100,2150,2200,2250,2300,2350,2400,2450,2500,2550,2600,2650,2700,2750,2800,2850,2900,2950,3000,3050,3100,3150,3200,3250,3300,3350,3400,3450,3500,3550,3600,3650,3700,3750,3800,3850,3900,3950,4000,4050,4100,4150,4200,4250,4300,4350,4400,4450,4500,4550,4600,4650,4700,4750,4800,4850,4900,4950,5000}
correlation            | 1
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   | 
partitionname          | id11
subpartitionname       |

gaussdb=# \x
-- Delete the partitioned table.
gaussdb=# DROP TABLE only_first_part;

Optimizer Using the Statistics of a Specified Partition

The optimizer preferentially uses the statistics of the specified partition. If statistics are not collected for the specified partition, the optimizer rewrites the partition clause for pruning optimization. For details, see Rewriting a Partition Clause for Pruning Optimization.

gaussdb=# CREATE TABLE only_first_part_two
(
    c1 INT,
    c2 BIGINT
)
PARTITION BY LIST(c2)
(
    PARTITION p_1 VALUES (10000, 20000),
    PARTITION p_2 VALUES (300000, 400000, 500000),
    PARTITION p_3 VALUES (DEFAULT)
);

gaussdb=# EXPLAIN SELECT * FROM only_first_part_two PARTITION (p_2);

                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Partition Iterator  (cost=0.00..29.45 rows=1945 width=12)
   Iterations: 1
   ->  Partitioned Seq Scan on only_first_part_two  (cost=0.00..29.45 rows=1945 width=12)
         Selected Partitions:  2
(4 rows)

gaussdb=# EXPLAIN SELECT * FROM only_first_part_two PARTITION (p_1) WHERE c2 = 1;
                                       QUERY PLAN                                       
----------------------------------------------------------------------------------------
 Partition Iterator  (cost=0.00..34.31 rows=10 width=12)
   Iterations: 0
   ->  Partitioned Seq Scan on only_first_part_two  (cost=0.00..34.31 rows=10 width=12)
         Filter: (c2 = 1)
         Selected Partitions:  NONE
(5 rows)
-- Delete the partitioned table.
gaussdb=# DROP TABLE only_first_part_two;

Rewriting a Partition Clause for Pruning Optimization

If there is no partition-level statistics, the optimizer row quantity estimation module logically rewrites the pseudo-predicate of a partition clause, and uses the rewritten pseudo-predicate to affect the selectivity calculation and the statistics of the entire table to obtain a relatively accurate estimated row quantity.

  • This feature applies only to selectivity calculation.
  • This feature applies to level-1 and level-2 partitions.
  • The feature supports only range partitions, interval partitions, and list partitions.
  • For range partitioning and interval partitioning, only single-column partitioning keys can be rewritten. Multi-column partitioning keys cannot be rewritten.
  • For list partitioning, to ensure performance, the maximum number of enumerated values for a specified list partition is 40.
    • When the number of enumerated values for a specified list partition exceeds 40, this feature is no longer applicable.
    • For the default partition, the number of enumerated values is the total number of enumerated values in all non-default partitions.

Example 1: Rewriting a range/interval partition

gaussdb=# CREATE TABLE test_int4_maxvalue(id INT, name VARCHAR)
PARTITION BY RANGE(id) 
(
    PARTITION id1 VALUES LESS THAN(1000), 
    PARTITION id2 VALUES LESS THAN(2000),
    PARTITION max_id VALUES LESS THAN(MAXVALUE)
);
gaussdb=# INSERT INTO test_int4_maxvalue SELECT GENERATE_SERIES(1,5000),'test';
gaussdb=# ANALYZE test_int4_maxvalue WITH GLOBAL;

-- Query the specified partition id1.
gaussdb=# EXPLAIN SELECT * FROM test_int4_maxvalue PARTITION(id1);
                                       QUERY PLAN                                       
----------------------------------------------------------------------------------------
 Partition Iterator  (cost=0.00..51.00 rows=1000 width=9)
   Iterations: 1
   ->  Partitioned Seq Scan on test_int4_maxvalue  (cost=0.00..51.00 rows=1000 width=9)
         Selected Partitions:  1
(4 rows)

-- Query the max_id of the specified partition.
gaussdb=# EXPLAIN SELECT * FROM test_int4_maxvalue PARTITION(max_id);
                                       QUERY PLAN                                       
----------------------------------------------------------------------------------------
 Partition Iterator  (cost=0.00..51.00 rows=3000 width=9)
   Iterations: 1
   ->  Partitioned Seq Scan on test_int4_maxvalue  (cost=0.00..51.00 rows=3000 width=9)
         Selected Partitions:  3
(4 rows)

-- Delete the partitioned table.
gaussdb=# DROP TABLE test_int4_maxvalue;
Example 2: Rewriting a list partition
gaussdb=# CREATE TABLE test_default
(
    c1 INT, 
    c2 BIGINT
)
PARTITION BY LIST(c2) 
(
    PARTITION p_1 VALUES (10000, 20000), 
    PARTITION p_2 VALUES (300000, 400000, 500000),
    PARTITION p_3 VALUES (DEFAULT)
);
gaussdb=# INSERT INTO test_default SELECT GENERATE_SERIES(1, 1000), 10000;
gaussdb=# INSERT INTO test_default SELECT GENERATE_SERIES(1001, 2000), 600000;
gaussdb=# ANALYZE test_default WITH GLOBAL;

-- Query the specified partition p_1.
gaussdb=# EXPLAIN SELECT * FROM test_default PARTITION(p_1);
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
 Partition Iterator  (cost=0.00..28.00 rows=1000 width=12)
   Iterations: 1
   ->  Partitioned Seq Scan on test_default  (cost=0.00..28.00 rows=1000 width=12)
         Selected Partitions:  1
(4 rows)

-- Query the specified partition p_3.
gaussdb=# EXPLAIN SELECT * FROM test_default PARTITION(p_3);
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
 Partition Iterator  (cost=0.00..28.00 rows=1000 width=12)
   Iterations: 1
   ->  Partitioned Seq Scan on test_default  (cost=0.00..28.00 rows=1000 width=12)
         Selected Partitions:  3
(4 rows)

-- Delete the partitioned table.
gaussdb=# DROP TABLE test_default;