Updated on 2025-08-25 GMT+08:00

Updating Statistics

In a database, statistics serve as the raw data for the planner to generate execution plans. Failure to collect statistics or having outdated statistics can lead to severe degradation of execution plans, resulting in performance issues.

Context

The ANALYZE statement gathers statistical information related to the content of tables within the database, with the results stored in the system catalog PG_STATISTIC. The query optimizer uses these statistics to produce the most efficient execution plan.

You are advised to routinely execute the ANALYZE statement on tables or the entire database following batch insert/delete operations to update statistics. Currently, the default sampling rate for collecting statistics is 30,000 rows (that is, the GUC parameter default_statistics_target is set to 100 by default). If the total row count of a table exceeds a certain threshold (greater than 1,600,000), you are advised to set default_statistics_target to -2, meaning statistics are estimated based on a 2% sample.

Intermediate tables generated using batch processing scripts or stored procedures should also explicitly call ANALYZE upon completion of data generation.

For scenarios where multiple columns in a table exhibit correlation and queries involve simultaneous conditional or group operations based on these columns, consider gathering multi-column statistics. This enables the query optimizer to more accurately estimate row counts and generate more effective execution plans.

Generating Statistics

Update statistics on a single table.
1
ANALYZE tablename; 

When examining the execution plan of various SQL statements using EXPLAIN, encountering rows=10 in the SEQ SCAN output of a table suggests the default system-assigned value, potentially indicating the absence of ANALYZE execution on that table. Thus, you need to execute ANALYZE on the table.

Enhancing Statistical Quality

ANALYZE uses a random sampling algorithm to extract samples from a table, computing table data characteristics based on these samples. Sample size can be controlled through the configuration parameter default_statistics_target, which ranges from -100 to 10000, with a default value of 100.

  • When default_statistics_target > 0: The sample size equals 300 x default_statistics_target. Larger values result in bigger samples, increased memory usage, and prolonged computation times for statistics.
  • When default_statistics_target < 0: The sample size corresponds to default_statistics_target/100 x Total table rows. Smaller values yield larger samples. When default_statistics_target < 0, sampled data is written to disks, eliminating memory concerns but introducing extended computation durations due to substantial sample sizes.

    Negative default_statistics_target implies actual sampling equates to default_statistics_target/100 x Total table rows, hence termed percentage sampling.

Automatic Collection of Statistics

With the autoanalyze parameter enabled, when the optimizer detects missing statistics or significant data alterations exceeding thresholds during query optimization, automatic statistic collection is triggered to fulfill optimizer demands.

Within the cost-based optimizer (CBO) framework, statistics determine the efficacy of query plan generation, underscoring their timeliness and accuracy's paramount importance.

  • Table-level statistics reside in relpages and reltuples of pg_class.
  • Column-level statistics are housed in pg_statistics, accessible through the pg_stats view, including NULL value ratios, distinct value proportions, frequent values (MCV), histograms, and more.

Collection criteria: Significant data shifts—typically a 10% variation—prompt re-collection of statistics under the assumption that data characteristics have altered.

Accuracy

Table 1 Accuracy

Category

Description

Sampling size

Configurable to adapt according to table size, governed by the default_statistics_target parameter.

Sampling randomness

  • The analyze_sample_mode parameter optimizes reservoir and range sampling randomness.
  • Enhance randomness of random number calculation with the random_function_version parameter.

Statistical estimation

The enable_extrapolation_stats parameter allows for automatic calculation of more accurate statistics based on old statistics during distortion estimation.

Viewing Statistics

View statistics on LakeFormation.

1
desc extended relation;