Updating Statistics
In a database, statistics serve as the raw data for the planner to generate execution plans. Failure to collect statistics or having outdated statistics can lead to severe degradation of execution plans, resulting in performance issues.
Context
The ANALYZE statement gathers statistical information related to the content of tables within the database, with the results stored in the system catalog PG_STATISTIC. The query optimizer uses these statistics to produce the most efficient execution plan.
You are advised to routinely execute the ANALYZE statement on tables or the entire database following batch insert/delete operations to update statistics. Currently, the default sampling rate for collecting statistics is 30,000 rows (that is, the GUC parameter default_statistics_target is set to 100 by default). If the total row count of a table exceeds a certain threshold (greater than 1,600,000), you are advised to set default_statistics_target to -2, meaning statistics are estimated based on a 2% sample.
Intermediate tables generated using batch processing scripts or stored procedures should also explicitly call ANALYZE upon completion of data generation.
For scenarios where multiple columns in a table exhibit correlation and queries involve simultaneous conditional or group operations based on these columns, consider gathering multi-column statistics. This enables the query optimizer to more accurately estimate row counts and generate more effective execution plans.
Generating Statistics
1
|
ANALYZE tablename; |

When examining the execution plan of various SQL statements using EXPLAIN, encountering rows=10 in the SEQ SCAN output of a table suggests the default system-assigned value, potentially indicating the absence of ANALYZE execution on that table. Thus, you need to execute ANALYZE on the table.
Enhancing Statistical Quality
ANALYZE uses a random sampling algorithm to extract samples from a table, computing table data characteristics based on these samples. Sample size can be controlled through the configuration parameter default_statistics_target, which ranges from -100 to 10000, with a default value of 100.
- When default_statistics_target > 0: The sample size equals 300 x default_statistics_target. Larger values result in bigger samples, increased memory usage, and prolonged computation times for statistics.
- When default_statistics_target < 0: The sample size corresponds to default_statistics_target/100 x Total table rows. Smaller values yield larger samples. When default_statistics_target < 0, sampled data is written to disks, eliminating memory concerns but introducing extended computation durations due to substantial sample sizes.
Negative default_statistics_target implies actual sampling equates to default_statistics_target/100 x Total table rows, hence termed percentage sampling.
Automatic Collection of Statistics
With the autoanalyze parameter enabled, when the optimizer detects missing statistics or significant data alterations exceeding thresholds during query optimization, automatic statistic collection is triggered to fulfill optimizer demands.
Within the cost-based optimizer (CBO) framework, statistics determine the efficacy of query plan generation, underscoring their timeliness and accuracy's paramount importance.
- Table-level statistics reside in relpages and reltuples of pg_class.
- Column-level statistics are housed in pg_statistics, accessible through the pg_stats view, including NULL value ratios, distinct value proportions, frequent values (MCV), histograms, and more.
Collection criteria: Significant data shifts—typically a 10% variation—prompt re-collection of statistics under the assumption that data characteristics have altered.
Accuracy
Category |
Description |
---|---|
Sampling size |
Configurable to adapt according to table size, governed by the default_statistics_target parameter. |
Sampling randomness |
|
Statistical estimation |
The enable_extrapolation_stats parameter allows for automatic calculation of more accurate statistics based on old statistics during distortion estimation. |
Viewing Statistics
View statistics on LakeFormation.
1
|
desc extended relation; |
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot