Updated on 2025-07-22 GMT+08:00

ANALYZE | ANALYSE

Function

This syntax gathers statistics on table content within a database. These statistics include both table-level and column-level details. Table-level data is kept in PG_CLASS, while column-level details are stored in PG_STATISTIC. The query planner relies on this information to choose the best execution strategy.

Its key purposes are:

  • Optimizing query performance: supplies data distribution insights to the query optimizer for generating efficient plans.
  • Enhancing plan accuracy: helps the optimizer better estimate the cost of various query paths.
  • Selecting indexes: assists the optimizer in deciding if and which indexes to use.

Precautions

  • If no parameters are specified, ANALYZE analyzes each table and partitioned table in the database. You can also specify table_name, column, and partition_name to limit the analysis to a specified table, column, or partitioned table.
  • Users who can execute ANALYZE on a specific table include the owner of the table, owner of the database where the table is, users with the ANALYZE permission on the table, users who are granted the gs_role_analyze_any role, and users with the SYSADMIN attribute.
  • To collect statistics using percentage sampling, you must have the ANALYZE and SELECT permissions.
  • If the enable_analyze_partition parameter is enabled and the table-level incremental_analyze parameter is set for a partitioned table, the ANALYZE statement is executed on partitions lacking statistics or with data changes. The statistics of the partition main table are then generated by combining the partition statistics.
  • Only cluster versions 8.1.1 and later support using anonymous blocks, transaction blocks, functions, or stored procedures to perform the ANALYZE operation on an unsharded table.
  • However, for analyzing an entire database, the ANALYZE operation of each table is in different transactions. Therefore, the current version does not support the ANALYZE execution for the entire database in anonymous blocks, transaction blocks, functions, or stored procedures.
  • Statistics updates of PG_CLASS related columns cannot be rolled back.
  • The ANALYZE statistics of global temporary tables in each session are independent. Each session has its own statistics.
  • ANALYZE and ANALYSE VERIFY are used to check whether data files of common tables (row-store and column-store tables) in a database are damaged. Currently, this function does not support HDFS tables.
  • Most ANALYZE VERIFY operations are used for abnormal scenario detection, and require a release version. Remote read is not triggered in the ANALYZE VERIFY scenario. Therefore, the remote read parameter does not take effect. If the system detects that the page is damaged due to an error in the key system table, the system reports an error and stops the detection.
  • If more than 10% of a table's records are added or modified at once, explicitly execute the ANALYZE operation.
  • For more information about development and design specifications, see Development and Design Proposal.

Syntax

  • Collect statistics information about a table.
    1
    2
    { ANALYZE | ANALYSE } [ { VERBOSE | LIGHT | FORCE | PREDICATE } ]
        [ table_name [ ( column_name [, ...] ) ] ];
    
  • Collect statistics about a partition.
    1
    2
    3
    { ANALYZE | ANALYSE } [ { VERBOSE | LIGHT | FORCE } ]
        [ table_name [ ( column_name [, ...] ) ] ]
        PARTITION ( partition_name ) ;
    
    • Ordinary partitioned tables support the syntax for collecting statistics on a specific partition, but not the actual function of it. ANALYZE on a specified partition will cause a warning message.
    • Temporary sampling tables cannot be used to collect partition statistics.
    • Multi-column statistics and expression statistics on partitions are not supported.
  • Collect statistics about a foreign table.
    1
    2
    { ANALYZE | ANALYSE } [ VERBOSE ]
        { foreign_table_name | FOREIGN TABLES };
    
  • Collect statistics about multiple columns.
    1
    2
    {ANALYZE | ANALYSE} [ VERBOSE ]
        table_name (( column_1_name, column_2_name [, ...] ));
    
    • To sample data in percentage, set default_statistics_target to a negative number.
    • The statistics about a maximum of 32 columns can be collected at a time.
    • You are not allowed to collect statistics about multiple columns in system catalogs, global temporary tables, or HDFS foreign tables.
  • Check the data files in the current database.
    1
    {ANALYZE | ANALYSE} VERIFY {FAST|COMPLETE};
    
    • All operations on the database are supported. Because many tables are involved, you are advised to save the result in redirection mode: gsql -d database -p port -f "verify.sql"> verify_warning.txt 2>&1.
    • HDFS tables (internal and foreign tables), temporary tables, and unlog tables are not supported.
    • Only the NOTICE message is displayed for external tables. The detection of internal tables is included in the external tables that depend on them, but it is not shown externally.
    • This command can be used to process tolerant errors. The assert operation in a debug version may cause the core to fail to execute commands. Therefore, you are advised to perform this operation in a release version.
    • If a key system table is damaged during a full database operation, an error is reported and the operation stops.
  • Check data files of tables and indexes.
    1
    {ANALYZE | ANALYSE} VERIFY {FAST|COMPLETE} table_name|index_name [CASCADE];
    
    • You can perform operations on common tables and index tables, but cannot perform CASCADE operations on index tables. The reason is that CASCADE is used to process all index tables of the primary table. When the index table is checked separately, CASCADE is not required.
    • HDFS tables (internal and foreign tables), temporary tables, and unlog tables are not supported.
    • When the primary table is checked, the internal tables of the primary table, such as the toast table and cudesc table, are also checked.
    • When the system displays a message indicating that the index table is damaged, you are advised to run the reindex command to recreate the index.
  • Check the data files of the table partition.
    1
    {ANALYZE | ANALYSE} VERIFY {FAST|COMPLETE} table_name PARTITION {(partition_name)}[CASCADE];
    
    • You can detect a single partition of a table, but cannot perform the CASCADE operation on index tables.
    • HDFS tables (internal and foreign tables), temporary tables, and unlog tables are not supported.

Parameter Description

Table 1 ANALYZE | ANALYSE parameters

Parameter

Description

Value Range

VERBOSE

After VERBOSE is enabled, ANALYZE sends progress information, indicating the table being processed. Statistics about the table are also printed.

-

LIGHT

In LIGHT mode, the system stores table statistics in memory instead of writing them to the system catalog. Executing this command locks the table at a certain level. Specify this option within parentheses, like ANALYZE (LIGHT) t1.

-

FORCE

In FORCE mode, you can override locks on table statistics and run ANALYZE. Add this option in parentheses like this: ANALYZE (FORCE) t1.

-

PREDICATE

PREDICATE indicates that only the statistics of the currently identified predicate column are calculated. This parameter is supported only by clusters of version 9.1.0.100 or later. You can set the analyze_predicate_column_threshold parameter to determine whether the ANALYZE operation on predicate columns is supported. A value of 0 indicates that the ANALYZE operation is disabled. A value greater than 0 indicates that the ANALYZE operation is enabled only for tables whose number of columns is greater than or equal to this value.

Use this option in parentheses, for example, ANALYZE (PREDICATE) xxx.

Predicate information is collected in the query parsing phase. Dynamic sampling also supports predicate column sampling.

-

table_name

Specifies the name (possibly schema-qualified) of a specific table to analyze. If omitted, all regular tables (but not foreign tables) in the current database are analyzed.

Currently, you can use ANALYZE to collect statistics from row-store tables, column-store tables, HDFS tables, ORC- and CARBONDATA-formatted OBS foreign tables, and foreign tables for collaborative analysis.

The table name can contain a maximum of 63 characters, including letters, digits, underscores (_), dollar signs ($), and number signs (#). It must start with a letter or underscore (_).

column_name

Specifies the name of a specific column to analyze. All columns are analyzed by default.

Existing column name (format: column_1_name, column_2_name, column_3_name....)

partition_name

Analyzes the statistics of the partitioned table. Currently the partitioned table supports the syntax of analyzing a partitioned table, but does not execute this syntax.

A string, which must comply with the naming convention of the partition name.

foreign_table_name

Specifies the name (possibly schema-qualified) of a specific table to analyze. The data of the table is stored in HDFS.

A string, which must comply with the naming convention of foreign tables.

FOREIGN TABLES

Analyzes HDFS foreign tables stored in HDFS and accessible to the current user.

A string, which must comply with the naming rules of HDFS foreign tables.

index_name

Name of the index table to be analyzed, which may contain the schema name.

A string, which must comply with the naming rules of index tables.

FAST|COMPLETE

Verifies row-store tables and column-store tables in different modes.

  • Row-store tables
    • In FAST mode, the CRC and page header of the row-store table are verified. If the verification fails, an alarm is generated.
    • In COMPLETE mode, the pointer and tuple of the row-store table are parsed and verified.
  • Column-store tables
    • In FAST mode, the CRC and magic of the column-store table are verified. If the verification fails, an alarm is generated.
    • In COMPLETE mode, CUs of column-store tables are parsed and verified.

CASCADE

In CASCADE mode, all indexes of the current table are checked.

-

Examples

  • Do ANALYZE to update statistics in the customer_info table:
    1
    ANALYZE customer_info;
    
  • Do ANALYZE VERBOSE to update statistics and display table information in the customer_info table:
    1
    2
    3
    4
    5
    ANALYZE VERBOSE customer_info;
    INFO:  analyzing "cstore.pg_delta_3394584009"(cn_5002 pid=53078)
    INFO:  analyzing "public.customer_info"(cn_5002 pid=53078)
    INFO:  analyzing "public.customer_info" inheritance tree(cn_5002 pid=53078)
    ANALYZE
    
  • Update the statistics of the name and email columns in the customer_info table.
    1
    ANALYZE customer_info (name, email);
    
  • Update the statistics of the customer_info table in lightweight mode.
    1
    ANALYZE (LIGHT) customer_info;
    
  • Update the statistics of the customer_info table in forcible mode, even if the table is locked.
    1
    ANALYZE (FORCE) customer_info;
    
  • Update only the statistics of predicate columns. Only the predicate columns collected so far are sampled. The statistics are used after the query service is stable.
    1
    ANALYZE (PREDICATE) customer_info;
    

    You can use the pg_stat_get_predicate_columns function to query the predicate columns of a table. The table name in the following example is customer_info. Replace it with the actual table name.

    1
    SELECT pr.attnum,pa.attname FROM pg_catalog.pg_stat_get_predicate_columns('customer_info'::regclass) pr LEFT JOIN pg_attribute pa ON pa.attrelid='customer_info'::regclass AND pa.attnum = pr.attnum;