Updated on 2024-06-03 GMT+08:00

ANALYZE | ANALYSE

Description

  • ANALYZE collects statistics about ordinary tables in a database and stores the results in the PG_STATISTIC and PG_STATISTIC_EXT system catalogs. After you run the ANALYZE command, you can query the collected statistics in the system catalogs or in the system views PG_STATS and PG_EXT_STATS. The execution plan generator uses these statistics to generate the most effective execution plan.
  • If no parameters are specified, ANALYZE analyzes each table and partitioned table in the database. You can also specify table_name, column_name, and partition_name to limit the analysis to a specified table, column, or partitioned table.
  • {ANALYZE|ANALYSE} VERIFY checks whether data files of ordinary tables in a database are damaged.
  • Statistics collected each time are stored in historical statistics tables (GS_STATISTIC_EXT_HISTORY, GS_STATISTIC_HISTORY, and GS_TABLESTATS_HISTORY). The number of historical tables and the retention period of statistics are specified by the GUC parameters stats_history_record_limit and stats_history_retention_time.

Precautions

  • Non-temporary tables cannot be analyzed in an anonymous block, transaction block, function, or stored procedure. Temporary tables can be analyzed during execution of stored procedures, but statistics cannot be rolled back.
  • If remote read is not involved, the remote read parameter does not take effect. If a key system catalog page is damaged, an error is reported and the detection stops.
  • If no parameter to be analyzed is specified, tables on which the current user has the corresponding permission are analyzed by default. With a table specified, ANALYZE processes the table only.
  • To perform ANALYZE operation to a table, you must be a table owner or a user granted the VACUUM permission on the table. By default, a system administrator has this permission. However, database owners are allowed to analyze all tables in their databases, except shared catalogs. (The restriction for shared catalogs means that a true database-wide analyze operation can only be executed by the system administrator). ANALYZE skips tables on which users do not have permissions.
  • ANALYZE does not collect columns for which comparison or equivalent operations cannot be performed, for example, columns of the cursor type.

Syntax

  • Collect statistics information about a table.
    { ANALYZE | ANALYSE } [ VERBOSE ]
        [ table_name [ ( column_name [, ...] ) ] ] [ WITH PARTITION_MODE ];
  • Collect partition statistics information about a partitioned table.
    { ANALYZE | ANALYSE } [ VERBOSE ]
        table_name [ ( column_name [, ...] ) ] PARTITION ( partition_name ) ;
    • If the keyword PARTITION is used, partition_name must be the name of a level-1 partition.
    • The distributed system does not support level-2 partitions.
  • Collect statistics about global secondary indexes.
    { ANALYZE | ANALYSE } GLOBAL INDEX index_name FOR TABLE table_name;
    • Before performing ANALYZE on the global secondary index, you need to perform ANALYZE on the base table to ensure the accuracy of global secondary indexes.
    • After performing the REINDEX/REBUILD operation on the global secondary indexes, you need to perform ANALYZE on the base table first before the global secondary indexes to ensure the accuracy of global secondary indexes.
    • To collect statistics from global secondary indexes, you must specify index_name and table_name, and index_name must be the name of a global secondary index.
    • The global secondary indexes include only the statistics (reltuple and relpages) in the pg_class system catalog.
  • Collect statistics about multiple columns manually.
    { ANALYZE | ANALYSE } [ VERBOSE ]
        table_name (( column_1_name, column_2_name [, ...] ));
    • If the GUC parameter enable_functional_dependency is disabled, each group of multi-column statistics supports a maximum of 32 columns.
    • You are not allowed to collect statistics about multiple columns in system catalogs.
  • Collect statistics about multiple columns automatically.

    After the auto_statistic_ext_columns parameter is enabled and the ANALYZE statement is executed, multi-column statistics are automatically created based on the index prefix of the table. The number of columns in the multi-column statistics cannot exceed the value of auto_statistic_ext_columns.

    For example, if index (a, b, c, d) exists in table t and auto_statistic_ext_columns is set to 4, multi-column statistics about (a, b), (a, b, c), and (a, b, c, d) are created after table t is analyzed.

    { ANALYZE | ANALYSE } [ VERBOSE ] table_name;
  • Check the data files in the current database.
    { ANALYZE | ANALYSE } VERIFY { FAST | COMPLETE };
    • In fast mode, DML operations need to be performed on the tables to be verified concurrently. As a result, an error is reported during the verification. In the current fast mode, data is directly read from the disk. When other threads modify files concurrently, the obtained data is incorrect. Therefore, you are advised to perform the verification offline.
    • You can perform operations on the entire database. Because a large number of tables are involved, you are advised to save the results in redirection mode.
      gsql -d database -p port -f sqlfile> sqllog.txt  2>&1
    • Temporary tables and unlogged tables are not supported.
    • NOTICE is displayed only for externally visible tables. The detection of internal tables is included in the external tables on which the internal tables depend and is not displayed externally.
    • This statement can be executed with error tolerance.
    • If a key system catalog is damaged during a full database operation, an error is reported and the operation stops.
  • Check data files of tables and indexes.
    { ANALYZE | ANALYSE } VERIFY { FAST  | COMPLETE } { table_name | index_name } [ CASCADE ];
    • Operations on ordinary tables and index tables are supported, but CASCADE operations on indexes of index tables are not supported. The CASCADE mode is used to process all index tables of the main table. When the index tables are checked separately, the CASCADE mode is not required.
    • Temporary tables and unlogged tables are not supported.
    • When the main table is checked, the internal tables of the main table, such as the TOAST table, are also checked.
    • When the system displays a message indicating that the index table is damaged, you are advised to run the reindex command to rebuild the index.
  • Check the data files of the table partition.
{ ANALYZE | ANALYSE } VERIFY { FAST | COMPLETE } table_name PARTITION (patrition_name) [ CASCADE ];
  • You can check a single partition of a table, but cannot perform the CASCADE operation on the indexes of an index table.
  • Temporary tables and unlogged tables are not supported.

Parameters

  • VERBOSE

    Enables the display of progress messages.

    If VERBOSE is specified, ANALYZE displays the progress information, indicating the table that is being processed. Statistics about tables are also displayed.

  • table_name

    Specifies the name (possibly schema-qualified) of a specific table to analyze. If omitted, all regular tables (but not foreign tables) in the current database are analyzed.

    Currently, you can use ANALYZE to collect statistics on foreign tables of row-store tables.

    Value range: an existing table name.

  • column_name, column_1_name, column_2_name

    Specifies the name of a specific column to analyze. All columns are analyzed by default.

    Value range: An existing column name.

  • partition_name

    Assumes the table is a partitioned table. You can specify partition_name following the keyword PARTITION to analyze the statistics of this table.

    Value range: a partition name of a table.

  • index_name

    Specifies the name of the specific index table to be analyzed (possibly schema-qualified).

    Value range: an existing table name.

  • FAST|COMPLETE

    The FAST mode verifies the CRC and page header of the table. If the verification fails, an alarm is generated. In COMPLETE mode, the pointer and tuple of the table are parsed and verified.

  • CASCADE

    In CASCADE mode, all indexes of the current table are verified.

  • PARTITION_MODE

    Cascadingly collects statistics about partitioned tables. The following table describes the options and their meanings. This is not applicable to non-partitioned tables.

    Table 1 PARTITION_MODE options

    PARTITION_MODE Option

    Description

    ALL

    Statistics about the entire table and level-1 partition are collected.

    GLOBAL

    Statistics about the entire table are collected.

    PARTITION

    Statistics about the level-1 partition are collected.

    GLOBAL AND PARTITION

    Statistics about the entire table and level-1 partition are collected.

    ALL COMPLETE

    Statistics about the entire table and level-1 partition are collected.

    AUTO

    Default value. The parameter configured in statistic_granularity or default_statistic_granularity prevails. The priority of the table-level parameter statistic_granularity is higher than that of the global parameter default_statistic_granularity.

    A difference between ALL and ALL COMPLETE is as follows: ALL_COMPLETE uses a higher sampling rate and takes a longer time to calculate statistics.

    • When a version earlier than 505.0.0 is upgraded to 505.0.0 or later, PARTITION_MODE does not take effect during the upgrade observation period. The behavior of PARTITION_MODE is the same as that of the source version.
    • When the entire database is analyzed, the partition mode cannot be specified. That is, syntax such as ANALYZE WITH GLOBAL is not supported.
    • When the entire database is analyzed, default_statistic_granularity can be set to GLOBAL or ALL. In this case, the ANALYZE behavior is the same as that defined in default_statistic_granularity. When default_statistic_granularity is set to other values, the ANALYZE behavior is degraded to the GLOBAL behavior.

Examples

  • Collect statistics information about a table.
    -- Create the customer_info table.
    gaussdb=# CREATE TABLE customer_info(
    wr_returned_date_sk INTEGER ,
    wr_returned_time_sk INTEGER ,
    wr_item_sk INTEGER NOT NULL
    ,wr_refunded_customer_sk INTEGER ) DISTRIBUTE BY HASH (wr_item_sk);
    
    -- Run ANALYZE to update statistics.
    gaussdb=# ANALYZE customer_info;
    
    -- Run ANALYZE VERBOSE to update statistics and display information about the customer_info table.
    gaussdb=# ANALYZE VERBOSE customer_info;
    INFO: analyzing "public.customer_info"(cn_5002 pid=53078)
    
    -- Run ANALYZE VERBOSE to export the wr_returned_time_sk column information in the customer_info table.
    gaussdb=# ANALYZE VERBOSE customer_info(wr_returned_time_sk);
    INFO:  analyzing "public.customer_info"(cn_5002 pid=53078)
  • Collect statistics about a partitioned table.
    -- Create a partitioned table.
    gaussdb=# CREATE TABLE customer_par( 
    wr_returned_date_sk INTEGER ,
    wr_returned_time_sk INTEGER ,
    wr_item_sk INTEGER NOT NULL,
    wr_returned_customer_sk INTEGER)
    DISTRIBUTE BY HASH (wr_item_sk)PARTITION BY RANGE(wr_returned_date_sk)(PARTITION P1 VALUES LESS THAN(2452275),PARTITION P2 VALUES LESS THAN(2452640),PARTITION P3 VALUES LESS THAN(2453000),PARTITION P4 VALUES LESS THAN(MAXVALUE))ENABLE ROW MOVEMENT;
    
    -- Run ANALYZE to update statistics.
    gaussdb=# ANALYZE customer_par;
    
    -- Run ANALYZE VERBOSE to export the information in the customer_par table.
    gaussdb=# ANALYZE VERBOSE customer_par;
    INFO:  analyzing "public.customer_par"(cn_5001 pid=446617)
    
    -- Run ANALYZE VERBOSE to output information about the P1 level-1 partitioned table.
    gaussdb=# ANALYZE VERBOSE customer_par PARTITION(P1);
    INFO:  analyzing "public.customer_par"(cn_5001 pid=446617)
    
  • Collect statistics about multiple columns manually.
    -- Manually collect statistics in the wr_returned_date_sk and wr_returned_time_sk columns.
    gaussdb=# ANALYZE VERBOSE customer_info (wr_returned_date_sk,wr_returned_time_sk);
    INFO:  analyzing "public.customer_par"(cn_5001 pid=446617)
  • Collect statistics about multiple columns automatically.
    -- Create an index for the customer_info table.
    gaussdb=# CREATE INDEX customer_index ON customer_info USING btree(wr_returned_date_sk,wr_returned_time_sk,wr_item_sk,wr_refunded_customer_sk);
    
    -- Set auto_statistic_ext_columns to 4.
    gaussdb=# set auto_statistic_ext_columns=4;
    
    -- Collect statistics about multiple columns automatically.
    gaussdb=# ANALYZE VERBOSE customer_info;
    INFO:  analyzing "public.customer_info"(cn_5001 pid=446617)
  • Check the data files in the current database.
    gaussdb=# ANALYZE VERIFY FAST;
  • Check data files of tables and indexes.
    -- Check the customer_info table.
    gaussdb=# ANALYZE VERIFY FAST customer_info;
    
    -- Check the customer_index index.
    gaussdb=# ANALYZE VERIFY FAST customer_index;
    
  • Check the data files of the table partition.
    -- Check the P1 partition in the customer_par partitioned table.
    gaussdb=# ANALYZE VERIFY FAST customer_par PARTITION (P1);
  • Delete data.
    -- Delete the index customer_index.
    gaussdb=# DROP INDEX customer_index;
    
    -- Drop the customer_info table.
    gaussdb=# DROP TABLE customer_info;
    
    -- Drop the customer_par partitioned table.
    gaussdb=# DROP TABLE customer_par;