Updated on 2024-05-07 GMT+08:00

ANALYZE | ANALYSE

Description

ANALYZE collects statistics about ordinary tables in a database and stores the results in the PG_STATISTIC and PG_STATISTIC_EXT system catalogs. After you run the ANALYZE command, you can query the collected statistics in the preceding system catalogs or system views PG_STATS and PG_EXT_STATS. The execution plan generator uses these statistics to generate the most effective execution plan.

If no parameters are specified, ANALYZE analyzes each table and partitioned table in the database. You can also specify the table_name, column, and partition_name parameters to restrict the analysis to a specific table, column, or partitioned table.

ANALYZE | ANALYSE VERIFY checks whether data files of ordinary tables in a database are damaged.

Precautions

Non-temporary tables cannot be analyzed in an anonymous block, transaction block, function, or stored procedure. Temporary tables in a stored procedure can be analyzed but their statistics updates cannot be rolled back.

Remote read is not involved in the ANALYZE VERIFY scenario. Therefore, the remote read parameter does not take effect. If the system detects that a page is damaged due to an error in a key system table, the system directly reports an error and does not continue the detection.

With no table specified, ANALYZE processes all the tables that the current user has permission to analyze in the current database. With a table specified, ANALYZE processes only that table.

To perform ANALYZE operation to a table, you must be a table owner or a user granted the VACUUM permission on the table. By default, the system administrator has this permission. However, database owners are allowed to analyze all tables in their databases, except shared catalogs. (The restriction for shared catalogs means that a true database-wide analyze operation can only be executed by the system administrator). ANALYZE skips tables on which users do not have permissions.

ANALYZE does not collect columns for which comparison or equivalent operations cannot be performed, for example, columns of the cursor type.

Syntax

  • Collect statistics information about a table.
    { ANALYZE | ANALYSE } [ VERBOSE ]
        [ table_name [ ( column_name [, ...] ) ] ];
  • Collect partition statistics on a partitioned table. This syntax is not supported currently.
    { ANALYZE | ANALYSE } [ VERBOSE ]
        table_name [ ( column_name [, ...] ) ] PARTITION ( patrition_name ) ;

    An ordinary partitioned table supports the syntax but not the function of collecting statistics about specified partitions.

  • Collect statistics about global secondary indexes.
    { ANALYZE | ANALYSE }GLOBAL INDEX index_name FOR TABLE table_name;
    • Before performing ANALYZE on the global secondary index, you need to perform ANALYZE on the base table to ensure the accuracy of global secondary indexes.
    • After performing the REINDEX/REBUILD operation on the global secondary indexes, you need to perform ANALYZE on the base table first before the global secondary indexes to ensure the accuracy of global secondary indexes.
    • To collect statistics from global secondary indexes, you must specify index_name and table_name, and index_name must be the name of a global secondary index.
    • The global secondary indexes include only the statistics (reltuple and relpages) in the pg_class system catalog.
  • Collect statistics about multiple columns manually.
    { ANALYZE | ANALYSE } [ VERBOSE ]
        table_name (( column_1_name, column_2_name [, ...] ));
    • The statistics about a maximum of 32 columns can be collected at a time.
    • You are not allowed to collect statistics about multiple columns in system catalogs.
  • Collect statistics about multiple columns automatically.

    After the auto_statistic_ext_columns parameter is enabled and analyzed, multi-column statistics are automatically created based on the index prefix of the table. The number of columns in the multi-column statistics cannot exceed the value of auto_statistic_ext_columns.

    For example, if index (a, b, c, d) exists in table t and auto_statistic_ext_columns is set to 4, multi-column statistics about (a, b), (a, b, c), and (a, b, c, d) are created after table t is analyzed.

    { ANALYZE | ANALYSE } [ VERBOSE ] table_name;
  • Check the data files in the current database.
    { ANALYZE | ANALYSE } VERIFY { FAST | COMPLETE };
    • In fast mode, DML operations need to be performed on the tables to be verified concurrently. As a result, an error is reported during the verification. In the current fast mode, data is directly read from the disk. When other threads modify files concurrently, the obtained data is incorrect. Therefore, you are advised to perform the verification offline.
    • You can perform operations on the entire database. Because a large number of tables are involved, you are advised to save the results in redirection mode.
      gsql -d database -p port -f sqlfile> sqllog.txt  2>&1
    • Temporary tables and unlogged tables are not supported.
    • NOTICE is used to check only tables that are visible to external systems. The detection of internal tables is included in the external tables on which NOTICE depends and is not displayed externally.
    • This statement can be executed with error tolerance.
    • If a key system table is damaged during a full database operation, an error is reported and the operation stops.
  • Check data files of tables and indexes.
    { ANALYZE | ANALYSE } VERIFY { FAST  |COMPLETE } { table_name | index_name } [ CASCADE ];
    • Operations on ordinary tables and index tables are supported, but CASCADE operations on indexes of index tables are not supported. The CASCADE mode is used to process all index tables of the main table. When the index tables are checked separately, the CASCADE mode is not required.
    • Temporary tables and unlogged tables are not supported.
    • When the main table is checked, the internal tables of the main table, such as the toast table, are also checked.
    • When the system displays a message indicating that the index table is damaged, you are advised to run the reindex command to re-create the index.
  • Check the data files of the table partition.
{ ANALYZE | ANALYSE } VERIFY { FAST | COMPLETE } table_name PARTITION (patrition_name) [ CASCADE ];
  • You can check a single partition of a table, but cannot perform the CASCADE operation on the indexes of an index table.
  • Temporary tables and unlogged tables are not supported.

Parameters

  • VERBOSE

    Enables the display of progress messages.

    If VERBOSE is specified, ANALYZE displays the progress information, indicating the table that is being processed. Statistics about tables are also displayed.

  • table_name

    Specifies the name (possibly schema-qualified) of a specific table to analyze. If omitted, all regular tables (but not foreign tables) in the current database are analyzed.

    Currently, you can use ANALYZE to collect statistics on foreign tables of column-store tables.

    Value range: an existing table name.

  • column_name, column_1_name, column_2_name

    Specifies the name of a specific column to analyze. All columns are analyzed by default.

    Value range: an existing column name

  • partition_name

    Assumes the table is a partitioned table. You can specify partition_name following the keyword PARTITION to analyze the statistics of this table. Currently, ANALYZE can be performed on partitioned tables, but statistics of specified partitions cannot be analyzed.

    Value range: a partition name of a table

  • index_name

    Specifies the name of the specific index table to be analyzed (possibly schema-qualified).

    Value range: an existing table name

  • FAST|COMPLETE

    The FAST mode verifies the CRC and page header of the table. If the verification fails, an alarm is generated. In COMPLETE mode, the pointer and tuple of the table are parsed and verified.

  • CASCADE

    In CASCADE mode, all indexes of the current table are verified.

Examples

1
2
3
4
5
6
7
8
9
-- Create a table.
gaussdb=# CREATE TABLE customer_info
(
WR_RETURNED_DATE_SK       INTEGER                       ,
WR_RETURNED_TIME_SK       INTEGER                       ,
WR_ITEM_SK                INTEGER               NOT NULL,
WR_REFUNDED_CUSTOMER_SK   INTEGER
)
DISTRIBUTE BY HASH (WR_ITEM_SK);

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
-- Create a partitioned table.
gaussdb=# CREATE TABLE customer_par
(
WR_RETURNED_DATE_SK       INTEGER                       ,
WR_RETURNED_TIME_SK       INTEGER                       ,
WR_ITEM_SK                INTEGER               NOT NULL,
WR_REFUNDED_CUSTOMER_SK   INTEGER
)
DISTRIBUTE BY HASH (WR_ITEM_SK)
PARTITION BY RANGE(WR_RETURNED_DATE_SK)
(
PARTITION P1 VALUES LESS THAN(2452275),
PARTITION P2 VALUES LESS THAN(2452640),
PARTITION P3 VALUES LESS THAN(2453000),
PARTITION P4 VALUES LESS THAN(MAXVALUE)
)
ENABLE ROW MOVEMENT;

1
2
-- Run the ANALYZE statement to update statistics.
gaussdb=# ANALYZE customer_info;

1
2
3
4
5
-- Run the ANALYZE VERBOSE statement to update statistics and display table information.
gaussdb=# ANALYZE VERBOSE customer_info;
INFO:  analyzing "public.customer_info"(cn_5002 pid=53078)
INFO:  analyzing "public.customer_info" inheritance tree(cn_5002 pid=53078)
ANALYZE

If any environment-related fault occurs, check the logs of CN.

1
2
3
-- Delete the tables.
gaussdb=# DROP TABLE customer_info;
gaussdb=# DROP TABLE customer_par;