Updated on 2024-12-18 GMT+08:00

Automatic Cleanup

The automatic cleanup process (autovacuum) automatically runs VACUUM and ANALYZE statements to reclaim the space occupied by records marked as deleted and to update table statistics.

autovacuum does not block service statements initiated by users. autovacuum and autoanalyze can run concurrently without conflicting with each other; this concurrency is supported only in versions later than 8.2.1.300.

autovacuum

Parameter description: Specifies whether to start the automatic cleanup process (autovacuum). Ensure that the track_counts parameter is set to on before enabling the automatic cleanup process.

For clusters of version 8.1.3 or later, automatic cleanup can be configured on the GaussDB(DWS) management console. For details, see Intelligent O&M Overview. For clusters of version 8.1.2 or earlier, configure the feature by referring to Configuring GUC Parameters.

Type: SIGHUP

Value range: Boolean

  • on indicates the database automatic cleanup process is enabled.
  • off indicates that the database automatic cleanup process is disabled.

Default value: on

Set autovacuum to on if you want the system to automatically clean up two-phase transactions after it recovers from faults.
  • If autovacuum is set to on and autovacuum_max_workers is set to 0, autovacuum is not performed; only abnormal two-phase transactions are cleaned up after the system recovers from faults.
  • If autovacuum is set to on and autovacuum_max_workers is greater than 0, the system performs autovacuum and also cleans up two-phase transactions after recovering from faults.
Even if this parameter is set to off, the database still starts a cleanup process when it needs to prevent transaction ID wraparound.

When a CREATE DATABASE or DROP DATABASE operation fails, the transaction may have been committed or rolled back on some nodes while remaining in the prepared state on others. In this case, restore the affected nodes manually:
  1. Run the gs_clean tool with the -N option to query the XID of the abnormal two-phase transaction and the nodes on which it is in the prepared state.
  2. Log in to each node on which the transaction is in the prepared state. As an administrator, connect to an available database such as gaussdb and run the SET xc_maintenance_mode = on statement.
  3. Commit or roll back the two-phase transaction based on its global transaction status.

If autovacuum has been set to off in the cluster for a long time and you want to change it to on, you must perform VACUUM FULL on the key system catalogs of the cluster. These key system catalogs include pg_class, pg_attribute, pg_index, pg_type, pg_statistic, pg_statistic_ext, pg_proc, pg_partition, pg_constraint, pg_inherits, pg_rewrite, pg_description, pg_depend, pg_shdepend, pg_shdescription, pgxc_class, pg_jobs, pg_redaction_policy, pg_redaction_column, pg_object, and pg_matview.
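The VACUUM FULL step above can be scripted. The following Python sketch only builds the SQL text for the catalogs listed; how you submit the statements (for example, from an administrator session) is up to you, and the helper name `vacuum_full_statements` is hypothetical.

```python
# Key system catalogs listed above that need VACUUM FULL when autovacuum
# is switched back on after being off for a long time.
KEY_CATALOGS = [
    "pg_class", "pg_attribute", "pg_index", "pg_type", "pg_statistic",
    "pg_statistic_ext", "pg_proc", "pg_partition", "pg_constraint",
    "pg_inherits", "pg_rewrite", "pg_description", "pg_depend",
    "pg_shdepend", "pg_shdescription", "pgxc_class", "pg_jobs",
    "pg_redaction_policy", "pg_redaction_column", "pg_object", "pg_matview",
]


def vacuum_full_statements(catalogs=KEY_CATALOGS):
    """Return one VACUUM FULL statement per catalog (hypothetical helper)."""
    return [f"VACUUM FULL {name};" for name in catalogs]


if __name__ == "__main__":
    for stmt in vacuum_full_statements():
        print(stmt)
```

Because VACUUM FULL rewrites each catalog and locks it exclusively while doing so, running these statements in a maintenance window is likely the safer choice.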

autovacuum_mode

Parameter description: Specifies which of the autoanalyze and autovacuum functions are enabled. This parameter is valid only when autovacuum is set to on.

Type: SIGHUP

Value range: enumerated values

  • analyze indicates that only autoanalyze is performed.
  • vacuum indicates that only autovacuum is performed.
  • mix indicates that both autoanalyze and autovacuum are performed.
  • none indicates that neither of them is performed.

Default value: mix
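As a rough illustration, the mapping from autovacuum_mode to the jobs that actually run can be written as a small Python lookup. This is a sketch: the function name `enabled_jobs` is hypothetical, and it ignores the wraparound-prevention cleanup that still runs when autovacuum is off.

```python
# Background jobs enabled by each autovacuum_mode value.
MODE_JOBS = {
    "analyze": {"autoanalyze"},
    "vacuum": {"autovacuum"},
    "mix": {"autoanalyze", "autovacuum"},
    "none": set(),
}


def enabled_jobs(autovacuum: bool, autovacuum_mode: str = "mix") -> set:
    """Return the jobs that run for a given autovacuum / autovacuum_mode pair."""
    if autovacuum_mode not in MODE_JOBS:
        raise ValueError(f"invalid autovacuum_mode: {autovacuum_mode!r}")
    if not autovacuum:
        # autovacuum_mode is only honored when autovacuum is on.
        return set()
    return set(MODE_JOBS[autovacuum_mode])
```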

autoanalyze_mode

Parameter description: Specifies the autoanalyze mode. This parameter is supported by clusters of version 8.2.0 or later.

Type: USERSET

Value range: enumerated values

  • normal indicates common autoanalyze.
  • light indicates lightweight autoanalyze.

Default value:

  • If the cluster is upgraded from an earlier version to 8.2.0, the default value is normal to stay compatible with the earlier version.
  • If a cluster of version 8.2.0 is newly installed, the default value is light.

autoanalyze_cache_num

Parameter description: Specifies the maximum number of tables whose statistics can be cached by lightweight autoanalyze. If the number of cached tables exceeds this value, the statistics of the 100 earliest cached tables are deleted. This parameter is supported only by clusters of version 8.2.0 or later.

Type: SIGHUP

Value range: an integer ranging from 100 to INT_MAX

Default value: 10000
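The eviction rule described above can be modeled with an ordered dictionary. This is a toy Python model, not the actual cache: the class name `LightStatsCache` is hypothetical, and it assumes that refreshing the statistics of an already cached table does not change that table's age.

```python
from collections import OrderedDict

EVICT_BATCH = 100  # the earliest 100 tables are dropped on overflow


class LightStatsCache:
    """Toy model of the lightweight-autoanalyze statistics cache."""

    def __init__(self, autoanalyze_cache_num: int = 10000):
        if autoanalyze_cache_num < 100:
            raise ValueError("autoanalyze_cache_num must be at least 100")
        self.capacity = autoanalyze_cache_num
        self.entries = OrderedDict()  # insertion order = age, oldest first

    def put(self, table: str, stats) -> None:
        if table in self.entries:
            # Assumption: a refresh updates the stats but keeps the table's age.
            self.entries[table] = stats
            return
        self.entries[table] = stats
        if len(self.entries) > self.capacity:
            # Drop the earliest cached tables in one batch.
            for _ in range(min(EVICT_BATCH, len(self.entries))):
                self.entries.popitem(last=False)
```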

autoanalyze_timeout

Parameter description: Specifies the timeout period of autoanalyze. If the duration of analyze on a table exceeds the value of autoanalyze_timeout, analyze is automatically canceled.

Type: SIGHUP

Value range: an integer ranging from 0 to 2147483. The unit is second.

Default value: 5min

analyze_stats_mode

Parameter description: Specifies the mode for ANALYZE to calculate statistics.

Type: USERSET

Value range: enumerated values

  • memory indicates that statistics are always calculated in memory. Multi-column statistics are not calculated in this mode.
  • sample_table indicates that statistics are always calculated using temporary sampling tables. This mode is not supported for temporary tables.
  • dynamic indicates that the calculation mode is chosen based on maintenance_work_mem. If maintenance_work_mem is large enough to hold the samples, the memory mode is used; otherwise, the temporary sampling table mode is used.

Default value:

  • If the cluster is upgraded from an earlier version to 8.2.0.100, the default value is memory to stay compatible with the earlier version.
  • If a cluster of version 8.2.0.100 is newly installed, the default value is dynamic.
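The mode selection above can be sketched as a small Python function. The function name and its byte-count inputs are hypothetical, both sizes are assumed to be in the same unit, and the handling of temporary tables (falling back to memory mode) is an assumption, since the text only says they do not support sampling tables.

```python
def choose_stats_mode(analyze_stats_mode: str, sample_bytes: int,
                      maintenance_work_mem_bytes: int,
                      is_temp_table: bool = False) -> str:
    """Return 'memory' or 'sample_table' for one ANALYZE run (toy model)."""
    if analyze_stats_mode not in ("memory", "sample_table", "dynamic"):
        raise ValueError(f"invalid analyze_stats_mode: {analyze_stats_mode!r}")
    if is_temp_table:
        # ASSUMPTION: temporary tables cannot use sampling tables, so this
        # sketch falls back to memory mode for them.
        return "memory"
    if analyze_stats_mode == "dynamic":
        # Use memory only if maintenance_work_mem can hold the samples.
        if sample_bytes <= maintenance_work_mem_bytes:
            return "memory"
        return "sample_table"
    return analyze_stats_mode
```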

analyze_sample_mode

Parameter description: Specifies the sampling model used by ANALYZE.

Type: USERSET

Value range: an integer ranging from 0 to 2

  • 0 indicates the default reservoir sampling.
  • 1 indicates the optimized reservoir sampling.
  • 2 indicates range sampling.

Default value: 0
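To make "reservoir sampling" concrete, here is the textbook single-pass reservoir sampling (Algorithm R) in Python. It is a generic illustration of the technique behind value 0, not necessarily how GaussDB(DWS) implements it; the optimized variant (1) and range sampling (2) are not shown.

```python
import random


def reservoir_sample(rows, k, rng=None):
    """Uniform random sample of k rows in one pass (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep row i with probability k / (i + 1), replacing a random slot.
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                sample[j] = row
    return sample
```

Every row ends up in the sample with the same probability k / n, even though the total row count n is not known in advance, which is why this family of algorithms suits table scans.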

autovacuum_io_limits

Parameter description: Specifies the upper limit of I/Os per second triggered by the autovacuum process. This parameter has been deprecated since version 8.1.2 and is retained only for compatibility with earlier versions; it has no effect in the current version.

Type: SIGHUP

Value range: an integer ranging from –1 to 1073741823. –1 indicates that the default Cgroup is used.

Default value: –1

autovacuum_max_workers

Parameter description: Specifies the maximum number of automatic cleanup threads running at the same time.

Type: SIGHUP

Value range: an integer ranging from 0 to 128. 0 indicates that autovacuum is disabled.

Default value: 4

This parameter works together with autovacuum. The rules for cleaning up system catalogs and user tables are as follows:

  • When autovacuum_max_workers is set to 0, autovacuum is disabled and no tables are cleaned up.
  • When autovacuum_max_workers is greater than 0 and autovacuum is set to off, the system cleans up only system catalogs and column-store tables with delta tables enabled (through operations such as vacuuming delta tables, vacuuming CUDesc tables, and delta merge).
  • When autovacuum_max_workers is greater than 0 and autovacuum is set to on, all tables are cleaned up.
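The interaction between autovacuum and autovacuum_max_workers, including the two-phase transaction cleanup described under autovacuum, can be summarized as a Python decision function. The function name and the short scope codes ("none", "catalogs_and_delta", "all") are hypothetical labels, and the sketch ignores the wraparound-prevention cleanup.

```python
def autovacuum_scope(autovacuum: bool, autovacuum_max_workers: int) -> dict:
    """Summarize what is cleaned up for a given parameter combination."""
    if not 0 <= autovacuum_max_workers <= 128:
        raise ValueError("autovacuum_max_workers must be in [0, 128]")
    if autovacuum_max_workers == 0:
        tables = "none"                 # no tables are cleaned up
    elif not autovacuum:
        tables = "catalogs_and_delta"   # system catalogs + column-store delta tables
    else:
        tables = "all"                  # all tables
    # Two-phase transaction cleanup after fault recovery only needs autovacuum on,
    # even when autovacuum_max_workers is 0.
    return {"tables": tables, "two_phase_cleanup": autovacuum}
```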

autovacuum_naptime

Parameter description: Specifies the interval between two automatic cleanup operations.

Type: SIGHUP

Value range: an integer ranging from 1 to 2147483. The unit is second.

Default value: 60s

autovacuum_vacuum_threshold

Parameter description: Specifies the threshold for triggering the VACUUM operation. When the number of deleted or updated records in a table exceeds the specified threshold, the VACUUM operation is executed on this table.

Type: SIGHUP

Value range: an integer ranging from 0 to INT_MAX

Default value: 50

autovacuum_analyze_threshold

Parameter description: Specifies the threshold for triggering the ANALYZE operation. When the number of deleted, inserted, or updated records in a table exceeds the specified threshold, the ANALYZE operation is executed on this table.

Type: SIGHUP

Value range: an integer ranging from 0 to INT_MAX

Default value:

  • If the cluster is upgraded from an earlier version to 8.1.3, the default value is 10000 to stay compatible with the earlier version.
  • If a cluster of version 8.1.3 is newly installed, the default value is 50.

autovacuum_vacuum_scale_factor

Parameter description: Specifies the fraction of the table size to add to autovacuum_vacuum_threshold when deciding whether to trigger a VACUUM operation.

Type: SIGHUP

Value range: a floating point number ranging from 0.0 to 100.0

Default value: 0.2
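A worked form of the VACUUM trigger condition may help. It assumes the same rule PostgreSQL uses, where the scale factor multiplies the table's estimated row count N (as recorded in pg_class.reltuples) and n is the number of rows updated or deleted since the last VACUUM; GaussDB(DWS) may measure the table size differently.

```latex
n > \mathtt{autovacuum\_vacuum\_threshold} + \mathtt{autovacuum\_vacuum\_scale\_factor} \times N

% Defaults (threshold = 50, scale factor = 0.2) on a table with N = 1\,000\,000 rows:
50 + 0.2 \times 1\,000\,000 = 200\,050
```

Under these assumptions, VACUUM is triggered for that table once more than 200,050 rows have been updated or deleted.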

autovacuum_analyze_scale_factor

Parameter description: Specifies the fraction of the table size to add to autovacuum_analyze_threshold when deciding whether to trigger an ANALYZE operation.

Type: SIGHUP

Value range: a floating point number ranging from 0.0 to 100.0

Default value:

  • If the cluster is upgraded from an earlier version to 8.1.3, the default value is 0.25 to stay compatible with the earlier version.
  • If a cluster of version 8.1.3 is newly installed, the default value is 0.1.
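The two sets of ANALYZE defaults lead to quite different trigger points. The Python sketch below assumes the PostgreSQL-style rule threshold + scale_factor × row count, with the row count taken as the table's estimated number of rows; the function names are hypothetical.

```python
def analyze_trigger_point(n_rows: float, threshold: int, scale_factor: float) -> float:
    """Number of changed rows that must be exceeded before autoanalyze runs."""
    return threshold + scale_factor * n_rows


def needs_analyze(changed_rows: int, n_rows: float,
                  threshold: int = 50, scale_factor: float = 0.1) -> bool:
    """Defaults are the values for a newly installed 8.1.3 cluster."""
    return changed_rows > analyze_trigger_point(n_rows, threshold, scale_factor)
```

For a 1,000,000-row table, the newly installed defaults (50, 0.1) give a trigger point of 100,050 changed rows, while the upgraded defaults (10000, 0.25) give 260,000, so tables in an upgraded cluster are analyzed less often unless the parameters are changed.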

autovacuum_freeze_max_age

Parameter description: Specifies the maximum age (in transactions) that a table's pg_class.relfrozenxid column can attain before a VACUUM operation is forced to prevent transaction ID wraparound within the table.

VACUUM can also delete old files in the pg_clog/ subdirectory. Even if automatic cleanup is disabled, the system starts an autovacuum process when needed to prevent transaction ID wraparound.

Type: SIGHUP

Value range: an integer ranging from 100000 to 576460752303423487

Default value: 4000000000
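A minimal Python sketch of the forced-VACUUM check follows. It assumes 64-bit transaction IDs, which the upper bound of the value range suggests (576460752303423487 is 2^59 - 1), so a table's age is treated as a plain difference without wraparound arithmetic. The function names are hypothetical.

```python
DEFAULT_FREEZE_MAX_AGE = 4_000_000_000  # default autovacuum_freeze_max_age


def xid_age(current_xid: int, relfrozenxid: int) -> int:
    """Age of a table in transactions (assumes 64-bit XIDs, no wraparound)."""
    return current_xid - relfrozenxid


def forced_vacuum_needed(current_xid: int, relfrozenxid: int,
                         freeze_max_age: int = DEFAULT_FREEZE_MAX_AGE) -> bool:
    """True when pg_class.relfrozenxid is older than autovacuum_freeze_max_age.

    Such a VACUUM is started even when autovacuum is off.
    """
    return xid_age(current_xid, relfrozenxid) > freeze_max_age
```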

autovacuum_vacuum_cost_delay

Parameter description: Specifies the value of the cost delay used in the autovacuum operation.

Type: SIGHUP

Value range: an integer ranging from –1 to 100. The unit is ms. –1 indicates that the normal vacuum cost delay is used.

Default value: 2ms

autovacuum_vacuum_cost_limit

Parameter description: Specifies the value of the cost limit used in the autovacuum operation.

Type: SIGHUP

Value range: an integer ranging from –1 to 10000. –1 indicates that the normal vacuum cost limit is used.

Default value: –1
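Cost-based throttling is commonly modeled as follows: each page access adds to a running cost balance, and once the balance reaches the cost limit, the worker sleeps for the cost delay and starts counting again. The Python sketch below follows that simplified model, which is an assumption borrowed from PostgreSQL's cost-based vacuum delay (PostgreSQL scales the sleep by the balance; this sketch uses a fixed sleep). The per-page costs and the fallback inputs `vacuum_cost_delay_ms` and `vacuum_cost_limit`, standing for the "normal vacuum" values that –1 refers to, are illustrative, not documented defaults.

```python
def effective_cost_settings(av_cost_delay_ms: int, av_cost_limit: int,
                            vacuum_cost_delay_ms: int, vacuum_cost_limit: int):
    """Resolve autovacuum cost settings; -1 falls back to the normal vacuum values."""
    delay = vacuum_cost_delay_ms if av_cost_delay_ms == -1 else av_cost_delay_ms
    limit = vacuum_cost_limit if av_cost_limit == -1 else av_cost_limit
    return delay, limit


def total_sleep_ms(page_costs, delay_ms: int, limit: int) -> int:
    """Simplified throttling: sleep delay_ms each time the accumulated
    cost reaches limit, then reset the balance."""
    balance = 0
    slept = 0
    for cost in page_costs:
        balance += cost
        if delay_ms > 0 and balance >= limit:
            slept += delay_ms
            balance = 0
    return slept
```

With the default autovacuum_vacuum_cost_delay of 2 ms and an assumed limit of 200, 100 pages costing 10 each reach the limit five times, adding 10 ms of sleep.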

check_crossvw_write

Parameter description: Specifies whether to enable cross-VW write detection. This parameter is supported only by clusters of version 9.1.0.100 or later.

Type: USERSET

Value range: an integer, -1 or 1.

  • The value –1 indicates compatibility with version 9.0.3 behavior: when V3 tables are vacuumed, only non-last files of all epochs are cleared.
  • The value 1 indicates that the system checks whether a write is a cross-VW write. In a non-cross-VW write scenario, vacuuming a V3 table clears non-last files of all epochs, the last file of the current epoch, and the last files of epochs earlier than the current epoch. In a cross-VW write scenario, CNs obtain epoch information from all DNs and package it into an epochList that is sent to the metadata VW; vacuuming a V3 table then clears non-last files of all epochs and the last files of epochs that are less than max{epochList} and not in epochList.

Default value: 1
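The file-selection rules above can be written as a single Python predicate. The function and argument names are hypothetical, epochs are modeled as integers, and returning False for an empty epochList is an assumption the text does not cover.

```python
def v3_file_cleanable(file_epoch: int, is_last_file: bool,
                      check_crossvw_write: int, current_epoch: int,
                      cross_vw_write: bool = False, epoch_list=()) -> bool:
    """Decide whether vacuum may clear one V3 table file (toy model)."""
    if check_crossvw_write not in (-1, 1):
        raise ValueError("check_crossvw_write must be -1 or 1")
    if not is_last_file:
        # Non-last files are cleared for all epochs in every mode.
        return True
    if check_crossvw_write == -1:
        # 9.0.3-compatible behavior: last files are never cleared.
        return False
    if not cross_vw_write:
        # Last file of the current epoch and of all earlier epochs.
        return file_epoch <= current_epoch
    if not epoch_list:
        return False  # assumption: nothing to compare against
    # Cross-VW write: epochs below max(epochList) that are not in epochList.
    return file_epoch < max(epoch_list) and file_epoch not in epoch_list
```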

global_colvacuum_tuple_scale_factor

Parameter description: Specifies whether to enable global autovacuum for cross-VW file cleanup of V3 tables. This is supported only by clusters of version 9.1.0.200 or later.

Type: SIGHUP

Value range: an integer ranging from -1 to 100.

  • –1 indicates that global autovacuum is disabled. autovacuum only cleans up non-last files from all epochs.
  • 0 indicates that global autovacuum is enabled. The threshold for triggering global autovacuum is the product of the dead tuple threshold of the partitioned table and the number of nodes where the table is located.
  • 1-100 indicates that global autovacuum is enabled. The threshold for triggering global autovacuum is a multiple of the dead tuple threshold of the partitioned table.

Default value: 0
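The three value ranges can be expressed as a Python function that returns the global autovacuum trigger threshold. The function name is hypothetical, and reading "a multiple of the dead tuple threshold" as "the dead tuple threshold multiplied by the parameter value" is an assumption.

```python
def global_autovacuum_threshold(factor: int, dead_tuple_threshold: float,
                                num_nodes: int):
    """Return the global autovacuum trigger threshold, or None if disabled."""
    if not -1 <= factor <= 100:
        raise ValueError("global_colvacuum_tuple_scale_factor must be in [-1, 100]")
    if factor == -1:
        return None  # global autovacuum disabled
    if factor == 0:
        # Dead tuple threshold times the number of nodes holding the table.
        return dead_tuple_threshold * num_nodes
    # ASSUMPTION: the parameter value itself is the multiplier.
    return dead_tuple_threshold * factor
```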

colvacuum_threshold_scale_factor

Parameter description: Specifies the minimum percentage of dead tuples required for vacuum rewriting of files in column-store tables. autovacuum starts a VACUUM operation on a column-store table when the total number of dead tuples in the table is greater than RelDefaultFullCuSize (60000) and the ratio of dead tuples to all_tuples is greater than 1/2. During that operation, a file is rewritten only when the ratio of dead tuples to (all_tuple - null_tuple) in the file is greater than the percentage specified by this parameter.

Type: SIGHUP

Value range: an integer ranging from –2 to 100.

  • –2 indicates that neither vacuum rewriting nor vacuum cleanup is performed.
  • –1 indicates that vacuum rewriting is not performed; only vacuum cleanup is performed.
  • A value from 0 to 100 specifies the percentage of dead tuples.

Default value: 70
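The table-level trigger and the per-file rewrite rule can be sketched in Python with integer arithmetic to avoid rounding. The function names and the return labels are hypothetical, and treating files below the rewrite threshold as receiving vacuum cleanup is an assumption.

```python
REL_DEFAULT_FULL_CU_SIZE = 60000  # RelDefaultFullCuSize


def table_needs_vacuum(dead_tuples: int, all_tuples: int) -> bool:
    """More than 60000 dead tuples and more than half of all tuples dead."""
    return dead_tuples > REL_DEFAULT_FULL_CU_SIZE and dead_tuples * 2 > all_tuples


def file_action(factor: int, dead: int, all_tuple: int, null_tuple: int) -> str:
    """Return 'none', 'cleanup' or 'rewrite' for one file (toy model)."""
    if not -2 <= factor <= 100:
        raise ValueError("colvacuum_threshold_scale_factor must be in [-2, 100]")
    if factor == -2:
        return "none"      # neither rewriting nor cleanup
    if factor == -1:
        return "cleanup"   # cleanup only, never rewrite
    base = all_tuple - null_tuple
    # Rewrite when dead / (all_tuple - null_tuple) > factor / 100.
    if base > 0 and dead * 100 > factor * base:
        return "rewrite"
    return "cleanup"       # assumption: other files still get cleanup
```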

enable_pg_stat_object

Parameter description: Specifies whether AUTO VACUUM updates the PG_STAT_OBJECT system catalog. This parameter is supported only by clusters of version 8.2.1 or later.

Type: USERSET

Value range: Boolean

  • on indicates that the PG_STAT_OBJECT system catalog is updated during AUTO VACUUM.
  • off indicates that the PG_STAT_OBJECT system catalog is not updated during AUTO VACUUM.

Default value: on

enable_col_index_vacuum

Parameter description: Specifies whether AUTO VACUUM is allowed to clear dirty data in column-store indexes. Clearing this dirty data prevents index space bloat and improves the performance of importing data into tables that have indexes. This parameter is supported only by clusters of version 8.2.1.100 or later.

Type: SIGHUP

Value range: Boolean

  • on indicates that AUTO VACUUM is allowed to clear dirty data of column-store indexes.
  • off indicates that AUTO VACUUM is not allowed to clear dirty data of column-store indexes.

Default value: on

By default, this parameter is set to on in a newly installed cluster and off after an old cluster is upgraded.

enable_table_level_oldestxmin

Parameter description: Specifies whether to enable table-level oldestxmin. This feature gives each table a separate oldestxmin. During VACUUM, the table ignores long transactions that do not involve the table. This allows the table to be cleaned faster and reuse space more efficiently. This parameter is supported only by clusters of version 8.3.0 or later.

Type: SIGHUP

Value range: Boolean

  • on indicates that table-level oldestxmin is enabled.
  • off indicates that table-level oldestxmin is disabled.

Default value: off

  • A long transaction refers to a transaction that has been running for a long period of time but has not been committed. For details, see old_txn_threshold.
  • Table-level oldestxmin does not take effect on system catalogs. System catalogs still use the global oldestxmin, which means no long transactions are ignored when they are vacuumed.

old_txn_threshold

Parameter description: When table-level oldestxmin is calculated, transactions that run longer than the value of this parameter are regarded as long transactions. This parameter is supported only by clusters of version 8.3.0 or later.

Type: SIGHUP

Value range: an integer ranging from 1 to 1000000. The unit is second.

Default value: 600

  • Calculation rules of table-level oldestxmin:
    • The transaction running duration is calculated based on the snapshot time.
    • Transactions that run for shorter than old_txn_threshold are not considered long transactions and affect how oldestxmin is computed for all tables.
    • Transactions that run for longer than old_txn_threshold are considered long transactions. When computing oldestxmin for a table, the system ignores transactions that do not affect the table, and counts transactions that affect the table as active.
  • Adjust old_txn_threshold based on your service workload. If a transaction uses a snapshot for longer than old_txn_threshold, the system reports the error "Snapshot is invalid" when the transaction opens a table or partition that it has not opened yet. If this error is reported, increase the value of old_txn_threshold.
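The calculation rules above can be sketched in Python. This is a toy model: the `Txn` type, the assumption that each transaction's set of affected tables is known, and the use of next_xid as the horizon when no transaction applies are all hypothetical. The `is_catalog` flag reflects the note that system catalogs keep using the global oldestxmin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    xmin: int              # oldest XID the transaction's snapshot still needs
    snapshot_time: float   # seconds; running time is measured from the snapshot
    tables: frozenset = frozenset()  # tables the transaction affects (assumed known)


def table_oldest_xmin(table: str, txns, now: float, next_xid: int,
                      old_txn_threshold: float = 600,
                      is_catalog: bool = False) -> int:
    """Toy table-level oldestxmin following the rules listed above."""
    horizon = next_xid  # assumption: with no relevant transaction, use next_xid
    for txn in txns:
        is_long = (now - txn.snapshot_time) > old_txn_threshold
        # Long transactions that do not affect this table are ignored,
        # except for system catalogs, which keep using the global oldestxmin.
        if is_long and not is_catalog and table not in txn.tables:
            continue
        horizon = min(horizon, txn.xmin)
    return horizon
```

In this model, a single forgotten long transaction on table A no longer holds back cleanup of table B, which is the space-reuse benefit described under enable_table_level_oldestxmin.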