Updated on 2024-03-08 GMT+08:00

VACUUM FULL Is Slow

Common scenarios and troubleshooting methods for slow execution of VACUUM FULL are as follows:

Scenario 1: VACUUM FULL Is Executed Slowly Due to Lock Wait

  • If the cluster version is 8.1.x or later, perform the following steps:
  1. Query the pgxc_lock_conflicts view to check lock conflicts.

    1
    SELECT * FROM pgxc_lock_conflicts;
    

    • As shown in the following figure, if the value of granted is f, the VACUUM FULL statement is waiting for another lock. If granted is t, the INSERT statement holds the lock. nodename indicates the CN or DN where the lock is generated, for example, cn_5001. Execute 2.
    • If 0 rows is displayed in the command output, no lock conflict occurs. In this case, check other scenarios.

  2. Based on the statement content, determine whether to run VACUUM FULL immediately after terminating the lock-holding statement or run VACUUM FULL during off-peak hours.

    To terminate a lock-holding statement, run the following statement: In the preceding command, pid is obtained from step 1, and cn_5001 is the node name queried out.

    1
    execute direct on (cn_5001) 'SELECT PG_TERMINATE_BACKEND(pid)';
    

  3. After the statement is terminated, run VACUUM FULL again.

    1
    VACUUM FULL table_name;
    

  • For 8.0.x and earlier versions:
  1. Run the following statement in the database to obtain the value of query_id corresponding to the VACUUM FULL operation:

    1
    SELECT * FROM pgxc_stat_activity WHERE query LIKE '%vacuum%'AND waiting = 't'; 
    

  2. Run the following statement to check whether lock wait exists (Use the obtained query_id):

    1
    SELECT * FROM pgxc_thread_wait_status WHERE query_id = {query_id};
    

    • If acquire lock is displayed in wait_status in the command output, lock wait exists. Check the value of node_name and record it, for example, cn_5001 or dn_600x_600y, then go to 3.
    • If in the query result, wait_status does not contain acquire lock, there is no lock wait. Check other scenarios.

  3. Run the following statement to check the lock waited by the VACUUM FULL operation in pg_locks: The following uses cn_5001 as an example. If the lock wait is on a DN, change it to the corresponding DN name. pid is obtained in 2.

    Record the value of relation in the command output.

    1
    execute direct on (cn_5001) 'SELECT * FROM pg_locks WHERE pid = {tid} AND granted = ''f''';
    

  4. View in pg_locks the PID of the transaction holding the lock based on the value of relation . The value of relation is obtained in 3.

    1
    execute direct on (cn_5001) 'SELECT * FROM pg_locks WHERE relation = {relation} AND granted = ''t'''; 
    

  5. Run the following statement to query the corresponding SQL statement based on the PID: The value of pid is obtained in 4.

    1
    execute direct on (cn_5001) 'SELECT query FROM pg_stat_activity WHERE pid ={pid}';
    

  6. Based on the statement content, determine whether to run VACUUM FULL immediately after terminating the lock-holding statement or run VACUUM FULL during off-peak hours.

    To terminate a lock-holding statement, run the following statement: In the preceding command, pid is obtained from step 1, and cn_5001 is the node name queried in 4.

    1
    execute direct on (cn_5001) 'SELECT PG_TERMINATE_BACKEND(pid)';
    

  7. After the statement is terminated, run the VACUUM FULL statement.

    1
    VACUUM FULL table_name;
    

Scenario 2: Transactions Cannot Be Committed Due to I/O or Network Problems

Solution: Run a simple CRETAE TABLE statement. If the execution of the CRETAE TABLE statement is also slow, the I/O or network may be faulty. In this case, check the I/O and network conditions.

Scenario 3: VACUUM FULL Is Executed Slowly Due to Large System Catalogs

After fixing the I/O or network problems, run VACUUM FULL on empty tables. If VACUUM FULL is executed slowly even on empty tables, the system catalogs are too large. Executing VACUUM FULL on any table requires scanning the system catalogs pg_class, pg_partition, and pg_proc. If the three system catalogs are too large, VACUUM FULL will be executed slowly.

Solution: GaussDB(DWS) supports executing VACUUM FULL on system catalogs. However, the execution holds an eight-level lock, and services related to system catalogs will be blocked. Clear system catalogs during off-peak hours or when services are stopped and no DDL operation is performed.

Scenario 4: Slow VACUUM FULL on a Column-store Table Using Partial Clustering (PCK)

When VACUUM FULL is performed on a column-store table, if PCK exists, all records in PARTIAL_CLUSTER_ROWS are loaded to the memory and then sorted. If the table is large or psort_work_mem is set to a small value, data will spill to disks during PCK sorting (the database stores temporary results to disks). This increases the time consumption.

Solution: Adjust the values of PARTIAL_CLUSTER_ROWS and psort_work_mem based on the tuple length of the data in the table.
  1. Run the following statement to view the table definition. If PARTIAL CLUSTER KEY is displayed in the command output, the table contains PCKs.
    1
    SELECT * FROM pg_get_tabledef('table name');
    

  2. Log in to the GaussDB(DWS) management console and choose Clusters on the left.
  3. Click the cluster name to go to the cluster details page.
  4. Click Parameters on the left, enter psort_work_mem in the search box, increase the values on both CNs and DNs, and click Save.
  5. Run VACUUM FULL again.