更新时间:2025-05-29 GMT+08:00

如何评估VACUUM清理速率或需要清理多久

AUTOVACUUM的执行分为若干步骤,各个步骤的执行耗时各不相同,其流程简化如下:

图1 VACUUM流程图

从上图可以看出,VACUUM单表的耗时主要与以下几个因素相关:表页面数量、索引页面数量、全局索引个数、分区数量、IO管控时间。

单表VACUUM速度的估算公式如下:

  • 非分区表

    页面数量 = 数据页数量 + 索引页面数量

    清理时间 = 页面数量 * 清理耗时 + 页面数量 / 评估代价 * 休眠时间

    清理速度 = 死行总数 / 清理时间

  • 分区表

    页面数量= 数据页数量 + 普通索引页面数量 + (全局索引页面数量之和) * 分区数量

    清理时间 = 页面数量 * 清理耗时 + 页面数量 / 评估代价 * 休眠时间

    清理速度 = 死行总数 / 清理时间

可以通过以下SQL语句对单表的VACUUM速度进行评估:

  • 非分区表:
    select 
    ((heap_pages * prune_time) + (index_pages * prune_time) * nindexs + ((heap_pages+index_pages) * hit_percent * Vacuum_cost_page_hit + (heap_pages+index_pages) * miss_percent * Vacuum_cost_page_miss + (heap_pages+index_pages) * dirty_percent * Vacuum_cost_page_dirty) * Vacuum_cost_delay / Vacuum_cost_limit) as totaltime, deadtuples/totaltime as speed
    from 
    (select relpages as heap_pages,  b.n_dead_tup as deadtuples, b.n_dead_tup/(reltuples+b.n_dead_tup +1) as dirty_percent from pg_class a, pg_stat_all_tables b where a.relname = 't1' and b.relname = a.relname) as abc , (select sum(relpages) as index_pages, count(*) as nindexs from pg_class where relname in (select indexname from pg_indexes where tablename = 't1') and relkind = 'i') as bcd , (select 0.07 as prune_time, 0.05 as miss_percent , 0.95 as hit_percent , 1 as Vacuum_cost_page_hit,10 as Vacuum_cost_page_miss , 200 as Vacuum_cost_limit ,20 as Vacuum_cost_delay , 20 as Vacuum_cost_page_dirty);
  • 分区表:
    select 
    ((heap_pages * prune_time) + (local_index_pages * prune_time) * nlocalindexs + (gpi_index_pages * prune_time) * npartitions * ngpiindexs + ((heap_pages + local_index_pages + gpi_index_pages) * hit_percent * Vacuum_cost_page_hit + (heap_pages + local_index_pages + gpi_index_pages) * miss_percent * Vacuum_cost_page_miss + (heap_pages + local_index_pages + gpi_index_pages) * dirty_percent * Vacuum_cost_page_dirty) * Vacuum_cost_delay / Vacuum_cost_limit) as total_time,
     dead_tuples/total_time as speed
     from
     (select relpages as heap_pages, b.n_dead_tup as dead_tuples, b.n_dead_tup/(reltuples+b.n_dead_tup+1) as dirty_percent from pg_class a, pg_stat_all_tables b where a.relname = 't1' and b.relname = a.relname) as abc , 
     (select sum(relpages) as local_index_pages, 
     count(*) as nlocalindexs 
     from 
     pg_class where oid in (select indexrelid from pg_index where indrelid=(select oid from pg_class where relname = 't1')) and relkind='i') as bcd , (select sum(relpages) as gpi_index_pages, count(*) as ngpiindexs  from pg_class where oid in (select indexrelid from pg_index where indrelid=(select oid from pg_class where relname = 't1')) and relkind='I') as fenqu , (select count(*) as npartitions from pg_partition where parentid in (select oid from pg_class where relname = 't1'))as qqq ,(select 0.07 as prune_time, 0.05 as miss_percent , 0.95 as hit_percent , 1 as Vacuum_cost_page_hit,10 as Vacuum_cost_page_miss , 200 as Vacuum_cost_limit ,20 as Vacuum_cost_delay , 20 as Vacuum_cost_page_dirty) as yyy;
  1. 公式中prune_time为清理一个tuple需要的时间,这个值与硬件I/O能力相关,在SSD硬盘可以按照0.07ms进行估算。对于IO能力更强的NVMe硬盘,可以适当降低该值;而对于I/O能力较弱的机械硬盘,则需要适当提高;
  2. miss_percent和hit_percent是缓存命中情况。在刚导入数据的场景,缓存命中较低;在系统稳定后,可以按照miss 0.5和hit 0.95进行估算;
  3. vacuum_cost_delay是GUC参数autovacuum_vacuum_cost_delay的值。

除此之外,在当前版本中,我们引入了如下的进度分析视图,可以获取当前的AUTOVACUUM进度以及效率。

gaussdb=# select * from gs_stat_progress_vacuum(140123447490304);
       pid       | dataname | relid | relname |     phase     |
         detail
-----------------+----------+-------+---------+---------------+---------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------
 140123447490304 | postgres | 24602 | t1      | scanning heap | vacuum db "postgres" starts at 2025-02-09 14:10:17.863838-05, vacuumed 12/242 relations, current r
elation "t1"; heap: scanned 55298/73216 blocks, skipped 0 blocks; info: found 0 dead tuples, already run 610s, delay 608s.
(1 row)