Updated on 2025-05-29 GMT+08:00

oldestxmin Is Stuck Due to Long Transactions

Symptom

VACUUM of Astore can only clean up dead tuples that have been committed and whose transaction IDs are smaller than oldestxmin. If a long transaction is not committed for a long time, oldestxmin cannot be updated. As a result, AUTOVACUUM cannot reclaim dead tuples, causing Astore space bloat.

Fault Locating

To solve this problem, use the gs_txid_oldestxmin() view to query oldestxmin and use the txid_current() view to query the latest XID. Then, you can determine whether oldestxmin is stuck based on the difference between the two views.

select txid_current() as xid, gs_txid_oldestxmin() as oldestxmin, xid - oldestxmin as diff;

Example:

gaussdb=# select txid_current() as xid, gs_txid_oldestxmin() as oldestxmin, xid - oldestxmin as diff;
  xid   | oldestxmin | diff
--------+------------+------
 220790 |     220789 |    1
(1 row)

Generally, a difference of less than 10,000 between the two is considered normal. If the difference is too large, for example, several hundred thousand or even millions, long transactions exist in the system. You can use pg_stat_activity to obtain information about blocked transactions.

select 'elp:'|| now() - xact_start diff, query, datname, pid, usename, xact_start, enqueue, state, query_id, query from pg_stat_activity where query not like 'WLM%' order by diff desc;
select * from pg_running_xacts;

Example:

gaussdb=# select 'elp:'|| now() - xact_start diff, query, datname, pid, usename, xact_start, enqueue, state, query_id, query from pg_stat_activity where query not like 'WLM%' and state='active' order by diff desc;
     diff     |                                                                                                    query
                                                          | datname  |       pid       |  usename  |          xact_start           | enqueue | state  |     query_
id     |                                                                                                    query

--------------+---------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------+----------+-----------------+-----------+-------------------------------+---------+--------+-----------
-------+----------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------
 elp:00:00:00 | select 'elp:'|| now() - xact_start diff, query, datname, pid, usename, xact_start, enqueue, state, query_id, query from pg_stat_activity where que
ry not like 'WLM%' and state='active' order by diff desc; | postgres | 140123736897280 | postgres | 2025-02-09 13:50:21.628494-05 |         | active | 3096224744
018863 | select 'elp:'|| now() - xact_start diff, query, datname, pid, usename, xact_start, enqueue, state, query_id, query from pg_stat_activity where query not
like 'WLM%' and state='active' order by diff desc;
     0 |
 elp:         |
                                                          | postgres | 140124852582144 | postgres |                               |         | active |
     0 |
 elp:         |
                                                          | postgres | 140124747724544 | postgres |                               |         | active |
     0 |
 elp:         |
                                                          | postgres | 140124232349440 | postgres |                               |         | active |
     0 |
(6 rows)

gaussdb=# select * from pg_running_xacts where pid=140125682005760;
 handle |  gxid  | state |  node  | xmin | vacuum | timeline | prepare_xid |       pid       | next_xid | dbid
--------+--------+-------+--------+------+--------+----------+-------------+-----------------+----------+-------
     -1 | 220796 | 0     | sgnode |    0 | f      |       11 |           0 | 140125682005760 |        0 | 12912
(1 row)

In addition, logical replication slots may also block the update of oldestxmin. You can check the replication slot information as follows:

select * from pg_get_replication_slots();

Example:

gaussdb=# select * from pg_get_replication_slots();
 slot_name | plugin | slot_type | datoid | active | xmin | catalog_xmin | restart_lsn | dummy_standby | confirmed_flush | confirmed_csn | dictionary_csn_min | slo
t_dictionary_type
-----------+--------+-----------+--------+--------+------+--------------+-------------+---------------+-----------------+---------------+--------------------+----
------------------
 dn_6003   |        | physical  |      0 | t      |      |              | 0/15E52890  | f             |                 |               |                    |
 dn_6002   |        | physical  |      0 | t      |      |              | 0/15E52890  | f             |                 |               |                    |
(2 rows)

Workarounds

After identifying the active thread that blocks the update of oldestxmin, terminate the corresponding thread.