Updated on 2023-03-30 GMT+08:00

Distributed Transactions

Background

In a distributed shared-nothing architecture, table data is spread across different nodes. One or more statements issued by a client may modify data on multiple nodes at the same time, which produces a distributed transaction. Two properties of distributed transactions need attention:

1) Atomicity across nodes: A distributed transaction either succeeds on all nodes or fails on all nodes.

2) Transaction consistency: Every node returns the same data. Data consistency cannot be ensured while a node is faulty.

The atomicity of distributed transactions must always be ensured. The level of consistency that can be offered is a trade-off described by the CAP theorem. Common approaches are CP systems, which provide strong consistency through protocols such as two-phase commit (2PC) and three-phase commit (3PC), and AP systems, which provide eventual consistency through patterns such as TCC (Try-Confirm-Cancel) and message tables.
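To make the atomicity requirement concrete, the following is a minimal sketch of a two-phase commit coordinator. It is not the GaussDB(DWS) implementation; the `Participant` class, its `can_prepare` flag, and the `two_phase_commit` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """Hypothetical data node that votes on and then finishes a transaction."""
    name: str
    can_prepare: bool = True   # set to False to simulate a node that cannot prepare
    state: str = "idle"        # idle -> prepared -> committed / aborted

    def prepare(self) -> bool:
        # Phase 1: make the pending changes durable and vote yes or no.
        if self.can_prepare:
            self.state = "prepared"
        return self.can_prepare

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Return True if the transaction was committed on every node."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2: commit only if all votes are yes, otherwise roll back everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

In this sketch a single "no" vote rolls the transaction back on every node, which is the all-or-nothing behavior described in point 1. A real coordinator also has to persist its decision and handle nodes that fail between the two phases.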

Technical Principles

GaussDB(DWS) supports strongly consistent distributed transactions. In CAP terms it is a CP system: it provides high availability while guaranteeing consistency and partition tolerance.

CSN

CSN update and mapping between CSNs and XIDs

CSN stands for commit sequence number, the sequence number assigned to a transaction when it is committed.

1) The CSN is an 8-byte unsigned integer that increases monotonically and is maintained by the GTM.

2) When a transaction ends, the CSN value is updated through the GTM.

3) When the CSN mechanism is in use, the current CSN can be obtained from the GTM.
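The mapping between XIDs and CSNs can be illustrated with a small sketch. The visibility rule used here (a row is visible when its writing transaction has a commit CSN smaller than the reader's snapshot CSN) is a common convention in CSN-based designs and is an assumption, not something stated in this document. `CsnLog` and `is_visible` are hypothetical names.

```python
from typing import Dict, Optional


class CsnLog:
    """Hypothetical XID -> CSN map. A missing entry means 'not committed'."""

    def __init__(self) -> None:
        self._csn_by_xid: Dict[int, int] = {}

    def record_commit(self, xid: int, csn: int) -> None:
        # Called when a transaction ends with a commit.
        self._csn_by_xid[xid] = csn

    def csn_of(self, xid: int) -> Optional[int]:
        return self._csn_by_xid.get(xid)


def is_visible(writer_xid: int, snapshot_csn: int, log: CsnLog) -> bool:
    """Assumed rule: visible only if the writer committed before the snapshot."""
    commit_csn = log.csn_of(writer_xid)
    return commit_csn is not None and commit_csn < snapshot_csn
```

Under this assumption a reader needs only one integer, its snapshot CSN, to decide visibility, rather than a list of all active transactions.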

GTM

GTM is a component in the GaussDB(DWS) distributed framework. It has the following functions:

1. Manages and allocates transaction IDs, which only increase and never decrease.

2. Manages and maintains CSNs, which only increase and never decrease.

When executing a modification operation, the CN obtains the transaction ID from the GTM.

At the beginning of each statement, the CN obtains a CSN from the GTM to use for the query.

When a transaction starts or ends, the CN communicates with the GTM to register and destroy transaction information.
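The interaction between a CN and the GTM described above can be sketched as a toy in-process service. This is an illustration under simplifying assumptions, not the GaussDB(DWS) GTM: `ToyGtm` and all of its methods are hypothetical, and registration and XID allocation are merged into a single `begin` call for brevity.

```python
import threading
from typing import Optional, Set


class ToyGtm:
    """Hypothetical single-process stand-in for a global transaction manager."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._next_xid = 1            # transaction IDs only increase
        self._next_csn = 1            # CSNs only increase
        self._active: Set[int] = set()

    def begin(self) -> int:
        """Register a transaction and hand out a new XID."""
        with self._lock:
            xid = self._next_xid
            self._next_xid += 1
            self._active.add(xid)
            return xid

    def snapshot_csn(self) -> int:
        """CSN that a statement would fetch before it starts reading."""
        with self._lock:
            return self._next_csn

    def end(self, xid: int, committed: bool) -> Optional[int]:
        """Unregister the transaction; assign a commit CSN only on commit."""
        with self._lock:
            self._active.discard(xid)
            if not committed:
                return None
            csn = self._next_csn
            self._next_csn += 1
            return csn

    def active_xids(self) -> Set[int]:
        with self._lock:
            return set(self._active)
```

The lock stands in for the fact that both counters are handed out from a single place, which is what keeps them monotonic across all CNs in this sketch.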

Troubleshooting

The gs_clean tool is used to automatically clear residual distributed transactions caused by node faults.

gs_clean queries the residual two-phase transactions on each node, uses the residual transaction IDs to check whether those transactions were committed or rolled back on other nodes, and then clears the residual two-phase transactions according to that final result.
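The decision logic described above can be sketched as follows. This is not the gs_clean source code or its command-line interface. The function name, the input dictionaries, and the rule "commit if the transaction is committed on any node, otherwise roll back" are assumptions made for illustration.

```python
from typing import Dict, Set


def resolve_residual(
    prepared: Dict[str, Set[int]],
    committed: Dict[str, Set[int]],
) -> Dict[int, str]:
    """Decide what to do with each XID left in the prepared state.

    prepared:  node name -> XIDs still prepared on that node (hypothetical input)
    committed: node name -> XIDs known to be committed on that node
    """
    # Every XID that some node has already committed.
    committed_anywhere: Set[int] = set().union(*committed.values())

    decisions: Dict[int, str] = {}
    for xids in prepared.values():
        for xid in xids:
            # Assumed rule: follow the outcome already reached elsewhere.
            decisions[xid] = "commit" if xid in committed_anywhere else "rollback"
    return decisions
```

For example, if XID 7 is still prepared on one node but already committed on another, the sketch returns "commit" for it, so that all nodes converge on the same outcome.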

Benefits

GaussDB(DWS) supports strongly consistent distributed transactions, so you can use a GaussDB(DWS) database in the same way as a standalone database. The CSN-based transaction mechanism also greatly improves concurrency performance.