Updated on 2024-07-04 GMT+08:00

Overview

Description

In GaussDB:

  • Data is periodically synchronized to heterogeneous databases (such as Oracle databases) using a data migration tool. Real-time data replication is not supported. Therefore, the requirements for real-time data synchronization to heterogeneous databases are not satisfied.
  • For details about data synchronization for dual-cluster GaussDB DR, see "Server Tools > SyncDataToStby.py" in Tool Reference. The standby cluster requires that the numbers of CNs and DNs and the instance deployment mode be consistent with those in the primary cluster. When the standby cluster is restored, read and write operations cannot be performed, and replication latency is relatively high.

Based on the above two points, GaussDB provides the logical decoding function to generate logical logs by decoding Xlogs. A target database parses logical logs to replicate data in real time. For details, see Figure 1. Logical replication reduces the restrictions on target databases, allowing for data synchronization between heterogeneous databases and homogeneous databases with different forms. It allows data to be read and written during data synchronization on a target database, reducing the data synchronization latency.

Figure 1 Logical replication

Logical replication consists of logical decoding and data replication. Logical decoding outputs logical logs by transaction. The database service or middleware parses the logical logs to implement data replication. Currently, GaussDB supports only logical decoding. Therefore, this section involves only logical decoding.

Logical decoding provides basic transaction decoding capabilities for logical replication. GaussDB uses SQL functions for logical decoding. This method features easy function calling, requires no tools to obtain logical logs, and provides specific APIs for interconnecting with external replay tools, saving the need of additional adaptation.

Logical logs are output only after transactions are committed because they use transactions as the unit and logical decoding is driven by users. Therefore, to prevent Xlogs from being recycled by the system when transactions start and prevent required transaction information from being recycled by VACUUM, GaussDB introduces logical replication slots to block Xlog recycling.

A logical replication slot means a stream of changes that can be replayed in other clusters in the order they were generated in the original cluster. Each owner of logical logs maintains one logical replication slot. If the database where the logical replication slot in streaming decoding resides does not have services, the replication slot is updated based on the log location of other databases. The LSN-based logical replication slot in the active state may be updated based on the LSN of the current log when processing the active transaction snapshot log. The CSN-based logical replication slot in the active state may be updated based on the CSN of the current log when processing the virtual transaction log.

Prerequisites

  • Logical logs are extracted from DNs. If logical replication is required, ensure that the GUC parameter ssl on DNs is set to on.

    For security purposes, ensure that SSL connections are enabled.

  • The GUC parameter wal_level is set to logical.
  • The GUC parameter max_replication_slots is set to a value greater than or equal to the number of physical streaming replication slots, backup slots, and logical replication slots required by each DN.

    Physical streaming replication slots provide an automatic method to ensure that Xlogs are not removed from a primary DN before they are received by all the standby DNs. That is, physical replication slots are used to support HA clusters. The number of physical replication slots required by a cluster is equal to the ratio of standby to the primary DN in a ring of DNs. For example, if the cluster has one primary DN and three standby DNs, three physical replication slots are required.

    Plan the number of logical replication slots as follows:
    • A logical replication slot can carry changes of only one database for decoding. If multiple databases are involved, create multiple logical replication slots.
    • If logical replication is needed by multiple target databases, create multiple logical replication slots in the source database. Each logical replication slot corresponds to one logical replication link.
    • A maximum of 20 logical replication slots can be enabled for decoding on the same instance.
  • A user needs to connect to a database through a DN port before using SQL functions to perform logical decoding. If a CN port is used to connect to the database, EXECUTE DIRECT ON (datanode_name) 'statement' is needed to execute SQL functions.
  • Only initial users and users with the REPLICATION permission can perform this operation. When separation of duties is disabled, database administrators can perform logical replication operations. When separation of duties is enabled, database administrators are not allowed to perform logical replication operations.

Precautions

  • DDL statement decoding is not supported. When a specific DDL statement (for example, to truncate an ordinary table or exchange a partitioned table) is executed, decoded data may be lost.
  • Decoding is not supported for data page replication.
  • After a DDL statement (for example, ALTER TABLE) is executed, the physical logs that are not decoded before the DDL statement execution may be lost.
  • Online cluster scale-out is not allowed during logical decoding.
  • The size of a single tuple cannot exceed 1 GB, and decoding results may be larger than inserted data. Therefore, it is recommended that the size of a single tuple be less than or equal to 500 MB.
  • Decoding compressed tables into DML statements is not supported.
  • GaussDB supports the following types of data to be decoded: INTEGER, BIGINT, SMALLINT, TINYINT, SERIAL, SMALLSERIAL, BIGSERIAL, FLOAT, DOUBLE PRECISION, BOOLEAN, BIT(n), BIT VARYING(n), DATE, TIME[WITHOUT TIME ZONE], TIMESTAMP[WITHOUT TIME ZONE], CHAR(n), VARCHAR(n), TEXT, and CLOB (decoded into the text format).
  • If the SSL connection is required, ensure that the GUC parameter ssl is set to on.
  • The logical replication slot name must contain fewer than 64 characters and contain only one or more types of the following characters: lowercase letters, digits, and underscores (_).
  • After the database where a logical replication slot resides is deleted, the replication slot becomes unavailable and needs to be manually deleted.
  • Interval partitioned tables cannot be replicated.
  • To decode multiple databases, you need to create a streaming replication slot in each database and start decoding. Logs need to be scanned for decoding of each database.
  • Forcible switchover is not supported. After forcible switchover, you need to export all data again.
  • After a DDL statement is executed in a transaction, the DDL statement and subsequent statements are not decoded.
  • During decoding on the standby node, the decoded data may increase due to switchover or failover, which needs to be manually filtered out. When the quorum protocol is used, switchover and failover should be performed on the standby node that is to be promoted to primary, and logs must be synchronized from the primary node to the standby node.
  • The same replication slot for decoding cannot be used between the primary node and standby node or between different standby nodes at the same time. Otherwise, data inconsistency occurs.
  • Replication slots can only be created or deleted on hosts.
  • After the database is restarted due to a fault or the logical replication process is restarted, duplicate decoded data may exist. You need to filter out the duplicate data.
  • If the computer kernel is faulty, garbled characters may be displayed during decoding, which need to be manually or automatically filtered out.
  • Ensure that the long transaction is not started during the creation of the logical replication slot. If the long transaction is started, the creation of the logical replication slot will be blocked. If the creation of a replication slot is blocked due to a long transaction, you can use the SQL function pg_terminate_backend (ID of the thread that creates the replication slot) to manually stop the creation.
  • To parse the UPDATE and DELETE statements of an Astore table, you need to configure the REPLICA IDENTITY attribute for the table. If the table does not have a primary key, set the REPLICA IDENTITY attribute to FULL. Otherwise, modified rows are not identified in the decoding result of the UPDATE and DELETE statements. For details, see the REPLICA IDENTITY { DEFAULT | USING INDEX index_name | FULL | NOTHING } field in "SQL Reference > SQL Syntax > ALTER TABLE" in Developer Guide.
  • Do not perform operations on the replication slot on other nodes when the logical replication slot is in use. To delete a replication slot, stop decoding in the replication slot first.
  • Considering that the target database may require the system status information of the source database, logical decoding automatically filters only logical logs of system catalogs whose OIDs are less than 16384 in the pg_catalog and pg_toast schemas. If the target database does not need to copy the content of other related system catalogs, the related system catalogs need to be filtered during logical log replay.
  • When logical replication is enabled, if you need to create a primary key index that contains system columns, you must set the REPLICA IDENTITY attribute of the table to FULL or use USING INDEX to specify a unique, non-local, non-deferrable index that does not contain system columns and contains only columns marked NOT NULL.
  • If a replication table exists before scale-in or upgrade, you need to manually set the logical_repl_node attribute or reset to the default value for the replication table. For details, see the usage of the storage_parameter parameter and the logical_repl_node attribute in "SQL Reference > SQL Syntax > ALTER TABLE" in Developer Guide.
  • If a transaction has too many sub-transactions, too many files are flushed to disks. To exit decoding, you need to run the SQL function pg_terminate_backend (walsender thread ID for logical decoding) to manually stop decoding. In addition, the exit delay increases by about 1 minute per 300,000 sub-transactions. Therefore, when logical decoding is enabled, if the number of sub-transactions of a transaction reaches 50,000, a WARNING log is generated.
  • When a logical replication slot is inactive, GUC parameters enable_xlog_prune is set to on, enable_logicalrepl_xlog_prune is set to on, and max_size_for_xlog_retention is set to a non-zero value, the number of retained log segments caused by the backup slot or logical replication slot exceeds the value of wal_keep_segments, and other replication slots do not cause more retained log segments, if the value of max_size_for_xlog_retention is greater than 0 and the number of retained log segments (the size of each log segment is 16 MB) caused by the current logical replication slot exceeds the value of max_size_for_xlog_retention, or if the value of max_size_for_xlog_retention is less than 0 and the disk usage reaches the value of –max_size_for_xlog_retention/100, the logical replication slot is forcibly invalidated and restart_lsn is set to FFFFFFFF/FFFFFFFF. Logical replication slots in this state do not participate in the recycling of blocked logs or historical system catalogs, but the limitation on the maximum number of replication slots still takes effect. In this case, you need to manually delete them.
  • After the standby node starts decoding and sends an instruction of updating the replication slot number to the primary node, the standby node occupies a corresponding logical replication slot (identified as an active state) on the primary node. Before that, the corresponding logical replication slot on the primary node is inactive. In this state, if the condition for forcibly invalidating the logical replication slot is met, the logical replication slot is marked as invalid (that is, restart_lsn is set to FFFFFFFF/FFFFFFFF). As a result, the standby node cannot update the replication slot on the primary node. In addition, after the standby node replays the logs indicating that the replication slot is invalid, the standby node of the current replication slot cannot be reconnected if decoding is disconnected.
  • Inactive logical replication slots block WAL recycling and historical system catalog tuple clearing. As a result, disk logs are accumulated and system catalog scanning performance deteriorates. Therefore, you need to clear logical replication slots that are no longer used in time. During the observation period before the upgrade is committed, the extended IP address of the DN is used to connect to the logical replication slot created on the DN. Before the upgrade rollback, manually clear the logical replication slot. Otherwise, the DN cannot be directly connected to clear the logical replication slot when the extended IP address feature of the DN is rolled back.
  • Logical decoding with strong consistency in a distributed system (with CNs connected) supports only GTM-lite distributed deployment and streaming decoding. It does not support CNs connecting to standby DNs for decoding, SQL logical decoding functions, online scale-out, or global indexes.
  • For logical decoding with strong consistency in a distributed system (with CNs connected), the CN HA is switched by the service.
  • The xmin, catalog_xmin, restart_lsn, confirmed_flush, and confirmed_csn columns of the CSN-based logical replication slots on CNs are not displayed because the CSN-based logical replication slots function only as placeholders and do not move with logical decoding or block log recycling.
  • If a protocol is used to connect to a CN to create a logical replication slot, only CSN-based replication slots are supported. If a protocol is used to connect to a DN to create a logical replication slot, only LSN-based replication slots are supported.
  • For distributed decoding, if an error is reported or the decoding client is manually stopped, wait for 15 seconds and try decoding again. If a replication slot is occupied, run the command pg_terminate_backend(ID of the thread that occupies the replication slot) to manually release the replication slot.
  • If an error is reported when a replication slot fails to be created on a CN, delete the replication slot on the CN and create a replication slot on the CN again.
  • When a logical replication slot is deleted from a CN, if the logical replication slot is an LSN-based logical replication slot, only the replication slot of the current node is deleted. Logical replication slots with the same name on other nodes are not affected. When a CSN-based logical replication slot with the same name exists on other nodes, no error is reported because some nodes do not have replication slots. In addition, replication slots with the same name on all nodes are successfully deleted. If no replication slot exists on any node, an error is reported.
  • When a CSN-based logical replication slot is created on a CN, if there are residual LSN-based logical replication slots with the same name on some nodes, you need to delete the residual replication slots on these nodes. Otherwise, CSN-based logical replication slots will be created on CNs and primary DNs that do not have replication slots with the same name except the current CN.
  • If an LSN-based logical replication slot remains on the current CN and a CSN-based logical replication slot with the same name remains on other nodes, deleting the replication slot on the current CN will delete only the local LSN-based logical replication slot. After the deletion is complete, perform the deletion operation again to delete the replication slots with the same name on other nodes.
  • When the JSON format is used for decoding, the data column cannot contain special characters (such as the null character '\0'). Otherwise, the content in the decoding output column will be truncated.
  • When a transaction generates a large number of sub-transactions that need to be flushed to disks, the number of opened file handles may exceed the upper limit. In this case, set max_files_per_process to a value greater than twice the upper limit of sub-transactions.
  • sql_decoding decodes the UPDATE statement as a "DELETE+INSERT" operation.

Performance

In the Benchmarksql-5.0 with 100 warehouses, when pg_logical_slot_get_changes is used:
  • If 4000 lines of data (about 5 MB to 10 MB logs) are decoded at a time, the decoding performance ranges from 0.3 MB/s to 0.5 MB/s.
  • If 32000 lines of data (about 40 MB to 80 MB logs) are decoded at a time, the decoding performance ranges from 3 MB/s to 5 MB/s.
  • If 256000 lines of data (about 320 MB to 640 MB logs) are decoded at a time, the decoding performance ranges from 3 MB/s to 5 MB/s.
  • If the amount of data to be decoded at a time still increases, the decoding performance is not significantly improved.

If pg_logical_slot_peek_changes and pg_replication_slot_advance are used, the decoding performance is 30% to 50% lower than that when pg_logical_slot_get_changes is used.