Updated on 2024-11-05 GMT+08:00

Subscribing to Hybrid Data Warehouse Binlog

Binlog Usage

The HStore table within the GaussDB(DWS) hybrid data warehouse offers a Binlog feature to facilitate the capture of database events. This enables the export of incremental data to third-party systems like Flink. By consuming Binlog data, you can synchronize upstream and downstream data, improving data processing efficiency.

Unlike traditional MySQL Binlog, which logs all database changes and focuses on data recovery and replication. The GaussDB(DWS) hybrid data warehouse Binlog is optimized for real-time data synchronization, recording DML operations—Insert, Delete, Update, and Upsert—while excluding DDL operations.

Constraints and Limitations

  1. Binlog is currently in beta and is only available for commercial use on a restricted basis. To use Binlog, contact technical support.
  2. Binlog recording for HStore and HStore-op operations is only supported in versions 8.3.0.100 and later.
  3. Binlog requires a primary key, an HStore or HStore-opt table, and supports only hash distribution.
  4. Binlog tables log DML operations like insert, delete, and update (upsert), excluding DDL operations.
  5. Binlog tables do not support insert overwrite, altering distribution columns, enabling Binlog on temporary tables, or partition operations like exchange, merge, or split.
  6. Users can perform certain DDL operations (ADD COLUMN, DROP COLUMN, SET TYPE, VACUUM FULL, TRUNCATE),

    but these will reset incremental data and synchronization details.

  7. The system waits for Binlog consumption before further scaling. The default wait duration is 1 hour, which can be set through binlog_consume_timeout. Timeouts or errors will fail the scaling process.
  8. Binlog tables are backed up as standard HStore tables. Post-restoration, you must restart data synchronization as incremental data and sync details are reset.

Binlog Formats and Principles

Table 1 binlog fields

Field

Type

Meaning

sync_point

BIGINT

Binlog system field, which indicates the synchronization point. In common GTM mode, the value is unique and ordered.

event_sequence

BIGINT

Binlog system field, which indicates the sequence of operations of the same transaction type.

type

CHAR

Binlog system field, which indicates the operation type of the current record.

The options are as follows:

  • I refers to INSERT, indicating that a new record is inserted into the current Binlog.
  • d refers to DELETE, indicating that a record is deleted from the current Binlog.
  • B refers to BEFORE_UPDATE, indicating that the current Binlog is a record before the update.
  • U refers to AFTER_UPDATE, indicating that the current Binlog is a record after the update.

user_column_1

User Column

User-defined data column

...

...

...

usert_column_n

User Column

User-defined data column

  • For each UPDATE (or UPSERT-triggered update), two Binlog types—BEFORE_UPDATE and AFTER_UPDATE—are created. The BEFORE_UPDATE Binlog verifies the accuracy of data processed by external components like Flink.
  • During UPDATE and DELETE operations, the GaussDB(DWS) hybrid data warehouse generates BEFORE_UPDATE and DELETE Binlogs without querying or populating all user columns, enhancing database import efficiency.
  • Enabling Binlog for an HStore table in the GaussDB(DWS) hybrid data warehouse is in fact the process of creation of a supplementary table. This table includes three system columns sync_point, event_sequence, and type, and a value column that serializes all user columns.

Enabling Binlog

You can specify the table-level parameter enable_binlog when creating an HStore table to enable the binlog function.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CREATE TABLE hstore_binlog_source (
    c1  INT PRIMARY KEY,
    c2  INT,
    c3  INT
) WITH (
    ORIENTATION = COLUMN, 
    enable_hstore=true, 
    enable_binlog=on,
    binlog_ttl = 86400
);
  • Binlog recording starts only after a synchronization point is registered for the task, not during the initial data import. (Once Flink is activated, it cyclically acquires the synchronization point and incremental data.)
  • The binlog_ttl parameter is optional, defaulting to 86,400 seconds. If a registered synchronization point not undergo incremental synchronization within this TTL, it will be cleared. Subsequently, Binlogs before the oldest synchronization point are asynchronously deleted to free up space.

Run the ALTER command to enable the binlog function for an existing HStore table.

1
2
3
4
5
6
7
8
9
CREATE TABLE hstore_binlog_source (
    c1  INT PRIMARY KEY,
    c2  INT,
    c3  INT
) WITH (
    ORIENTATION = COLUMN, 
    enable_hstore=true
);
ALTER TABLE hstore_binlog_source SET (enable_binlog=on);