Updated on 2024-11-29 GMT+08:00

Enabling Kafka High Reliability

Scenario

Before running any of the CDL data synchronization tasks listed in Table 1, enable Kafka high reliability to prevent data loss when Kafka fails or restarts.

Table 1 CDL tasks that use MRS Kafka to synchronize data

Data Source  Destination                   Description
MySQL        Hudi                          Synchronizes data from MySQL to Hudi.
MySQL        Kafka                         Synchronizes data from MySQL to Kafka.
PgSQL        Hudi                          Synchronizes data from PgSQL to Hudi.
PgSQL        Kafka                         Synchronizes data from PgSQL to Kafka.
Hudi         GaussDB(DWS)                  Synchronizes data from Hudi to GaussDB(DWS).
Hudi         ClickHouse                    Synchronizes data from Hudi to ClickHouse.
ThirdKafka   Hudi                          Synchronizes data from ThirdKafka to Hudi.
ThirdKafka   Kafka                         Synchronizes data from ThirdKafka to Kafka.
openGauss    ThirdKafka (DMS/DRS) -> Hudi  Synchronizes data from openGauss to Hudi through ThirdKafka (DMS/DRS).
openGauss    Hudi                          Synchronizes data from openGauss to Hudi.
openGauss    Kafka                         Synchronizes data from openGauss to Kafka.

ThirdKafka supports only the following data sources: drs-opengauss-json, drs-oracle-json, drs-oracle-avro, ogg-oracle-avro, and debezium-json.

Prerequisites

  • The CDL component has been installed in an MRS cluster and is running properly.
  • CDL data synchronization tasks use the Kafka component.

Procedure

  1. Log in to FusionInsight Manager and choose Cluster > Services > Kafka. Click Configurations and then All Configurations.
  2. Search for the parameters listed in Table 2 in the search box in the upper right corner and change their values.

    Table 2 Modifying Kafka parameters

    Parameter                       Recommended Value  Description
    unclean.leader.election.enable  false              Whether a replica that is not in the in-sync replica (ISR) set can be elected as the leader. If this parameter is set to true, data may be lost.
    min.insync.replicas             2                  Minimum number of in-sync replicas that must acknowledge a write when the producer's acks parameter is set to -1 (all).

  3. Click Save.
  4. Choose Dashboard, click More, and select Rolling-restart Service to perform a rolling restart of Kafka.
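
The effect of the two parameters in Table 2 can be illustrated with a small model. The following Python sketch is a simplified approximation of broker behavior, not Kafka code: the function names (write_accepted, elect_leader), the broker IDs, and the replica layouts used below are hypothetical and chosen only for illustration. It assumes that the min.insync.replicas check applies only to producers that request acknowledgement from all in-sync replicas, and that an out-of-sync replica can become leader only when unclean leader election is enabled.

```python
from typing import Optional, Sequence, Set


def write_accepted(acks: str, isr_size: int, min_insync_replicas: int) -> bool:
    """Return True if a produce request would be accepted in this model.

    isr_size counts the leader itself. The min.insync.replicas check is
    applied only when the producer sets acks to "all" (equivalent to "-1").
    """
    if isr_size < 1:
        return False  # no in-sync leader is available
    if acks in ("all", "-1"):
        return isr_size >= min_insync_replicas
    return True  # acks "0" or "1": the ISR size is not checked


def elect_leader(
    replicas: Sequence[int],
    isr: Set[int],
    live: Set[int],
    unclean_enabled: bool,
) -> Optional[int]:
    """Pick a new partition leader in this model.

    A live replica in the ISR is preferred. A live replica outside the ISR
    is chosen only when unclean leader election is enabled; that replica may
    be missing acknowledged records, which is how data loss occurs.
    """
    for broker in replicas:
        if broker in live and broker in isr:
            return broker
    if unclean_enabled:
        for broker in replicas:
            if broker in live:
                return broker
    return None  # partition stays offline until an ISR member returns


if __name__ == "__main__":
    # Hypothetical partition with replicas on brokers 1, 2, 3; ISR is {1, 2}.
    print(write_accepted("all", isr_size=2, min_insync_replicas=2))
    print(write_accepted("all", isr_size=1, min_insync_replicas=2))
    # Brokers 1 and 2 are down; only out-of-sync broker 3 is alive.
    print(elect_leader([1, 2, 3], {1, 2}, {3}, unclean_enabled=False))
    print(elect_leader([1, 2, 3], {1, 2}, {3}, unclean_enabled=True))
```

In this model, setting min.insync.replicas to 2 means that a topic needs at least two replicas before it can accept writes from producers that use acks=-1, and a topic with three replicas can lose one broker and still accept such writes. With unclean.leader.election.enable set to false, a partition whose in-sync replicas are all down stays unavailable instead of electing a replica that may lack acknowledged data.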