Updated on 2024-05-24 GMT+08:00

Replicating Kafka Instance Data

Scenario

Create a Smart Connect task to copy data unidirectionally or bidirectionally between two Kafka instances.

  • If you enabled Smart Connect for an instance before July 1, 2022, and Kafka data replication is not available, disable Smart Connect and then enable it again.
  • This function is unavailable for single-node instances.
  • Data in the source Kafka instance is synchronized to the target Kafka instance in real time.

Restrictions

  • A maximum of 18 Smart Connect tasks can be created for an instance.
  • When you copy Kafka data, the two Kafka instances must be connected through the intranet. If they are in different VPCs, connect the network by referring to Accessing Kafka Across VPCs Using VPCEP.
  • After a Smart Connect task is created, task parameters cannot be modified.

Prerequisites

  • You have enabled Smart Connect.
  • A Kafka instance has been created and is in the Running state.
  • A topic has been created.

Replicating Kafka Instance Data

  1. Log in to the console.
  2. In the upper left corner of the console, select a region.

    Select the region where your Kafka instance is located.

  3. Click the service list icon in the upper left corner and choose Middleware > Distributed Message Service (for Kafka) to go to the DMS for Kafka console.
  4. Click the desired Kafka instance to view its details.
  5. In the navigation pane, choose Smart Connect.
  6. On the displayed page, click Create Task.
  7. For Task Name, enter a unique Smart Connect task name.
  8. For Task Type, select Copy Kafka data.
  9. For Start Immediately, specify whether to execute the task immediately after the task is created. By default, the task is executed immediately. If you disable this option, you can enable it later in the task list.
  10. In the Current Kafka area, set the instance alias.

    The instance alias is used in the following scenarios:

    • If you enable Rename Topics, the alias of the source instance will be added to the topic names of the target instance. For example, if the alias of the source instance is A and the target topic name is test, the renamed target topic will be A.test.
    • After the Smart Connect task is created, a topic named mm2-offset-syncs.<target instance alias>.internal is automatically created. If Sync Consumer Offset is enabled for the task, a topic named <target instance alias>.checkpoints.internal is also automatically created. The two topics are used to store internal data. If they are deleted, data replication will fail.

  11. In the Peer Kafka area, configure the following parameters.

    Table 1 Peer Kafka parameters

    Instance Alias

    Set the instance alias.

    The instance alias is used in the following scenarios:

    • If you enable Rename Topics, the alias of the source instance will be added to the topic names of the target instance. For example, if the alias of the source instance is A and the target topic name is test, the renamed target topic will be A.test.
    • After the Smart Connect task is created, a topic named mm2-offset-syncs.<target instance alias>.internal is automatically created. If Sync Consumer Offset is enabled for the task, a topic named <target instance alias>.checkpoints.internal is also automatically created. The two topics are used to store internal data. If they are deleted, data replication will fail.

    Config Type

    Options:

    • Kafka address: Enter Kafka instance addresses.
    • Instance name: Select an existing Kafka instance.

    Instance Name

    Set this parameter when Config Type is set to Instance name.

    Select an existing Kafka instance from the drop-down list.

    The peer Kafka instance and the current Kafka instance must be in the same VPC. Otherwise, the peer instance cannot be identified.

    Kafka Address

    Set this parameter when Config Type is set to Kafka address.

    Enter the IP addresses and port numbers for connecting to the Kafka instance.

    When you copy Kafka data, the two Kafka instances must be connected through the intranet. If they are in different VPCs, connect the network by referring to Accessing Kafka Across VPCs Using VPCEP.

    Authentication

    Options:

    • SASL_SSL: SASL_SSL is enabled on the Kafka instance. Clients connect with SASL authentication, and data is encrypted using the SSL certificate.
    • SASL_PLAINTEXT: SASL_PLAINTEXT is enabled on the Kafka instance. Clients connect with SASL authentication, and data is transmitted in plaintext.
    • PLAINTEXT: The instance does not use authentication.

    For reference, a minimal client configuration matching these options is sketched after this table.

    Authentication Mechanism

    Set this parameter when Authentication is set to SASL_SSL/SASL_PLAINTEXT.

    • SCRAM-SHA-512: verifies identities using salted, hashed credentials derived from the username and password. SCRAM-SHA-512 is more secure than PLAIN.
    • PLAIN: a simple username and password verification mechanism.

    Username

    Set this parameter when Authentication is set to SASL_SSL/SASL_PLAINTEXT.

    The username set when the instance was created or when a user was created.

    Password

    Set this parameter when Authentication is set to SASL_SSL/SASL_PLAINTEXT.

    The password set when the instance was created or when a user was created.
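
    For reference, the following is a minimal sketch of the client settings that correspond to the authentication options above, written against the standard Apache Kafka Java client. It assumes SASL_SSL with SCRAM-SHA-512; the addresses, truststore path, username, and password are placeholders, not values from your instances.

      import java.util.Properties;
      import java.util.concurrent.ExecutionException;
      import org.apache.kafka.clients.admin.AdminClient;

      public class PeerKafkaConnectionCheck {
          public static void main(String[] args) throws ExecutionException, InterruptedException {
              Properties props = new Properties();
              // Placeholder intranet addresses of the peer Kafka instance.
              props.put("bootstrap.servers", "192.168.0.10:9093,192.168.0.11:9093,192.168.0.12:9093");

              // Matches Authentication = SASL_SSL and Authentication Mechanism = SCRAM-SHA-512.
              // For SASL_PLAINTEXT, change security.protocol and drop the SSL settings.
              props.put("security.protocol", "SASL_SSL");
              props.put("sasl.mechanism", "SCRAM-SHA-512");
              props.put("sasl.jaas.config",
                  "org.apache.kafka.common.security.scram.ScramLoginModule required "
                  + "username=\"your-username\" password=\"your-password\";");

              // Placeholder truststore containing the instance's SSL certificate.
              props.put("ssl.truststore.location", "/opt/kafka/client.truststore.jks");
              props.put("ssl.truststore.password", "truststore-password");

              // Listing topics is a simple way to confirm that the address and
              // credentials work, similar to the connectivity check in the console.
              try (AdminClient admin = AdminClient.create(props)) {
                  System.out.println("Topics on the peer instance: " + admin.listTopics().names().get());
              }
          }
      }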

  12. In the Rules area, configure the following parameters.

    Table 2 Parameters for configuring data replication rules

    Sync Direction

    There are three synchronization directions:

    • Pull: Replicates data from the peer Kafka instance to the current Kafka instance.
    • Push: Replicates data from the current Kafka instance to the peer Kafka instance.
    • Both: Replicates data in both directions between the two Kafka instances.

    Topics

    Specify the topics whose data is to be replicated.

    • Regular expression: Use a regular expression to match topics.
    • Enter/Select: Enter topic names. To enter multiple topic names, press Enter after entering each topic name. You can also select topics from the drop-down list. A maximum of 20 topics can be entered or selected.
    NOTE:

    Data of topics whose names end with "internal" (for example, topic.internal) will not be synchronized.

    Tasks

    Number of data replication tasks. The default value is 2. You are advised to use the default value.

    If Sync Direction is set to Both, the actual number of tasks will be twice the number of tasks you configure here.

    Rename Topics

    Add the alias of the source Kafka instance before the target topic name to form a new name of the target topic. For example, if the alias of the source instance is A and the target topic name is test, the renamed target topic will be A.test.

    If you select Both for Sync Direction, enable Rename Topics to prevent infinite replication.

    Add Source Header

    Messages replicated to the target topic carry a header that records the message source.

    If you select Both for Sync Direction, Add Source Header is enabled by default to prevent infinite replication.

    Sync Consumer Offset

    Enable this option to synchronize the consumer offset to the target Kafka instance.

    NOTICE:

    After enabling Sync Consumer Offset, pay attention to the following:

    • The source and target Kafka instances cannot consume messages at the same time. Otherwise, the synchronized consumer offset will be abnormal.
    • The consumer offset is synchronized once per minute. As a result, the consumer offset on the target end may be slightly behind that on the source end, and some messages may be consumed repeatedly. The consumer client's service logic must be able to handle repeated consumption (a minimal deduplication sketch is shown after Figure 1).
    • The offsets synchronized to the target end are not identical to the offsets on the source end; there is a mapping between them. If the consumer client maintains offsets itself and does not obtain the synchronized offsets from the target Kafka instance after switching consumption from the source instance to the target instance, the offsets it uses may be incorrect or may be reset.

    Replicas

    Number of topic replicas when a topic is automatically created in the peer instance. The value of this parameter cannot exceed the number of brokers in the peer instance.

    This parameter takes precedence over the default.replication.factor parameter set in the peer instance.

    Start Offset

    Options:

    • Minimum offset: replication starts from the earliest data.
    • Maximum offset: replication starts from the latest data.

    Compression

    Compression algorithm to use for copying messages.

    Topic Mapping

    Customize the target topic name.

    Maximum mappings: 20. Rename Topics and Topic Mapping cannot be configured at the same time.

    • When creating a bidirectional replication task, you must enable Rename Topics or Add Source Header to prevent infinite replication. If you specify the same topic for a pull task and a push task between two instances (forming bidirectional replication) and neither Rename Topics nor Add Source Header is enabled for the two tasks, data will be replicated infinitely.
    • If you create two or more tasks with the same configuration and enable Sync Consumer Offset for them, data will be repeatedly replicated and the consumer offset of the target topic will be abnormal.
    Figure 1 Configuring data replication rules
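
    Because the consumer offset is synchronized only once per minute, a consumer that switches from the source instance to the target instance may re-read some messages. The following is a minimal sketch, outside the console procedure, of one way a consumer client could tolerate such duplicates by deduplicating on the message key. The bootstrap address, consumer group, topic name (A.test assumes the source alias is A, the topic is test, and Rename Topics is enabled), and keeping processed keys in memory are illustrative assumptions.

      import java.time.Duration;
      import java.util.Collections;
      import java.util.HashSet;
      import java.util.Properties;
      import java.util.Set;
      import org.apache.kafka.clients.consumer.ConsumerRecord;
      import org.apache.kafka.clients.consumer.ConsumerRecords;
      import org.apache.kafka.clients.consumer.KafkaConsumer;

      public class DeduplicatingConsumer {
          public static void main(String[] args) {
              Properties props = new Properties();
              props.put("bootstrap.servers", "192.168.1.20:9092");   // placeholder target instance address
              props.put("group.id", "order-service");                // same group as used on the source
              props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
              props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

              // Keys of messages that have already been processed. In production this
              // state would live in a database or cache rather than in memory.
              Set<String> processedKeys = new HashSet<>();

              try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                  // "A.test" is the renamed topic on the target instance.
                  consumer.subscribe(Collections.singletonList("A.test"));
                  while (true) {
                      ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                      for (ConsumerRecord<String, String> record : records) {
                          // Skip messages whose key was already processed before the switchover.
                          if (record.key() != null && !processedKeys.add(record.key())) {
                              continue;
                          }
                          System.out.printf("Processing %s -> %s%n", record.key(), record.value());
                      }
                  }
              }
          }
      }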

  13. (Optional) In the lower right corner of the page, click Check to test the connectivity between the Kafka instances.

    If "Connectivity check passed." is displayed, the Kafka instances are connected.

  14. Click Create. The Smart Connect task list page is displayed. The message "Task xxx was created successfully." is displayed in the upper right corner of the page.

    After the Smart Connect task is created, a topic named mm2-offset-syncs.<target instance alias>.internal is automatically created. If Sync Consumer Offset is enabled for the task, a topic named <target instance alias>.checkpoints.internal is also automatically created. The two topics are used to store internal data. If they are deleted, data replication will fail.
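
    If you want to confirm from a client that these internal topics exist (for example, to make sure they are not deleted by mistake), the following is a minimal sketch using the standard Kafka AdminClient. The bootstrap address and the alias value "peer" are placeholders; add the authentication settings sketched earlier if the instance requires them, and note that the checkpoints topic exists only when Sync Consumer Offset is enabled.

      import java.util.Arrays;
      import java.util.Map;
      import java.util.Properties;
      import java.util.concurrent.ExecutionException;
      import org.apache.kafka.clients.admin.AdminClient;
      import org.apache.kafka.clients.admin.TopicDescription;

      public class InternalTopicCheck {
          public static void main(String[] args) throws ExecutionException, InterruptedException {
              Properties props = new Properties();
              // Placeholder address; add SASL/SSL settings here if authentication is enabled.
              props.put("bootstrap.servers", "192.168.1.20:9092");

              // "peer" is an assumed target instance alias; replace it with the alias you configured.
              String alias = "peer";
              try (AdminClient admin = AdminClient.create(props)) {
                  // describeTopics fails with an ExecutionException if a topic does not exist.
                  Map<String, TopicDescription> topics = admin.describeTopics(Arrays.asList(
                      "mm2-offset-syncs." + alias + ".internal",
                      alias + ".checkpoints.internal")).all().get();
                  topics.forEach((name, desc) ->
                      System.out.println(name + " has " + desc.partitions().size() + " partition(s)"));
              }
          }
      }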