Updated on 2024-05-24 GMT+08:00

Replicating Kafka Instance Data

Scenario

Create a Smart Connect task to copy data unidirectionally or bidirectionally between two Kafka instances.

  • If you enabled Smart Connect for an instance before July 1, 2022, and Kafka data replication is not available, disable Smart Connect and then enable it again.
  • This function is unavailable for single-node instances.
  • Data in the source Kafka instance is synchronized to the target Kafka instance in real time.

Restrictions

  • A maximum of 18 Smart Connect tasks can be created for an instance.
  • When you copy Kafka data, the two Kafka instances must be connected through the intranet. If they are in different VPCs, connect the network by referring to Accessing Kafka Across VPCs Using VPCEP.
  • After a Smart Connect task is created, task parameters cannot be modified.

Prerequisites

  • You have enabled Smart Connect.
  • A Kafka instance has been created and is in the Running state.
  • A topic has been created.

Replicating Kafka Instance Data

  1. Log in to the console.
  2. In the upper left corner of the console, select a region.

    Select the region where your Kafka instance is located.

  3. Click the service list icon in the upper left corner and choose Middleware > Distributed Message Service (for Kafka) to go to the DMS for Kafka console.
  4. Click the desired Kafka instance to view its details.
  5. In the navigation pane, choose Smart Connect.
  6. On the displayed page, click Create Task.
  7. For Task Name, enter a unique Smart Connect task name.
  8. For Task Type, select Copy Kafka data.
  9. For Start Immediately, specify whether to execute the task immediately after the task is created. By default, the task is executed immediately. If you disable this option, you can enable it later in the task list.
  10. In the Current Kafka area, set the instance alias.

    The instance alias is used in the following scenarios:

    • If you enable Rename Topics, the alias of the source instance will be added to the topic names of the target instance. For example, if the alias of the source instance is A and the target topic name is test, the renamed target topic will be A.test.
    • After the Smart Connect task is created, a topic named mm2-offset-syncs.<target instance alias>.internal is automatically created. If Sync Consumer Offset is enabled for the task, a topic named <target instance alias>.checkpoints.internal is also automatically created. The two topics are used to store internal data. If they are deleted, data replication will fail.

  11. In the Peer Kafka area, configure the following parameters.

    Table 1 Peer Kafka parameters

    Instance Alias

    Set the instance alias.

    The instance alias is used in the following scenarios:

    • If you enable Rename Topics, the alias of the source instance will be added to the topic names of the target instance. For example, if the alias of the source instance is A and the target topic name is test, the renamed target topic will be A.test.
    • After the Smart Connect task is created, a topic named mm2-offset-syncs.<target instance alias>.internal is automatically created. If Sync Consumer Offset is enabled for the task, a topic named <target instance alias>.checkpoints.internal is also automatically created. The two topics are used to store internal data. If they are deleted, data replication will fail.

    Config Type

    Options:

    • Kafka address: Enter Kafka instance addresses.
    • Instance name: Select an existing Kafka instance.

    Instance Name

    Set this parameter when Config Type is set to Instance name.

    Select an existing Kafka instance from the drop-down list.

    The peer Kafka instance and the current Kafka instance must be in the same VPC. Otherwise, the peer instance cannot be identified.

    Kafka Address

    Set this parameter when Config Type is set to Kafka address.

    Enter the IP addresses and port numbers for connecting to the Kafka instance.

    When you copy Kafka data, the two Kafka instances must be connected through the intranet. If they are in different VPCs, connect the network by referring to Accessing Kafka Across VPCs Using VPCEP.

    Authentication

    Options:

    • SASL_SSL: SASL_SSL is enabled on the Kafka instance. Clients connect with SASL authentication, and data is encrypted using the SSL certificate.
    • SASL_PLAINTEXT: SASL_PLAINTEXT is enabled on the Kafka instance. Clients connect with SASL authentication, and data is transmitted in plaintext.
    • PLAINTEXT: The instance does not use authentication.

    For reference, a minimal client configuration matching these options is sketched after this table.

    Authentication Mechanism

    Set this parameter when Authentication is set to SASL_SSL/SASL_PLAINTEXT.

    • SCRAM-SHA-512: verifies identities using salted, hashed credentials derived from the username and password. SCRAM-SHA-512 is more secure than PLAIN.
    • PLAIN: a simple username and password verification mechanism.

    Username

    Set this parameter when Authentication is set to SASL_SSL/SASL_PLAINTEXT.

    The username set when the instance was created or when a user was created.

    Password

    Set this parameter when Authentication is set to SASL_SSL/SASL_PLAINTEXT.

    The password set when the instance was created or when a user was created.
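
    For reference, the following is a minimal sketch of the client settings that correspond to the authentication options above, written against the standard Apache Kafka Java client. It assumes SASL_SSL with SCRAM-SHA-512; the addresses, truststore path, username, and password are placeholders, not values from your instances.

      import java.util.Properties;
      import java.util.concurrent.ExecutionException;
      import org.apache.kafka.clients.admin.AdminClient;

      public class PeerKafkaConnectionCheck {
          public static void main(String[] args) throws ExecutionException, InterruptedException {
              Properties props = new Properties();
              // Placeholder intranet addresses of the peer Kafka instance.
              props.put("bootstrap.servers", "192.168.0.10:9093,192.168.0.11:9093,192.168.0.12:9093");

              // Matches Authentication = SASL_SSL and Authentication Mechanism = SCRAM-SHA-512.
              // For SASL_PLAINTEXT, change security.protocol and drop the SSL settings.
              props.put("security.protocol", "SASL_SSL");
              props.put("sasl.mechanism", "SCRAM-SHA-512");
              props.put("sasl.jaas.config",
                  "org.apache.kafka.common.security.scram.ScramLoginModule required "
                  + "username=\"your-username\" password=\"your-password\";");

              // Placeholder truststore containing the instance's SSL certificate.
              props.put("ssl.truststore.location", "/opt/kafka/client.truststore.jks");
              props.put("ssl.truststore.password", "truststore-password");

              // Listing topics is a simple way to confirm that the address and
              // credentials work, similar to the connectivity check in the console.
              try (AdminClient admin = AdminClient.create(props)) {
                  System.out.println("Topics on the peer instance: " + admin.listTopics().names().get());
              }
          }
      }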

  12. In the Rules area, configure the following parameters.

    Table 2 Parameters for configuring data replication rules

    Sync Direction

    There are three synchronization directions:

    • Pull: Replicates data from the peer Kafka instance to the current Kafka instance.
    • Push: Replicates data from the current Kafka instance to the peer Kafka instance.
    • Both: Replicates data in both directions between the two Kafka instances.

    Topics

    Specify the topics whose data is to be replicated.

    • Regular expression: Use a regular expression to match topics.
    • Enter/Select: Enter topic names. To enter multiple topic names, press Enter after entering each topic name. You can also select topics from the drop-down list. A maximum of 20 topics can be entered or selected.
    NOTE:

    Data of topics whose names end with "internal" (for example, topic.internal) will not be synchronized.

    Tasks

    Number of data replication tasks. The default value is 2. You are advised to use the default value.

    If Sync Direction is set to Both, the actual number of tasks will be twice the number of tasks you configure here.

    Rename Topics

    Add the alias of the source Kafka instance before the target topic name to form a new name of the target topic. For example, if the alias of the source instance is A and the target topic name is test, the renamed target topic will be A.test.

    If you select Both for Sync Direction, enable Rename Topics to prevent infinite replication.

    Add Source Header

    Messages replicated to the target topic carry a header that records the message source.

    If you select Both for Sync Direction, Add Source Header is enabled by default to prevent infinite replication.

    Sync Consumer Offset

    Enable this option to synchronize the consumer offset to the target Kafka instance.

    NOTICE:

    After enabling Sync Consumer Offset, pay attention to the following:

    • The source and target Kafka instances cannot consume messages at the same time. Otherwise, the synchronized consumer offset will be abnormal.
    • The consumer offset is synchronized once per minute. As a result, the consumer offset on the target end may be slightly behind that on the source end, and some messages may be consumed repeatedly. The consumer client's service logic must be able to handle repeated consumption (a minimal deduplication sketch is shown after Figure 1).
    • The offsets synchronized to the target end are not identical to the offsets on the source end; there is a mapping between them. If the consumer client maintains offsets itself and does not obtain the synchronized offsets from the target Kafka instance after switching consumption from the source instance to the target instance, the offsets it uses may be incorrect or may be reset.

    Replicas

    Number of topic replicas when a topic is automatically created in the peer instance. The value of this parameter cannot exceed the number of brokers in the peer instance.

    This parameter takes precedence over the default.replication.factor parameter set in the peer instance.

    Start Offset

    Options:

    • Minimum offset: replication starts from the earliest data.
    • Maximum offset: replication starts from the latest data.

    Compression

    Compression algorithm to use for copying messages.

    Topic Mapping

    Customize the target topic name.

    Maximum mappings: 20. Rename Topics and Topic Mapping cannot be configured at the same time.

    • When creating a bidirectional replication task, you must enable Rename Topics or Add Source Header to prevent infinite replication. If you specify the same topic for a pull task and a push task between two instances (forming bidirectional replication) and neither Rename Topics nor Add Source Header is enabled for the two tasks, data will be replicated infinitely.
    • If you create two or more tasks with the same configuration and enable Sync Consumer Offset for them, data will be repeatedly replicated and the consumer offset of the target topic will be abnormal.
    Figure 1 Configuring data replication rules
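
    Because the consumer offset is synchronized only once per minute, a consumer that switches from the source instance to the target instance may re-read some messages. The following is a minimal sketch, outside the console procedure, of one way a consumer client could tolerate such duplicates by deduplicating on the message key. The bootstrap address, consumer group, topic name (A.test assumes the source alias is A, the topic is test, and Rename Topics is enabled), and keeping processed keys in memory are illustrative assumptions.

      import java.time.Duration;
      import java.util.Collections;
      import java.util.HashSet;
      import java.util.Properties;
      import java.util.Set;
      import org.apache.kafka.clients.consumer.ConsumerRecord;
      import org.apache.kafka.clients.consumer.ConsumerRecords;
      import org.apache.kafka.clients.consumer.KafkaConsumer;

      public class DeduplicatingConsumer {
          public static void main(String[] args) {
              Properties props = new Properties();
              props.put("bootstrap.servers", "192.168.1.20:9092");   // placeholder target instance address
              props.put("group.id", "order-service");                // same group as used on the source
              props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
              props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

              // Keys of messages that have already been processed. In production this
              // state would live in a database or cache rather than in memory.
              Set<String> processedKeys = new HashSet<>();

              try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                  // "A.test" is the renamed topic on the target instance.
                  consumer.subscribe(Collections.singletonList("A.test"));
                  while (true) {
                      ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                      for (ConsumerRecord<String, String> record : records) {
                          // Skip messages whose key was already processed before the switchover.
                          if (record.key() != null && !processedKeys.add(record.key())) {
                              continue;
                          }
                          System.out.printf("Processing %s -> %s%n", record.key(), record.value());
                      }
                  }
              }
          }
      }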

  13. (Optional) In the lower right corner of the page, click Check to test the connectivity between the Kafka instances.

    If "Connectivity check passed." is displayed, the Kafka instances are connected.

  14. Click Create. The Smart Connect task list page is displayed. The message "Task xxx was created successfully." is displayed in the upper right corner of the page.

    After the Smart Connect task is created, a topic named mm2-offset-syncs.<target instance alias>.internal is automatically created. If Sync Consumer Offset is enabled for the task, a topic named <target instance alias>.checkpoints.internal is also automatically created. The two topics are used to store internal data. If they are deleted, data replication will fail.
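
    If you want to confirm from a client that these internal topics exist (for example, to make sure they are not deleted by mistake), the following is a minimal sketch using the standard Kafka AdminClient. The bootstrap address and the alias value "peer" are placeholders; add the authentication settings sketched earlier if the instance requires them, and note that the checkpoints topic exists only when Sync Consumer Offset is enabled.

      import java.util.Arrays;
      import java.util.Map;
      import java.util.Properties;
      import java.util.concurrent.ExecutionException;
      import org.apache.kafka.clients.admin.AdminClient;
      import org.apache.kafka.clients.admin.TopicDescription;

      public class InternalTopicCheck {
          public static void main(String[] args) throws ExecutionException, InterruptedException {
              Properties props = new Properties();
              // Placeholder address; add SASL/SSL settings here if authentication is enabled.
              props.put("bootstrap.servers", "192.168.1.20:9092");

              // "peer" is an assumed target instance alias; replace it with the alias you configured.
              String alias = "peer";
              try (AdminClient admin = AdminClient.create(props)) {
                  // describeTopics fails with an ExecutionException if a topic does not exist.
                  Map<String, TopicDescription> topics = admin.describeTopics(Arrays.asList(
                      "mm2-offset-syncs." + alias + ".internal",
                      alias + ".checkpoints.internal")).all().get();
                  topics.forEach((name, desc) ->
                      System.out.println(name + " has " + desc.partitions().size() + " partition(s)"));
              }
          }
      }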