Replicating Kafka Instance Data
Create a Smart Connect task to copy data unidirectionally or bidirectionally between two Kafka instances.
- If you enabled Smart Connect for an instance before July 1, 2022, and Kafka data replication is not available, disable Smart Connect and then enable it again.
- Data in the source Kafka instance is synchronized to the target Kafka instance in real time.
Notes and Constraints
- This function is unavailable for single-node Kafka instances.
- A maximum of 18 Smart Connect tasks can be created for an instance.
- When you copy Kafka data, the two Kafka instances must be connected through the intranet. If they are in different VPCs, connect the network by referring to Accessing Kafka Using a VPC Endpoint Across VPCs or VPC Peering Connection.
- After a Smart Connect task is created, task parameters cannot be modified.
Prerequisites
- You have enabled Smart Connect.
- A Kafka instance has been created and is in the Running state.
Procedure
- Log in to the console.
- In the upper left corner, select the region where your Kafka instance is located.
- In the upper left corner, choose Middleware > Distributed Message Service (for Kafka) to open the DMS for Kafka console.
- Click the desired Kafka instance to view its details.
- In the navigation pane, choose Smart Connect.
- On the displayed page, click Create Task.
- For Task Name, enter a unique Smart Connect task name. Naming rules: 4–64 characters and only letters, digits, hyphens (-), or underscores (_).
- For Task Type, select Copy Kafka data.
- For Start Immediately, specify whether to execute the task immediately after the task is created. By default, the task is executed immediately. If you disable this option, you can enable it later in the task list.
- In the Current Kafka area, set the instance alias. Naming rules: 1–20 characters and only letters, digits, hyphens (-), or underscores (_).
The instance alias is used in the following scenarios:
- If you enable Rename Topics and select Push or Both for Sync Direction, the alias of the current Kafka instance will be added to the topic names of the peer Kafka instance. For example, if the alias of the current Kafka instance is A and the topic name of the peer Kafka instance is test, the renamed topic will be A.test (see the consumer sketch after this list).
- After a Smart Connect task of Kafka data replication is created, a topic named mm2-offset-syncs.<peer Kafka instance alias>.internal is generated for the current Kafka instance. If the task has Sync Consumer Offset enabled and uses Pull or Both for Sync Direction, a topic named <peer Kafka instance alias>.checkpoints.internal is also created for the current Kafka instance. The two topics store internal data. If they are deleted, data replication will fail.
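For illustration only (not part of the console procedure), the following minimal Java sketch shows the effect of the renaming rule above: assuming the current instance alias is A, Rename Topics is enabled, and Push is selected, messages replicated from topic test are read from topic A.test on the peer instance. The broker address, group ID, alias, and topic name are placeholders.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RenamedTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder address of the peer Kafka instance (the replication target for Push).
        props.put("bootstrap.servers", "peer-broker1:9092,peer-broker2:9092");
        props.put("group.id", "replica-readers");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // With Rename Topics enabled, topic "test" replicated from the current
            // instance (alias "A") appears on the peer instance as "A.test".
            consumer.subscribe(Collections.singletonList("A.test"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d, value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```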
- In the Peer Kafka area, configure the following parameters.
Table 1 Peer Kafka parameters

- Instance Alias: Naming rules: 1–20 characters and only letters, digits, hyphens (-), or underscores (_). The instance alias is used in the following scenarios:
  - If you enable Rename Topics and select Pull or Both for Sync Direction, the alias of the peer Kafka instance will be added to the topic names of the current Kafka instance. For example, if the alias of the peer Kafka instance is B and the topic name of the current Kafka instance is test01, the renamed topic will be B.test01.
  - After a Smart Connect task of Kafka data replication is created, if the task has Sync Consumer Offset enabled and uses Push or Both for Sync Direction, a topic named <current Kafka instance alias>.checkpoints.internal is also created for the peer Kafka instance. This topic stores internal data. If it is deleted, data replication will fail.
- Config Type: Options:
  - Kafka address: Enter Kafka instance addresses.
  - Instance name: Select an existing Kafka instance.
- Instance Name: Set this parameter when Config Type is set to Instance name. Select an existing Kafka instance from the drop-down list. The peer Kafka instance and the current Kafka instance must be in the same VPC. Otherwise, they cannot be identified.
- Kafka Address: Set this parameter when Config Type is set to Kafka address. Enter the IP addresses and port numbers for connecting to the Kafka instance. The two Kafka instances must be connected through the intranet. If they are in different VPCs, connect the network by referring to Accessing Kafka Using a VPC Endpoint Across VPCs or VPC Peering Connection.
- Authentication: Options:
  - SASL_SSL: The Kafka instance has SASL_SSL enabled. Clients connect to it with SASL, and data is encrypted using the SSL certificate.
  - SASL_PLAINTEXT: The Kafka instance has SASL_PLAINTEXT enabled. Clients connect to it with SASL, and data is transmitted in plaintext.
  - PLAINTEXT: The instance does not use authentication.
- Authentication Mechanism: Set this parameter when Authentication is SASL_SSL or SASL_PLAINTEXT.
  - SCRAM-SHA-512: uses a hash algorithm to generate credentials from usernames and passwords for identity verification. SCRAM-SHA-512 is more secure than PLAIN.
  - PLAIN: a simple username and password verification mechanism.
- Username: Set this parameter when Authentication is SASL_SSL or SASL_PLAINTEXT. Enter the username set during instance creation or user creation.
- Password: Set this parameter when Authentication is SASL_SSL or SASL_PLAINTEXT. Enter the password set during instance creation or user creation.
After a Smart Connect task is created, modifying the authentication method, mechanism, or password of the peer instance causes the synchronization task to fail. In this case, delete the current task and create a new one. A minimal client configuration sketch for these authentication settings follows.
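For reference, the connection details in Table 1 map onto the standard Apache Kafka client properties. The following minimal sketch assumes Authentication is SASL_SSL with the SCRAM-SHA-512 mechanism; the broker address, credentials, and truststore path are placeholders and depend on your environment.

```java
import java.util.Properties;

public class PeerKafkaClientConfig {
    // Builds client properties matching the Table 1 settings (SASL_SSL + SCRAM-SHA-512).
    public static Properties saslSslProps() {
        Properties props = new Properties();
        // Placeholder intranet address and port of the peer Kafka instance.
        props.put("bootstrap.servers", "peer-broker1:9093,peer-broker2:9093");
        // Corresponds to Authentication = SASL_SSL.
        props.put("security.protocol", "SASL_SSL");
        // Corresponds to Authentication Mechanism = SCRAM-SHA-512.
        props.put("sasl.mechanism", "SCRAM-SHA-512");
        // Username and password set during instance or user creation (placeholders here).
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.scram.ScramLoginModule required "
            + "username=\"your-username\" password=\"your-password\";");
        // Trust store containing the instance's SSL certificate; path and password are placeholders.
        props.put("ssl.truststore.location", "/opt/kafka/client.jks");
        props.put("ssl.truststore.password", "truststore-password");
        return props;
    }
}
```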
- In the Rules area, configure the following parameters.
Table 2 Parameters for configuring data replication rules

- Sync Direction: There are three synchronization directions:
  - Pull: replicates data from the peer Kafka instance to the current Kafka instance.
  - Push: replicates data from the current Kafka instance to the peer Kafka instance.
  - Both: replicates data bidirectionally between the two Kafka instances.
- Topics: Specify the topics whose data is to be replicated.
  - Regular expression: Use a regular expression to match topics.
  - Enter/Select: Enter topic names. To enter multiple topic names, press Enter after each name. You can also select topics from the drop-down list. A maximum of 20 topics can be entered or selected.
  NOTE: Data of topics whose names end with "internal" (for example, topic.internal) will not be synchronized.
- Tasks: Number of data replication tasks. The default value is 2 and is recommended. If Sync Direction is set to Both, the actual number of tasks will be twice the number you configure here.
- Rename Topics: Adds the alias of the source Kafka instance before the target topic name to form a new target topic name. For example, if the alias of the source instance is A and the target topic name is test, the renamed target topic will be A.test. If you select Both for Sync Direction, enable Rename Topics to prevent infinite replication.
- Add Source Header: Messages replicated to the target topic carry a header that identifies the message source. If you select Both for Sync Direction, Add Source Header is enabled by default to prevent infinite replication.
- Sync Consumer Offset: Enable this option to synchronize the consumer offset to the target Kafka instance.
  NOTICE: After enabling Sync Consumer Offset, pay attention to the following:
  - The source and target Kafka instances cannot consume messages at the same time. Otherwise, the synchronized consumer offset will be abnormal.
  - The consumer offset is synchronized every minute. As a result, the consumer offset on the target end may be slightly smaller than that on the source end, and some messages may be consumed repeatedly. The service logic of the consumer client must be able to handle repeated consumption (one approach is sketched below).
  - The offset synchronized from the source end is not identical to the offset on the target end; there is a mapping relationship between them. If the consumer offset is maintained by the consumer client, the client does not obtain the consumer offset from the target Kafka instance after switching consumption from the source instance to the target instance. As a result, the offset may be incorrect or the consumer offset may be reset.
- Replicas: Number of topic replicas when a topic is automatically created in the peer instance. The value cannot exceed the number of brokers in the peer instance. This parameter takes precedence over the default.replication.factor parameter set in the peer instance.
- Start Offset: Options:
  - Minimum offset: replication starts from the earliest data.
  - Maximum offset: replication starts from the latest data.
- Compression: Compression algorithm used when replicating messages.
- Topic Mapping: Customizes target topic names. A maximum of 20 mappings can be configured. Rename Topics and Topic Mapping cannot be configured at the same time.
- When creating a bidirectional replication task, you must enable Rename Topics or Add Source Header to prevent infinite replication. If you specify the same topic for a pull task and a push task between two instances (forming bidirectional replication) and neither Rename Topics nor Add Source Header is enabled for either task, data will be replicated infinitely.
- If you create two or more tasks with the same configuration and enable Sync Consumer Offset for them, data will be repeatedly replicated and the consumer offset of the target topic will be abnormal.
Figure 1 Configuring data replication rules
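Because the consumer offset is synchronized only once per minute, a consumer that switches from the source instance to the target instance may receive some messages again. The following minimal Java sketch shows one way to tolerate such repeats by tracking already-processed business keys in memory; the broker address, group ID, and topic name are placeholders, and a production system would typically persist this state in a durable store.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DuplicateTolerantConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder address of the target Kafka instance after switchover.
        props.put("bootstrap.servers", "target-broker1:9092");
        props.put("group.id", "order-processors");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // In-memory record of processed business keys; use a database or cache in production.
        Set<String> processedKeys = new HashSet<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("A.test")); // placeholder topic name
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    // Skip messages that were already handled before the switchover.
                    if (!processedKeys.add(record.key())) {
                        continue;
                    }
                    System.out.printf("processing key=%s value=%s%n", record.key(), record.value());
                }
            }
        }
    }
}
```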
- (Optional) In the lower right corner of the page, click Check to test the connectivity between the Kafka instances.
If "Connectivity check passed." is displayed, the Kafka instances are connected.
- Click Create. The Smart Connect task list page is displayed. The message "Task xxx was created successfully." is displayed in the upper right corner of the page.
- After a Smart Connect task of Kafka data replication is created, a topic named mm2-offset-syncs.<peer Kafka instance alias>.internal is generated for the current Kafka instance. If the task has Sync Consumer Offset enabled and uses Pull or Both for Sync Direction, a topic named <peer Kafka instance alias>.checkpoints.internal is also created for the current Kafka instance. The two topics store internal data. If they are deleted, data replication will fail.
- After a Smart Connect task of Kafka data replication is created, if the task has Sync Consumer Offset enabled and uses Push or Both for Sync Direction, a topic named <current Kafka instance alias>.checkpoints.internal is also created for the peer Kafka instance. This topic stores internal data. If it is deleted, data replication will fail. You can verify that these internal topics exist as sketched below.
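To confirm that these internal topics exist (and to avoid deleting them by mistake), you can list the topics of an instance with the standard Kafka AdminClient. A minimal sketch follows; the broker address is a placeholder, and the filter simply keeps the replication bookkeeping topics described above.

```java
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.ListTopicsOptions;

public class ListInternalTopics {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder address of the current Kafka instance.
        props.put("bootstrap.servers", "current-broker1:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // List all topics, including Kafka's own internal ones.
            Set<String> topics = admin
                .listTopics(new ListTopicsOptions().listInternal(true))
                .names()
                .get();
            // Keep only the replication bookkeeping topics created by the task.
            topics.stream()
                .filter(t -> t.startsWith("mm2-offset-syncs.") || t.endsWith(".checkpoints.internal"))
                .forEach(System.out::println);
        }
    }
}
```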