
Migration from Kafka/MQ

Process

In industries that handle large volumes of data, such as IoT, news, public opinion analysis, and social networking, message middleware such as Kafka and MQ is used to buffer traffic between peak and off-peak hours. Tools such as Flink and Logstash then consume the data, preprocess it, and import it into a search engine, which provides search services for external systems.

The following figure shows the process of migrating data from a Kafka or MQ cluster.

Figure 1 Migration from a Kafka or MQ cluster
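
To make this pattern concrete, the following is a minimal Python sketch of the consume, preprocess, and import loop, using the kafka-python and elasticsearch client libraries. The topic, consumer group, endpoint, and index names are placeholder assumptions, not part of the original setup.

    import json

    from elasticsearch import Elasticsearch, helpers  # pip install elasticsearch
    from kafka import KafkaConsumer                    # pip install kafka-python

    consumer = KafkaConsumer(
        "app-logs",                                  # hypothetical topic
        bootstrap_servers="kafka:9092",              # hypothetical broker
        group_id="search-ingest",                    # hypothetical consumer group
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    es = Elasticsearch("http://elasticsearch:9200")  # hypothetical endpoint

    def to_action(record):
        """Light preprocessing: tag the source partition, build a bulk action."""
        doc = dict(record.value)
        doc["partition"] = record.partition
        # A deterministic _id (partition-offset) makes re-delivery idempotent.
        return {"_index": "app-logs",
                "_id": f"{record.partition}-{record.offset}",
                "_source": doc}

    batch = []
    for record in consumer:          # blocks, consuming messages as they arrive
        batch.append(to_action(record))
        if len(batch) >= 500:        # flush in bulk for throughput
            helpers.bulk(es, batch)
            batch.clear()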

This migration solution is convenient and flexible.

  • Convenient: Once the data in the two Elasticsearch clusters is consistent, the cutover can be performed at any time.
  • Flexible: During the migration, data can be added, deleted, modified, and queried in both clusters.

Procedure

  1. Subscribe to incremental data. Create a new consumer group in Kafka or MQ that subscribes to incremental data starting from the current offsets (see the first sketch after this list).
  2. Synchronize inventory data. Use a tool such as Logstash to migrate the existing data from the source Elasticsearch cluster to the CSS cluster. If you use Logstash, see Migrating Cluster Data Using Logstash.
  3. Synchronize incremental data. After the inventory data is synchronized, enable the incremental consumer group. Because Elasticsearch writes are idempotent when documents carry fixed IDs, the data on both sides becomes consistent once the new consumer group catches up with the original one (see the second sketch after this list).

    For log migration, the data in the source Elasticsearch cluster does not need to be migrated, so you can skip the inventory data synchronization. After enabling incremental synchronization, let it run for the log retention period (for example, three or seven days), and then perform the cutover directly.
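
The following is a minimal sketch of step 1 with the kafka-python client, assuming a hypothetical topic, broker address, and group name: a new consumer group with no committed offsets and auto_offset_reset="latest" starts at the current end of the topic, so it captures only incremental data.

    from kafka import KafkaConsumer  # pip install kafka-python

    incremental = KafkaConsumer(
        "app-logs",                      # hypothetical topic
        bootstrap_servers="kafka:9092",  # hypothetical broker
        group_id="css-migration",        # the new consumer group for migration
        auto_offset_reset="latest",      # no committed offsets yet, so start
                                         # at the log end: incremental data only
        enable_auto_commit=False,
    )
    # One poll assigns partitions and establishes the starting positions;
    # committing them pins the group at the current end of the topic until
    # step 3 enables actual consumption.
    incremental.poll(timeout_ms=1000)
    incremental.commit()
    incremental.close()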
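
The idempotence that step 3 relies on can be illustrated with the Python Elasticsearch client: indexing with an explicit, stable document ID means a replayed message overwrites the same document rather than duplicating it, so overlapping consumption by the old and new consumer groups converges. The endpoint, index, and ID values below are hypothetical.

    from elasticsearch import Elasticsearch  # pip install elasticsearch

    es = Elasticsearch("http://css-cluster:9200")  # hypothetical CSS endpoint

    doc = {"user": "alice", "action": "login", "ts": "2023-06-20T08:00:00Z"}

    # Both calls target the same document; the second call simply overwrites
    # it, so replaying the message does not create a duplicate.
    es.index(index="app-logs", id="alice-2023-06-20T08:00:00Z", document=doc)
    es.index(index="app-logs", id="alice-2023-06-20T08:00:00Z", document=doc)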