Migrating Data Between Elasticsearch Clusters Using the Reindex API
You can use the reindex API to migrate data between Elasticsearch clusters.
Scenarios
- Cluster merging: The reindex API can be used to merge index data scattered across multiple Elasticsearch clusters into a single cluster for centralized data management and analysis.
- Cloud migration: Migrate an on-premises Elasticsearch service to the cloud to benefit from scalability, ease of maintenance, and cost-effectiveness.
- Changing the service provider: If you are using a third-party Elasticsearch service, and you want to switch to Huawei Cloud or another service provider for some reason (cost, performance, or other concerns), you can use the reindex API for data migration.
- Full migration: Migrate all index data between clusters. All writes to the source cluster must be stopped during the migration to keep the data in the source and destination clusters consistent.
- Incremental migration: For indexes that have a timestamp field, use the reindex API to migrate only the data added or changed within a time range. During the workload switchover phase, after the full migration is complete, stop all writes to the source cluster, run a quick incremental migration based on the latest update time, and then switch all services to the destination cluster.
- Reorganizing indexes: The reindex API can be used to restructure indexes while migrating them, including changing mappings, analyzers, and sharding.
Overview
Figure 1 shows how to migrate data between Elasticsearch clusters using the reindex API.
- Configure the reindex remote access whitelist in the destination cluster to connect the source and destination clusters.
- Use the reindex API to migrate indexes from the source cluster to the destination cluster.
Advantages
- Easy to use: As a built-in function of Elasticsearch, the reindex API offers an easy way to migrate data without complex settings or additional tools.
- Flexible data processing: Indexes can be restructured or rebuilt during migration, such as changing mappings, analyzers, and sharding.
- Performance control: During the migration, you can tune the parameters of the scroll API to control the data migration speed for optimal cluster performance.
Impact on Performance
Migrating data between clusters with the reindex API relies on the scroll API, which retrieves index data from the source cluster in batches and synchronizes it to the destination cluster. This process may affect the performance of the source cluster. The impact depends on how fast data is read from the source cluster, which in turn depends on the size and slice settings of the scroll API. For details, see the Reindex API document.
Reindex tasks run asynchronously in the Elasticsearch cluster, so their impact on the source cluster is manageable when task concurrency is low. If resource usage on the source cluster is already high, reduce the size parameter of the scroll API to slow down data retrieval, or perform the migration during off-peak hours.
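Besides the batch size, the standard Elasticsearch reindex API accepts a requests_per_second URL parameter, and a running task can be re-throttled through the _rethrottle endpoint. The text above does not mention these, so treat the following as a minimal sketch that assumes your cluster version supports them. The helper names are hypothetical, and the functions only build request paths that you would then run in Kibana Dev Tools.

```python
from urllib.parse import urlencode


def reindex_path(requests_per_second=None, wait_for_completion=False):
    """Build the _reindex request path, optionally with throttling."""
    params = {"wait_for_completion": str(wait_for_completion).lower()}
    if requests_per_second is not None:
        # Assumption: the cluster honours requests_per_second (-1 means no throttle).
        params["requests_per_second"] = requests_per_second
    return "_reindex?" + urlencode(params)


def rethrottle_path(task_id, requests_per_second):
    """Build the path that changes the throttle of a running reindex task."""
    query = urlencode({"requests_per_second": requests_per_second})
    return f"_reindex/{task_id}/_rethrottle?{query}"


if __name__ == "__main__":
    print("POST " + reindex_path(requests_per_second=500))
    print("POST " + rethrottle_path("node_id:12345", 100))
```

A lower requests_per_second value trades migration time for less read pressure on the source cluster, which complements a smaller size value.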
Constraints
- During cluster migration, do not add, delete, or modify the index data of the source cluster. Otherwise, the data in the source cluster will be inconsistent with that in the destination cluster after the migration.
- The source and destination clusters must use the same version.
Prerequisites
- The source and destination Elasticsearch clusters are available.
- The network between the clusters is connected.
- If the source and destination clusters are in different VPCs, establish a VPC peering connection between them. For details, see VPC Peering Connection Overview.
- To migrate an in-house built Elasticsearch cluster to Huawei Cloud, you can configure public network access for this cluster.
- To migrate a third-party Elasticsearch cluster to Huawei Cloud, you need to establish a VPN or Direct Connect connection between the third party's internal data center and Huawei Cloud.
- Ensure that _source has been enabled for indexes in the cluster.
By default, _source is enabled. To check, run the GET {index}/_search command. If the hits in the response contain a _source field, it is enabled.
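As a complementary check, the index mapping can be inspected instead of the search hits. The sketch below assumes the typeless mapping layout returned by GET {index}/_mapping in Elasticsearch 7.x and later, where a disabled _source appears as "_source": {"enabled": false}. The function name is hypothetical.

```python
def source_enabled(mapping_response, index):
    """Return True unless the mapping explicitly disables _source.

    mapping_response is the parsed JSON of GET {index}/_mapping.
    Assumption: typeless mappings (Elasticsearch 7.x or later).
    """
    mappings = mapping_response.get(index, {}).get("mappings", {})
    # _source is enabled by default, so a missing entry means enabled.
    return mappings.get("_source", {}).get("enabled", True)


if __name__ == "__main__":
    response = {"logs": {"mappings": {"_source": {"enabled": False}}}}
    print(source_enabled(response, "logs"))
```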
Obtaining Information About the Source Elasticsearch Cluster
Before data migration, obtain necessary information about the source cluster for configuring a migration task.
Cluster Type | Required Information | How to Obtain |
---|---|---|
Huawei Cloud Elasticsearch cluster | | |
In-house built Elasticsearch cluster | | Contact the service administrator to obtain the information. |
Third-party Elasticsearch cluster | | Contact the service administrator to obtain the information. |
The method of obtaining the cluster information varies depending on the source cluster. This section describes how to obtain information about a Huawei Cloud Elasticsearch cluster.
- Log in to the CSS management console.
- In the navigation pane on the left, choose Clusters > Elasticsearch.
- In the Elasticsearch cluster list, obtain the cluster name and address.
Figure 2 Obtaining cluster information
- Click Access Kibana in the Operation column to log in to the Kibana console.
- Click Dev Tools in the navigation tree on the left.
- Run the following command to query the index structure of the source cluster:
GET {index_name}
index_name indicates the name of the index to be migrated.
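The output of GET {index_name} contains settings that Elasticsearch generates itself (such as the index UUID and creation date), and a create-index request typically rejects them. The following is a minimal sketch of turning that output into a body that can be reused when creating the index in the destination cluster. The function name and the exact list of stripped keys are assumptions; adjust them if your version reports other read-only settings.

```python
import copy

# Settings generated by Elasticsearch itself (assumed list, verify for your version).
READ_ONLY_SETTINGS = ("creation_date", "uuid", "version", "provided_name")


def to_create_body(get_index_response, index):
    """Turn the parsed JSON of GET {index_name} into a create-index body."""
    # Work on a copy so the original response stays untouched.
    definition = copy.deepcopy(get_index_response[index])
    index_settings = definition.get("settings", {}).get("index", {})
    for key in READ_ONLY_SETTINGS:
        index_settings.pop(key, None)
    return definition


if __name__ == "__main__":
    response = {
        "logs": {
            "aliases": {},
            "mappings": {"properties": {"msg": {"type": "text"}}},
            "settings": {"index": {
                "creation_date": "1700000000000",
                "number_of_shards": "3",
                "number_of_replicas": "1",
                "uuid": "abc",
                "version": {"created": "7100299"},
                "provided_name": "logs",
            }},
        }
    }
    print(to_create_body(response, "logs"))
```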
Configuring the Reindex Remote Access Whitelist
- Log in to the CSS management console.
- In the navigation pane on the left, choose Clusters > Elasticsearch.
- In the Elasticsearch cluster list, click the destination cluster. The cluster information page is displayed.
- In the navigation pane on the left, click Parameter Configurations. Then click Edit, and expand Reindexing.
- Set reindex.remote.whitelist.
- Value: Enter the address of the source cluster obtained in Obtaining Information About the Source Elasticsearch Cluster.
If the source Elasticsearch cluster uses HTTPS, expand Customize, and add a custom parameter to skip SSL certificate verification.
- Parameter: reindex.ssl.verification_mode
- Value: none
- Click Submit. In the displayed confirmation dialog box, confirm the parameter settings, select I understand that the modification will take effect after the cluster is restarted, then click OK.
- Return to the Elasticsearch cluster list, and locate the destination cluster. Choose More > Restart in the Operation column to restart the cluster and make the change take effect.
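In open-source Elasticsearch, reindex.remote.whitelist takes comma-separated host:port entries and ignores the URL scheme. Assuming the CSS console expects the same format, the whitelist value can be derived from the source cluster URLs as sketched below. The function names are hypothetical.

```python
from urllib.parse import urlsplit


def whitelist_entry(source_url):
    """Return the host:port part of a source cluster URL."""
    parts = urlsplit(source_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("expected a URL such as http://host:9200")
    return f"{parts.hostname}:{parts.port}"


def whitelist_value(source_urls):
    """Join several source cluster URLs into one comma-separated value."""
    return ",".join(whitelist_entry(url) for url in source_urls)


if __name__ == "__main__":
    print(whitelist_value(["http://10.0.0.5:9200", "https://10.0.0.6:9200"]))
```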
Migrating Indexes Using the Reindex API
- In the destination cluster, create an index structure identical to that in the source cluster.
- Log in to the CSS management console.
- In the navigation pane on the left, choose Clusters > Elasticsearch.
- In the Elasticsearch cluster list, locate the destination cluster, and click Access Kibana in the Operation column to log in to the Kibana console.
- Click Dev Tools in the navigation tree on the left.
- Run the following command to create an index structure that is identical to that in the source cluster:
PUT {index_name}
{
  Index structure of the source cluster
}
index_name indicates the index name after the migration. For the index structure of the source cluster, see Obtaining Information About the Source Elasticsearch Cluster.
- Run the following command to migrate data using the reindex API:
- Full migration: Migrate the full amount of index data in the source cluster to the destination cluster.
On the Kibana console of the destination cluster, run the following command:
POST _reindex?wait_for_completion=false
{
  "source": {
    "remote": {
      "host": "http://xx.xx.xx.xx:9200", //Address of the source cluster. If the source cluster uses HTTPS, use https://xx.xx.xx.xx:9200.
      "username": "xxx", //Username for accessing the source cluster. It is needed for a security-mode cluster only.
      "password": "******" //Password for accessing the source cluster. It is needed for a security-mode cluster only.
    },
    "index": "index_name", //Index name in the source cluster
    "size": 3000
  },
  "dest": {
    "index": "index_name" //Index name in the destination cluster
  }
}
- Incremental migration: Migrate new/changed index data from the source cluster to the destination cluster based on timestamps. This method can also be used to migrate an oversized index one chunk at a time.
On the Kibana console of the destination cluster, run the following command:
POST _reindex?wait_for_completion=false
{
  "source": {
    "remote": {
      "host": "http://xx.xx.xx.xx:9200", //Address of the source cluster. If the source cluster uses HTTPS, use https://xx.xx.xx.xx:9200.
      "username": "xxx", //Username for accessing the source cluster. It is needed for a security-mode cluster only.
      "password": "******" //Password for accessing the source cluster. It is needed for a security-mode cluster only.
    },
    "query": {
      "range" : {
        "timestamps" : { //The time field
          "gte" : "xxx", //Start time of the incremental data.
          "lte" : "xxx" //End time of the incremental data.
        }
      }
    },
    "index": "index_name", //Index name in the source cluster
    "size": 3000
  },
  "dest": {
    "index": "index_name" //Index name in the destination cluster
  }
}
- Index reorganization in the same cluster: Use the reindex API to restructure indexes during migration.
On the Kibana console of the destination cluster, run the following command:
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "index_name", //Name of the existing index to reorganize
    "size": 3000
  },
  "dest": {
    "index": "index_name" //Name of the new index. It must differ from the source index name.
  }
}
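The remote reindex requests above can also be assembled programmatically, which helps when many indexes share the same connection settings. The sketch below builds the request body for a full or incremental migration and estimates progress from a task response. It assumes the response layout of GET _tasks/{task_id} in open-source Elasticsearch, where the task ID is the one returned when wait_for_completion=false is used. Both function names are hypothetical.

```python
import json


def build_reindex_body(host, source_index, dest_index, batch_size=3000,
                       username=None, password=None,
                       time_field=None, start=None, end=None):
    """Build a remote reindex body; pass time_field for an incremental run."""
    remote = {"host": host}
    if username is not None:
        # Credentials are needed for a security-mode source cluster only.
        remote["username"] = username
        remote["password"] = password
    source = {"remote": remote, "index": source_index, "size": batch_size}
    if time_field is not None:
        source["query"] = {"range": {time_field: {"gte": start, "lte": end}}}
    return {"source": source, "dest": {"index": dest_index}}


def reindex_progress(task_response):
    """Estimate progress (0.0 to 1.0) from the parsed JSON of GET _tasks/{task_id}."""
    if task_response.get("completed"):
        return 1.0
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    if total == 0:
        return 0.0
    done = (status.get("created", 0) + status.get("updated", 0)
            + status.get("deleted", 0))
    return done / total


if __name__ == "__main__":
    body = build_reindex_body("http://xx.xx.xx.xx:9200", "index_name", "index_name",
                              time_field="timestamps", start="xxx", end="xxx")
    print(json.dumps(body, indent=2))
```

The progress estimate sums the created, updated, and deleted counters and compares them with total, so it is only meaningful while the task is still listed.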
FAQ: What Do I Do If It Is Slow to Migrate an Oversized Index?
Migrating an oversized index may take a long time. To speed up the migration, use the following methods:
- The reindex API relies on the scroll API to read data from the source cluster and write data into the destination cluster. You can set the size and slice parameters of the scroll API to increase migration concurrency and speed. For details, see Reindex API.
- Before the migration, set the number of replicas to 0 for the destination index. After the migration is completed, change the number of replicas to the original value.
- Use snapshots to migrate large amounts of data. See Migrating Data Between Huawei Cloud Elasticsearch Clusters Using Backup and Restoration, Migrating Data from an On-premises Elasticsearch Cluster to Huawei Cloud Using the S3 Plugin, and Migrating Data from a Third-Party Elasticsearch Cluster to Huawei Cloud Using Backup and Restoration for examples.
- If the index has a time field, use incremental migration to migrate the index one chunk at a time.
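Migrating an index one chunk at a time amounts to splitting its time range into consecutive windows and running one incremental reindex per window. A minimal sketch of the window generation follows, assuming the time field accepts ISO 8601 strings (the default date format in Elasticsearch). The function name is hypothetical.

```python
from datetime import datetime, timedelta


def time_windows(start, end, step):
    """Yield consecutive range filters that cover start (inclusive) to end (exclusive)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        # gte/lt keeps windows from overlapping at their boundaries.
        yield {"gte": cursor.isoformat(), "lt": upper.isoformat()}
        cursor = upper


if __name__ == "__main__":
    for window in time_windows(datetime(2024, 1, 1), datetime(2024, 1, 3, 12),
                               timedelta(days=1)):
        print(window)
```

Each yielded dictionary can be used as the value of the time field inside the range query of an incremental reindex request. The windows use lt rather than lte for the upper bound so that a document sitting exactly on a boundary is migrated once.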