Migrating Data Between Elasticsearch Clusters Using ESM
Scenario
The Elasticsearch Migration Tool (ESM) is an open-source tool designed for migrating data between Elasticsearch clusters, including between those of different versions. When using ESM to migrate data, you can adjust the migration speed by tuning the parameters of the scroll API to accommodate various network conditions and service requirements. Below are some scenarios where you might use ESM for data migration.
- Cross-version data migration: Use ESM to seamlessly migrate data when upgrading an Elasticsearch cluster to a new version, ensuring data integrity and availability during and after the upgrade.
- Cluster merging: Merge index data scattered across multiple Elasticsearch clusters into a single cluster for centralized data management and analysis.
- Cloud migration: Migrate an on-premises Elasticsearch service to the cloud to enjoy the benefits of cloud services, such as scalability, ease of maintenance, and cost-effectiveness.
- Changing the service provider: Switch from a third-party Elasticsearch service to Huawei Cloud for reasons related to cost, performance, or other strategic considerations.
Overview
Figure 1 shows how to migrate data between Elasticsearch clusters using ESM.
- Install ESM on a Linux VM.
- Run ESM commands to migrate indexes from the source cluster to the destination cluster.
Advantages
- Cross-version data migration: You can use ESM to migrate data between Elasticsearch clusters of different versions, including from an earlier version to a later version.
- Easy to use: ESM is developed in Go and is ready to use as soon as its installation package is installed.
- Performance control: During the migration, you can tune the parameters of the scroll API to control the data migration speed for optimal cluster performance.
- Flexible migration options: ESM supports both full migration and incremental migration, accommodating different needs.
- Free: As an open-source tool, the ESM code is hosted on GitHub and accessible to all users for free.
Impact on Performance
Using ESM for data migration between clusters relies on the scroll API. The scroll API can efficiently retrieve index data from the source cluster and synchronize the data to the destination cluster in batches. This process may impact the performance of the source cluster. The specific impact depends on how fast data is retrieved from the source cluster, and the data retrieval speed depends on the size and slice settings of the scroll API. For details, see the Reindex API document.
Because ESM can read data from the source cluster quickly, the migration may affect the performance of the source cluster. Perform the migration during off-peak hours, especially if you have a large amount of data to migrate or if the source cluster already has high resource usage. During the migration, monitor the CPU and memory metrics of the source cluster. By tuning the migration speed and choosing an appropriate time window, you can keep the performance impact under control.
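The monitoring advice above can be sketched with the Elasticsearch _cat/nodes API. This is a minimal sketch, not part of ESM: the function name flag_busy_nodes and the threshold of 80 are illustrative assumptions, and ip is a placeholder for the source cluster address.

```shell
# Print the names of nodes whose CPU or heap usage exceeds a threshold.
# Input (stdin): output of GET _cat/nodes?h=name,cpu,heap.percent,
# that is, three whitespace-separated columns with no header row.
# flag_busy_nodes is a hypothetical helper, not an ESM or CSS command.
flag_busy_nodes() {
    threshold="${1:-80}"   # illustrative default, not an official recommendation
    awk -v t="$threshold" '$2 + 0 > t || $3 + 0 > t { print $1 }'
}

# Example usage against a non-security mode source cluster (ip is a placeholder):
#   curl -s "http://ip:9200/_cat/nodes?h=name,cpu,heap.percent" | flag_busy_nodes 80
```

If the function prints any node names while a migration is running, lowering the -w or -b values in the ESM command is one way to slow down reads from the source cluster.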
Constraints
During cluster migration, do not add, delete, or modify the index data of the source cluster. Otherwise, the migration may fail, or data in the source cluster may be inconsistent with that in the destination cluster after the migration.
Prerequisites
- The source and destination Elasticsearch clusters are available.
- The network between the clusters is connected.
- If the source and destination clusters are in different VPCs, establish a VPC peering connection between them. For details, see VPC Peering Connection Overview.
- To migrate an in-house built Elasticsearch cluster to Huawei Cloud, you can configure public network access for this cluster.
- To migrate a third-party Elasticsearch cluster to Huawei Cloud, you need to establish a VPN or Direct Connect connection between the third party's internal data center and Huawei Cloud.
- Ensure that _source has been enabled for indexes in the cluster.
By default, _source is enabled. You can run the GET {index}/_search command to check whether it is enabled. If the returned index information contains _source, it is enabled.
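The _source check described above might be scripted as follows. This is a rough sketch: has_source is a hypothetical helper name, ip and index_name are placeholders, and the check is a plain substring match on the search response, so it needs at least one document in the index to report anything useful.

```shell
# Succeed if the search response on stdin contains a "_source" key.
# This is a substring check meant for a quick manual verification;
# it would also match a document field that happens to be named "_source".
# has_source is a hypothetical helper, not an Elasticsearch or ESM command.
has_source() {
    grep -q '"_source"'
}

# Example usage (ip and index_name are placeholders):
#   curl -s "http://ip:9200/index_name/_search?size=1" | has_source \
#       && echo "_source is enabled"
```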
Obtaining Elasticsearch Cluster Information
Before data migration, obtain necessary information about the source and destination clusters for configuring a migration task.
Cluster Type | | Required Information | How to Obtain |
---|---|---|---|
Source cluster | Huawei Cloud Elasticsearch cluster | | |
Source cluster | In-house built Elasticsearch cluster | | Contact the service administrator to obtain the information. |
Source cluster | Third-party Elasticsearch cluster | | Contact the service administrator to obtain the information. |
Destination cluster | Huawei Cloud Elasticsearch cluster | | |
The method of obtaining the cluster information varies depending on the source cluster. This section describes how to obtain information about a Huawei Cloud Elasticsearch cluster.
- Log in to the CSS management console.
- In the navigation pane on the left, choose Clusters > Elasticsearch.
- In the Elasticsearch cluster list, obtain the cluster name and address.
Figure 2 Obtaining cluster information
Preparing the VM Used for the Migration
Create an ECS on which to install ESM for Elasticsearch cluster migration.
- Buy a Linux ECS, select the CentOS 7 image, and set the VPC to that of the destination cluster. For details, see Purchasing and Using a Linux ECS.
- Test the connectivity between the ECS and the source and destination clusters.
Run the following command on the ECS to test connectivity. If the correct cluster information is returned, the connection is ready.
# Non-security mode cluster
curl -ik http://ip:9200
# Security-mode cluster + HTTPS
curl -ik https://ip:9200 -u[Username]:[password]
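To test both clusters in one pass, the connectivity check above could be wrapped in a small function. This is a sketch under assumptions: check_cluster is a hypothetical helper name, the 10-second timeout is arbitrary, and the addresses and credentials in the usage comment are placeholders.

```shell
# Return 0 if the cluster root endpoint answers with HTTP 200.
# $1: base URL, for example http://ip:9200
# $2: optional username:password for security-mode clusters
# check_cluster is a hypothetical helper, not an ESM or CSS command.
check_cluster() {
    url="$1"
    auth="${2:-}"
    if [ -n "$auth" ]; then
        code=$(curl -sk -o /dev/null -m 10 -w '%{http_code}' -u "$auth" "$url")
    else
        code=$(curl -sk -o /dev/null -m 10 -w '%{http_code}' "$url")
    fi
    # curl reports 000 when no HTTP response was received at all.
    [ "$code" = "200" ]
}

# Example usage (addresses and credentials are placeholders):
#   check_cluster http://source:9200 && echo "source reachable"
#   check_cluster https://dest:9200 admin:password && echo "destination reachable"
```

The function only looks at the HTTP status code, so it does not verify that the returned cluster information is the one you expect. Run the plain curl commands once to confirm that manually.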
Migrating Data Using ESM
- Visit the ESM download address, and download the migrator-linux-amd64 installation package.
- Use SCP to upload the downloaded migrator-linux-amd64 to a path on the Linux ECS.
- Run the following command in the above path on the Linux ECS to migrate index structures and data from the source cluster to the destination cluster:
# Full migration
./migrator-linux-amd64 -s http://source:9200 -d http://dest:9200 -x index_name -m admin:password -n admin:password --copy_settings --copy_mappings -w 5 -b 10
# Incremental migration
./migrator-linux-amd64 -s http://source:9200 -d http://dest:9200 -x index-test -m admin:password -n admin:password -w 5 -b 10 -q "timestamp:[\"2022-01-17 03:41:20\" TO \"2022-01-22 03:41:20\"]"
For commonly used parameters in the migration command, see Table 2. For even more information, see ESM documents.
Table 2 Commonly used parameters in ESM migration commands
Parameter | Example | Description |
---|---|---|
-s, --source= | http://source:9200 | Address for accessing the source Elasticsearch cluster |
-d, --dest= | http://dest:9200 | Address for accessing the destination Elasticsearch cluster |
-x, --src_indexes= | index_name or index1,index2 | Names of the source cluster indexes to migrate. A regular expression can be used to specify indexes. Separate multiple indexes with commas (,). |
-y, --dest_index= | index_name_rename | Index name in the destination cluster. Only one index name can be specified. If left blank, the source index names are used. |
-m, --source_auth= | admin:password | Username and password for accessing the source Elasticsearch cluster (only for security-mode clusters) |
-n, --dest_auth= | admin:password | Username and password for accessing the destination Elasticsearch cluster (only for security-mode clusters) |
-w, --workers= | 5 | Number of concurrent threads for bulk data reading. This parameter controls how fast data is read from the source cluster. Default value: 1 |
-b, --bulk_size= | 10 | Bulk size for data reading. This parameter also controls how fast data is read from the source cluster. Default value: 5 MB |
--sliced_scroll_size | 4 | Size of sliced scroll. This parameter also controls how fast data is read from the source cluster. Default value: 1 |
--copy_settings | - | Whether to migrate index settings in the source cluster |
--copy_mappings | - | Whether to migrate index mappings in the source cluster |
--buffer_count= | - | Number of documents to be cached in the memory of the VM that hosts ESM. Default value: 100,000 |
- After the data migration is completed, check data consistency by comparing the number of documents in each index of the source and destination clusters.
# Non-security mode cluster
curl -ik http://ip:9200/{index name}/_count
# Security-mode cluster + HTTPS
curl -ik https://ip:9200/{index name}/_count -u[Username]:[password]
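The consistency check in the last step can be automated by extracting the count value from the two _count responses and comparing them. The following is a minimal sketch: extract_count and compare_counts are hypothetical helper names, and the addresses and index name in the usage comment are placeholders.

```shell
# Extract the numeric "count" value from a _count response on stdin.
# extract_count and compare_counts are hypothetical helpers, not ESM commands.
extract_count() {
    sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Compare two _count responses and print a one-line verdict.
# $1: response body from the source cluster
# $2: response body from the destination cluster
compare_counts() {
    src=$(printf '%s\n' "$1" | extract_count)
    dst=$(printf '%s\n' "$2" | extract_count)
    if [ -n "$src" ] && [ "$src" = "$dst" ]; then
        echo "match: $src"
    else
        echo "mismatch: source=$src destination=$dst"
        return 1
    fi
}

# Example usage (addresses and index_name are placeholders):
#   compare_counts "$(curl -sk http://source:9200/index_name/_count)" \
#                  "$(curl -sk http://dest:9200/index_name/_count)"
```

The sketch parses the response with sed to avoid extra dependencies. If a JSON tool such as jq is available on the ECS, it would be a more robust choice.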
FAQ
- How do I handle the error message "out of memory" displayed during migration?
If the error message "out of memory" is displayed during the migration, the ECS that hosts ESM has run out of memory. Handle it using one of the following methods:
- Use a larger flavor for the ECS. For details, see Modifying Individual ECS Specifications.
- Reduce the value of buffer_count in the ESM migration command to reduce the number of documents that can be cached in the memory of the ECS.
- Why is the total size of index data inconsistent in the source and destination clusters after the migration?
This is normal when ESM is used to migrate Elasticsearch clusters. In a typical Elasticsearch cluster, data is stored in multiple shards, and each shard consists of multiple segments. After data is migrated from the source cluster to the destination cluster using ESM, new segments and shards are generated in the destination cluster. Different numbers of segments and shards in the source and destination clusters lead to different data expansion rates, hence different data sizes. To check data consistency, compare the number of documents rather than the data size.
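For the out-of-memory question above, a rerun with a smaller buffer might look like the following. This is only a sketch: esm_cmd_with_buffer is a hypothetical helper that prints the command instead of running it, the value 50000 is an arbitrary choice below the default listed in Table 2, and the authentication options are omitted for brevity.

```shell
# Print (do not run) an ESM command with an explicit --buffer_count value.
# $1: source URL, $2: destination URL, $3: index name, $4: buffer count
# esm_cmd_with_buffer is a hypothetical helper; add -m and -n for
# security-mode clusters.
esm_cmd_with_buffer() {
    printf './migrator-linux-amd64 -s %s -d %s -x %s -w 5 -b 10 --buffer_count=%s\n' \
        "$1" "$2" "$3" "$4"
}

# Example usage (addresses and index_name are placeholders):
#   esm_cmd_with_buffer http://source:9200 http://dest:9200 index_name 50000
```

Printing the command first makes it easy to review the parameters before starting a long-running migration.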