
Migrating Data Between Elasticsearch Clusters Using ESM

Updated on 2024-11-22 GMT+08:00

Scenario

The Elasticsearch Migration Tool (ESM) is an open-source tool designed for migrating data between Elasticsearch clusters, including between those of different versions. When using ESM to migrate data, you can adjust the migration speed by tuning the parameters of the scroll API to accommodate various network conditions and service requirements. Below are some scenarios where you might use ESM for data migration.

  • Cross-version data migration: Use ESM to seamlessly migrate data when upgrading an Elasticsearch cluster to a new version, ensuring data integrity and availability during and after the upgrade.
  • Cluster merging: Merge index data scattered across multiple Elasticsearch clusters into a single cluster for centralized data management and analysis.
  • Cloud migration: Migrate an on-premises Elasticsearch service to the cloud to benefit from cloud services, such as scalability, ease of maintenance, and cost-effectiveness.
  • Changing the service provider: Switch from a third-party Elasticsearch service to Huawei Cloud for reasons related to cost, performance, or other strategic considerations.

Overview

Figure 1 Migration procedure

Figure 1 shows how to migrate data between Elasticsearch clusters using ESM.

  1. Install ESM on a Linux VM.
  2. Run ESM commands to migrate indexes from the source cluster to the destination cluster.

Advantages

  • Cross-version data migration: You can use ESM to migrate data between Elasticsearch clusters of different versions, including from an earlier version to a later version.
  • Easy to use: ESM is written in Go and ships as a prebuilt package, so it is ready to use as soon as it is installed.
  • Performance control: During the migration, you can tune the parameters of the scroll API to control the data migration speed for optimal cluster performance.
  • Flexible migration options: ESM supports both full migration and incremental migration, accommodating different needs.
  • Free: As an open-source tool, the ESM code is hosted on GitHub and accessible to all users for free.

Impact on Performance

ESM relies on the scroll API. It reads index data from the source cluster in batches and writes each batch to the destination cluster using bulk requests. This process may affect the performance of the source cluster. How much it does depends on how fast data is read from the source cluster, and the read speed in turn depends on the size and slice settings of the scroll API. For details, see the Reindex API document.
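For illustration, the size and slice settings mentioned above correspond to a sliced scroll search at the Elasticsearch API level. The sketch below is not taken from ESM. It is a hypothetical shell function, `scroll_body`, that prints the request body for one slice. Here `size` is the number of documents returned per batch and `max` is the number of slices that can be read in parallel, so raising either one increases the read load on the source cluster. The host, index name, and credentials in the commented example are placeholders.

```shell
# Hypothetical helper: print the search body for slice $1 (0-based) out of
# $2 slices, returning $3 documents per scroll batch.
# Elasticsearch rejects a slice max of 1 or less, so that case is refused here.
scroll_body() {
  if [ "$2" -le 1 ]; then
    echo "slice max must be greater than 1" >&2
    return 1
  fi
  printf '{"slice":{"id":%s,"max":%s},"size":%s,"query":{"match_all":{}}}\n' \
    "$1" "$2" "$3"
}

# Example (not run here): open a scroll for slice 0 of 4, kept alive for 1 minute.
# curl -sk -u admin:password -H 'Content-Type: application/json' \
#   -X POST "https://source:9200/index_name/_search?scroll=1m" \
#   -d "$(scroll_body 0 4 1000)"
```

Each slice is an independent scroll, so four slices can be consumed by four readers at once. This is the same lever that `--sliced_scroll_size` in Table 2 exposes.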

Because ESM can read data from the source cluster quickly, perform the migration during off-peak hours. This is especially important if you have a large amount of data to migrate or if the source cluster already has high resource usage. During the migration, monitor the CPU and memory metrics of the source cluster, and tune the migration speed if necessary to keep the performance impact under control.
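One way to watch the source cluster during the migration is the `_cat/nodes` API, which can return per-node CPU and heap usage. The sketch below assumes that API and uses a hypothetical helper, `busy_nodes`, that flags nodes above a chosen threshold. The 80 percent value is only an example, not a recommendation from this guide.

```shell
# Hypothetical helper: print the names of nodes whose CPU or heap usage
# is above a threshold (in percent; the default of 80 is an example only).
# Reads lines of "name cpu heap.percent" on stdin, the format returned by
#   GET _cat/nodes?h=name,cpu,heap.percent
busy_nodes() {
  awk -v t="${1:-80}" '$2 + 0 > t + 0 || $3 + 0 > t + 0 { print $1 }'
}

# Example (not run here): check the source cluster every 30 seconds.
# while sleep 30; do
#   curl -sk -u admin:password \
#     "https://source:9200/_cat/nodes?h=name,cpu,heap.percent" | busy_nodes 80
# done
```

If nodes keep appearing in the output, lowering the ESM speed parameters is the natural response.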

Constraints

During cluster migration, do not add, delete, or modify the index data of the source cluster. Otherwise, the migration may fail, or data in the source cluster may be inconsistent with that in the destination cluster after the migration.
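One way to enforce this constraint is to set the `index.blocks.write` index setting on the source indexes and clear it again after the data has been verified. This only works if the applications that use the source cluster can tolerate rejected writes for the duration of the migration. While the block is on, write requests to the index fail, so coordinate with the service owners first. The sketch below works under that assumption. `set_write_block` is a hypothetical helper, and the URL, credentials (security-mode cluster assumed), and index name are placeholders.

```shell
# Hypothetical helper: turn the write block on (true) or off (false) for one index.
# $1 = cluster URL, $2 = user:password, $3 = index name, $4 = true | false
set_write_block() {
  case $4 in
    true | false) ;;
    *) echo "usage: set_write_block URL USER:PASS INDEX true|false" >&2; return 2 ;;
  esac
  curl -sk -u "$2" -H 'Content-Type: application/json' \
    -X PUT "$1/$3/_settings" \
    -d "{\"index.blocks.write\": $4}"
}

# Example (not run here):
# set_write_block https://source:9200 admin:password index_name true    # before migrating
# set_write_block https://source:9200 admin:password index_name false   # after verifying counts
```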

Prerequisites

  • The source and destination Elasticsearch clusters are available.
  • The network between the clusters is connected.
    • If the source and destination clusters are in different VPCs, establish a VPC peering connection between them. For details, see VPC Peering Connection Overview.
    • To migrate an in-house built Elasticsearch cluster to Huawei Cloud, you can configure public network access for this cluster.
    • To migrate a third-party Elasticsearch cluster to Huawei Cloud, you need to establish a VPN or Direct Connect connection between the third party's internal data center and Huawei Cloud.
  • _source is enabled for the indexes to be migrated.

    _source is enabled by default. To check, run GET {index}/_search. If the documents in the returned hits contain a _source field, it is enabled.
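Besides inspecting search hits, you can read the index mapping, where a disabled _source appears explicitly as `"_source": {"enabled": false}`. The sketch below uses a hypothetical helper, `source_disabled`, that checks a `GET {index}/_mapping` response for that setting. It uses a plain text match rather than a JSON parser. The URL, credentials, and index name are placeholders.

```shell
# Hypothetical helper: succeed (exit status 0) if a mapping response
# explicitly disables _source. Reads the JSON of GET {index}/_mapping on stdin.
# Whitespace is removed first so that compact and ?pretty output both match.
# This is a fixed-string match, not a JSON parser, so treat it as a quick check.
source_disabled() {
  tr -d '[:space:]' | grep -qF '"_source":{"enabled":false'
}

# Example (not run here):
# if curl -sk -u admin:password "https://source:9200/index_name/_mapping" | source_disabled; then
#   echo "index_name: _source is disabled, so it does not meet the prerequisite"
# fi
```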

Obtaining Elasticsearch Cluster Information

Before data migration, obtain necessary information about the source and destination clusters for configuring a migration task.

Table 1 Required Elasticsearch cluster information

Source cluster

  • Huawei Cloud Elasticsearch cluster
    • Required information: access address of the source cluster, and the username and password for accessing it (only for security-mode clusters)
    • How to obtain: for the cluster name and address, see step 3 in the procedure below. Contact the service administrator to obtain the username and password.
  • In-house built Elasticsearch cluster
    • Required information: public network address of the source cluster, and the username and password for accessing it (only for security-mode clusters)
    • How to obtain: contact the service administrator.
  • Third-party Elasticsearch cluster
    • Required information: access address of the source cluster, and the username and password for accessing it (only for security-mode clusters)
    • How to obtain: contact the service administrator.

Destination cluster

  • Huawei Cloud Elasticsearch cluster
    • Required information: access address of the destination cluster, and the username and password for accessing it (only for security-mode clusters)
    • How to obtain: for the access address, see step 3 in the procedure below. Contact the service administrator to obtain the username and password.

The method of obtaining the cluster information varies depending on the source cluster. This section describes how to obtain information about a Huawei Cloud Elasticsearch cluster.

  1. Log in to the CSS management console.
  2. In the navigation pane on the left, choose Clusters > Elasticsearch.
  3. In the Elasticsearch cluster list, obtain the cluster name and address.
    Figure 2 Obtaining cluster information

Preparing the VM Used for the Migration

Create an ECS on which to install ESM for the Elasticsearch cluster migration.

  1. Buy a Linux ECS, select the CentOS 7 image, and set the VPC to that of the destination cluster. For details, see Purchasing and Using a Linux ECS.
  2. Test the connectivity between the ECS and the source and destination clusters.

    Run the following commands on the ECS to test connectivity. If the cluster information is returned, the ECS can reach the cluster.

    # Non-security-mode cluster
    curl -ik http://ip:9200
    # Security-mode cluster with HTTPS
    curl -ik -u [username]:[password] https://ip:9200
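When both clusters need to be checked, the connectivity test can be wrapped in a small function that also reports the cluster name and Elasticsearch version. The version is useful for confirming a cross-version migration path. `probe_cluster` is a hypothetical helper. It assumes the root endpoint returns the usual `cluster_name` and `version.number` fields, and it extracts them with sed instead of a JSON parser.

```shell
# Hypothetical helper: query the root endpoint of a cluster and print
# "<cluster_name> <version>" if it answers with cluster information.
# $1 = URL, for example https://ip:9200
# $2 = optional user:password (security-mode clusters only)
probe_cluster() {
  if [ -n "$2" ]; then
    resp=$(curl -sk -u "$2" "$1") || return 1
  else
    resp=$(curl -sk "$1") || return 1
  fi
  # Pull the first matching value on any line; works for compact or pretty JSON.
  name=$(printf '%s\n' "$resp" |
    sed -n 's/.*"cluster_name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
  version=$(printf '%s\n' "$resp" |
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
  # No cluster_name usually means an auth error or a non-Elasticsearch endpoint.
  [ -n "$name" ] || return 1
  echo "$name $version"
}

# Example (not run here):
# probe_cluster https://source:9200 admin:password && probe_cluster https://dest:9200 admin:password
```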

Migrating Data Using ESM

  1. Go to the ESM download address and download the migrator-linux-amd64 installation package.
  2. Use SCP to upload the downloaded migrator-linux-amd64 file to a path on the Linux ECS. If necessary, make it executable with chmod +x migrator-linux-amd64.
  3. From that path on the Linux ECS, run one of the following commands to migrate index structures and data from the source cluster to the destination cluster:
    # Full migration
    ./migrator-linux-amd64 -s http://source:9200 -d http://dest:9200 -x index_name -m admin:password -n admin:password --copy_settings --copy_mappings -w 5 -b 10

    # Incremental migration (only documents whose timestamp falls within the given range)
    ./migrator-linux-amd64 -s http://source:9200 -d http://dest:9200 -x index-test -m admin:password -n admin:password -w 5 -b 10 -q "timestamp:[\"2022-01-17 03:41:20\" TO \"2022-01-22 03:41:20\"]"

    For commonly used parameters in the migration commands, see Table 2. For more information, see the ESM documentation.

    Table 2 Commonly used parameters in ESM migration commands

    • -s, --source= (example: http://source:9200): Address for accessing the source Elasticsearch cluster.
    • -d, --dest= (example: http://dest:9200): Address for accessing the destination Elasticsearch cluster.
    • -x, --src_indexes= (examples: index_name; index1,index2): Names of the source indexes to migrate. Regular expressions are supported. Separate multiple indexes with commas (,).
    • -y, --dest_index= (example: index_name_rename): Name of the index in the destination cluster. Only one index name can be specified. If this parameter is left blank, the source index names are used.
    • -m, --source_auth= (example: admin:password): Username and password for accessing the source Elasticsearch cluster (only for security-mode clusters).
    • -n, --dest_auth= (example: admin:password): Username and password for accessing the destination Elasticsearch cluster (only for security-mode clusters).
    • -w, --workers= (example: 5): Number of concurrent bulk worker threads. This parameter controls how fast data is read from the source cluster. Default value: 1.
    • -b, --bulk_size= (example: 10): Size of each bulk request, in MB. This parameter also controls how fast data is read from the source cluster. Default value: 5 MB.
    • --sliced_scroll_size= (example: 4): Number of slices used for sliced scroll. It takes effect only when set to a value greater than 1. This parameter also controls how fast data is read from the source cluster. Default value: 1.
    • --copy_settings: Migrates the index settings of the source cluster.
    • --copy_mappings: Migrates the index mappings of the source cluster.
    • --buffer_count=: Number of documents cached in the memory of the VM that hosts ESM. Default value: 100,000.

  4. After the migration is complete, check data consistency by comparing the number of documents in each index in the source and destination clusters.
    # Non-security-mode cluster
    curl -ik http://ip:9200/{index_name}/_count
    # Security-mode cluster with HTTPS
    curl -ik -u [username]:[password] https://ip:9200/{index_name}/_count
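To avoid retyping the long commands in step 3, a wrapper function can assemble them. The sketch below, `esm_migrate`, is hypothetical. It reads the cluster addresses, index name, and credentials from shell variables. As in the example above, it assumes the incremental window applies to a field named `timestamp`. It only uses the ESM options from Table 2 plus `-q` from the incremental example. With `DRY_RUN=1` it prints the arguments instead of running migrator-linux-amd64.

```shell
# Hypothetical wrapper around the commands in step 3.
# Placeholders read from shell variables: SRC, DEST, INDEX, and optionally
# SRC_AUTH, DEST_AUTH (user:password, security-mode clusters only),
# WORKERS (default 5), and BULK_MB (default 10).
# Usage: esm_migrate full
#        esm_migrate incremental "FROM" "TO"   (field assumed to be "timestamp")
# Set DRY_RUN=1 to print the arguments, one per line, instead of running ESM.
esm_migrate() {
  mode=$1 from=$2 to=$3
  # Rebuild the positional parameters as the ESM argument list.
  set -- -s "$SRC" -d "$DEST" -x "$INDEX"
  [ -n "$SRC_AUTH" ] && set -- "$@" -m "$SRC_AUTH"
  [ -n "$DEST_AUTH" ] && set -- "$@" -n "$DEST_AUTH"
  set -- "$@" -w "${WORKERS:-5}" -b "${BULK_MB:-10}"
  case $mode in
    full)
      set -- "$@" --copy_settings --copy_mappings ;;
    incremental)
      if [ -z "$from" ] || [ -z "$to" ]; then
        echo "usage: esm_migrate incremental FROM TO" >&2
        return 2
      fi
      set -- "$@" -q "timestamp:[\"$from\" TO \"$to\"]" ;;
    *)
      echo "usage: esm_migrate full | incremental FROM TO" >&2
      return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$@"
  else
    ./migrator-linux-amd64 "$@"
  fi
}
```

Building the argument list with `set --` keeps values that contain spaces, such as the timestamps, as single arguments without relying on bash-only arrays.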

FAQ

  • How do I handle the error message "out of memory" displayed during migration?

    An "out of memory" error during the migration means that the ECS running ESM has run out of memory. Handle it using either of the following methods:

    • Use a larger ECS flavor. For details, see Modifying Individual ECS Specifications.
    • Reduce the value of buffer_count in the ESM migration command to reduce the number of documents cached in the memory of the ECS.
  • Why is the total size of index data inconsistent in the source and destination clusters after the migration?

    This is expected when ESM is used to migrate Elasticsearch clusters. Data in an Elasticsearch cluster is stored in shards, and each shard consists of multiple segments. When ESM writes the data into the destination cluster, new shards and segments are created there. Because the numbers of shards and segments differ between the two clusters, the same data takes up a different amount of storage, so the total sizes differ. To check data consistency, compare the number of documents rather than the data size.
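A per-index `_count` request checks one index at a time. To compare document counts across many indexes, the listings from the `_cat/indices` API can be compared instead. `compare_doc_counts` is a hypothetical helper that takes two saved listings, source first and destination second. It reports any index whose count differs or that is missing in the destination. Extra indexes that exist only in the destination are ignored.

```shell
# Hypothetical helper: compare per-index document counts between two files,
# each holding lines of "index docs.count" as returned by
#   GET _cat/indices?h=index,docs.count
# Prints one line per mismatched or missing index; returns 1 if any is found.
compare_doc_counts() {
  awk '
    FILENAME == ARGV[1] { src[$1] = $2; next }   # source listing
    { dest[$1] = $2 }                            # destination listing
    END {
      bad = 0
      for (i in src) {
        if (!(i in dest)) {
          print i ": missing in destination"; bad = 1
        } else if (src[i] != dest[i]) {
          print i ": source " src[i] ", destination " dest[i]; bad = 1
        }
      }
      exit bad
    }' "$1" "$2"
}

# Example (not run here):
# curl -sk -u admin:password "https://source:9200/_cat/indices?h=index,docs.count" > src.txt
# curl -sk -u admin:password "https://dest:9200/_cat/indices?h=index,docs.count" > dest.txt
# compare_doc_counts src.txt dest.txt && echo "document counts match"
```

The `docs.count` column of `_cat/indices` is read from Lucene and counts nested documents separately. For indexes with nested fields it can therefore be larger than the `_count` result, so compare both clusters using the same API. Counts also reflect only refreshed data, so run the comparison after the destination indexes have refreshed.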
