Migrating Data Between Elasticsearch Clusters Using the Reindex API

Updated on 2024-11-22 GMT+08:00

You can use the reindex API to migrate data between Elasticsearch clusters.

Scenario

As an open-source search engine, Elasticsearch provides the reindex API to support index migration between clusters. This API is also provided in CSS to support data migration between Elasticsearch clusters. Below are some scenarios where you might use the reindex API for data migration.
  • Cluster merging: The reindex API can be used to merge index data scattered across multiple Elasticsearch clusters into a single cluster for centralized data management and analysis.
  • Cloud migration: Migrate an on-premises Elasticsearch service to the cloud to benefit from scalability, ease of maintenance, and cost-effectiveness.
  • Changing the service provider: If you are using a third-party Elasticsearch service, and you want to switch to Huawei Cloud or another service provider for some reason (cost, performance, or other concerns), you can use the reindex API for data migration.
The reindex API supports the following:
  • Full migration: Migrate the full amount of index data between clusters. During the migration, all writes to the source cluster must be stopped, ensuring data consistency between the source and destination clusters.
  • Incremental migration: For indexes that have a timestamp field, the reindex API can migrate only the data that falls within a given time range of this field. A typical workload switchover works as follows: complete the full migration, stop all writes to the source cluster, use the reindex API to run a quick incremental migration based on the latest update time, and then switch all services to the destination cluster.
  • Reorganizing indexes: The reindex API can be used to restructure indexes while migrating them, including changing mappings, analyzers, and sharding.

Overview

Figure 1 Migration procedure

Figure 1 shows how to migrate data between Elasticsearch clusters using the reindex API.

  1. Configure the reindex remote access whitelist in the destination cluster to connect the source and destination clusters.
  2. Use the reindex API to migrate indexes from the source cluster to the destination cluster.

Advantages

  • Easy to use: As a built-in function of Elasticsearch, the reindex API offers an easy way to migrate data without complex settings or additional tools.
  • Flexible data processing: Indexes can be restructured or rebuilt during migration, for example, by changing mappings, analyzers, and sharding.
  • Performance control: During the migration, you can tune the parameters of the scroll API to control the data migration speed for optimal cluster performance.

Impact on Performance

Using the reindex API for data migration between clusters relies on the scroll API. The scroll API can efficiently retrieve index data from the source cluster and synchronize the data to the destination cluster in batches. This process may impact the performance of the source cluster. The specific impact depends on how fast data is retrieved from the source cluster, and the data retrieval speed depends on the size and slice settings of the scroll API. For details, see the Reindex API document.

Reindex tasks are asynchronous in Elasticsearch clusters, so their impact on the performance of the source cluster is manageable when task concurrency is low. If the source cluster has a high resource usage, it is advisable to tune the size parameter of the scroll API to slow down the data retrieval speed or perform the migration during off-peak hours, reducing impact on the performance of the source cluster.
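
As a complement to tuning the size parameter, open-source Elasticsearch also lets you throttle a reindex task with the requests_per_second query parameter and change the throttle of a running task through the _rethrottle endpoint. The text above does not mention these, so whether they suit your CSS cluster version is an assumption to verify. The following is a minimal sketch that only builds the request paths; the function names are hypothetical.

```python
from urllib.parse import urlencode


def reindex_path(requests_per_second=None, wait_for_completion=False):
    """Build the path for POST _reindex, optionally with a throttle.

    requests_per_second limits how many documents per second the task
    processes; leaving it as None sends no throttle parameter.
    """
    params = {"wait_for_completion": str(wait_for_completion).lower()}
    if requests_per_second is not None:
        params["requests_per_second"] = requests_per_second
    return "_reindex?" + urlencode(params)


def rethrottle_path(task_id, requests_per_second):
    """Build the path for POST _reindex/<task_id>/_rethrottle."""
    query = urlencode({"requests_per_second": requests_per_second})
    return f"_reindex/{task_id}/_rethrottle?{query}"


if __name__ == "__main__":
    # Paths to paste after POST in Kibana Dev Tools.
    print(reindex_path(requests_per_second=500))
    print(rethrottle_path("node_id:12345", 200))
```

A lower requests_per_second value reduces the read pressure on the source cluster at the cost of a longer migration.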

Constraints

  • During cluster migration, do not add, delete, or modify the index data of the source cluster. Otherwise, the data in the source cluster will be inconsistent with that in the destination cluster after the migration.
  • The source and destination clusters must use the same version.

Prerequisites

  • The source and destination Elasticsearch clusters are available.
  • The network between the clusters is connected.
    • If the source and destination clusters are in different VPCs, establish a VPC peering connection between them. For details, see VPC Peering Connection Overview.
    • To migrate an in-house built Elasticsearch cluster to Huawei Cloud, you can configure public network access for this cluster.
    • To migrate a third-party Elasticsearch cluster to Huawei Cloud, you need to establish a VPN or Direct Connect connection between the third party's internal data center and Huawei Cloud.
  • Ensure that _source has been enabled for indexes in the cluster.

    By default, _source is enabled. You can run the GET {index}/_search command to check whether it is enabled. If the returned index information contains _source, it is enabled.
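
The _source check described above can be sketched in a few lines of Python. The sketch assumes you have already saved the JSON response of GET {index}/_search as a dictionary; the function name source_enabled is hypothetical, and an empty result set is reported as undetermined rather than guessed.

```python
def source_enabled(search_response):
    """Return True if every returned hit carries a _source field.

    Returns False if any hit lacks _source, and None if the response
    contains no hits (the check cannot be decided without documents).
    """
    hits = search_response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    return all("_source" in hit for hit in hits)


if __name__ == "__main__":
    # Hand-written sample response for illustration only.
    sample = {"hits": {"hits": [{"_id": "1", "_source": {"msg": "hello"}}]}}
    print(source_enabled(sample))
```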

Obtaining Information About the Source Elasticsearch Cluster

Before data migration, obtain necessary information about the source cluster for configuring a migration task.

Table 1 Required information about the source Elasticsearch cluster

Columns: Cluster Type, Required Information, How to Obtain

  • Huawei Cloud Elasticsearch cluster
    • Required information: access address of the source cluster; username and password for accessing the source cluster (only for security-mode clusters); index structure.
    • How to obtain: For the cluster name and address, see step 3 of the procedure below. For the username and password, contact the service administrator. For the index structure, see step 6 of the procedure below.
  • In-house built Elasticsearch cluster
    • Required information: public network address of the source cluster; username and password for accessing the source cluster (only for security-mode clusters); index structure.
    • How to obtain: Contact the service administrator to obtain the information.
  • Third-party Elasticsearch cluster
    • Required information: access address of the source cluster; username and password for accessing the source cluster (only for security-mode clusters); index structure.
    • How to obtain: Contact the service administrator to obtain the information.

The method of obtaining the cluster information varies depending on the source cluster. This section describes how to obtain information about a Huawei Cloud Elasticsearch cluster.

  1. Log in to the CSS management console.
  2. In the navigation pane on the left, choose Clusters > Elasticsearch.
  3. In the Elasticsearch cluster list, obtain the cluster name and address.
    Figure 2 Obtaining cluster information
  4. Click Access Kibana in the Operation column to log in to the Kibana console.
  5. Click Dev Tools in the navigation tree on the left.
  6. Run the following command to query the index structure of the source cluster:
    GET {index_name}

    index_name indicates the name of the index to be migrated.
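
The response of GET {index_name} typically contains system-generated settings in addition to the mappings and settings you defined. In open-source Elasticsearch 7.x, these commonly include creation_date, uuid, version, and provided_name, and an index creation request that carries them is usually rejected. The following is a minimal sketch that removes them so that the remaining structure can be reused when the index is created in the destination cluster. The list of settings is an assumption based on typical responses, and the function name is hypothetical; check the result against your own cluster's output.

```python
import copy

# Assumed system-generated settings; adjust to match your cluster's response.
READ_ONLY_SETTINGS = ("creation_date", "uuid", "version", "provided_name")


def create_body_from_get_index(get_index_response, index_name):
    """Return a copy of the index definition without read-only settings."""
    definition = copy.deepcopy(get_index_response[index_name])
    index_settings = definition.get("settings", {}).get("index", {})
    for key in READ_ONLY_SETTINGS:
        index_settings.pop(key, None)
    return definition


if __name__ == "__main__":
    import json

    # Hand-written sample response for illustration only.
    response = {
        "my_index": {
            "aliases": {},
            "mappings": {"properties": {"msg": {"type": "text"}}},
            "settings": {"index": {
                "number_of_shards": "3",
                "number_of_replicas": "1",
                "creation_date": "1700000000000",
                "uuid": "abc",
                "version": {"created": "7100299"},
                "provided_name": "my_index",
            }},
        }
    }
    print(json.dumps(create_body_from_get_index(response, "my_index"), indent=2))
```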

Configuring the Reindex Remote Access Whitelist

In the destination Elasticsearch cluster, configure the reindex whitelist.
  1. Log in to the CSS management console.
  2. In the navigation pane on the left, choose Clusters > Elasticsearch.
  3. In the Elasticsearch cluster list, click the destination cluster. The cluster information page is displayed.
  4. In the navigation pane on the left, click Parameter Configurations. Then click Edit, expand Reindexing, and set the reindex remote access whitelist (reindex.remote.whitelist) to the address of the source cluster.

    If the source Elasticsearch cluster uses HTTPS, expand Customize, and add the following custom parameter to skip SSL certificate verification.

    • Parameter: reindex.ssl.verification_mode
    • Value: none
  5. Click Submit. In the displayed confirmation dialog box, confirm the parameter settings, select I understand that the modification will take effect after the cluster is restarted, then click OK.
  6. Return to the Elasticsearch cluster list, and locate the destination cluster. Choose More > Restart in the Operation column to restart the cluster and make the change take effect.
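
In open-source Elasticsearch, the reindex remote whitelist (reindex.remote.whitelist) takes entries in host:port form, without the http:// or https:// scheme that the reindex request itself uses. Assuming the CSS whitelist follows the same form, the following minimal sketch derives the whitelist entry from the source cluster URL. The function name whitelist_entry is hypothetical.

```python
from urllib.parse import urlsplit


def whitelist_entry(source_url):
    """Convert a source cluster URL into a host:port whitelist entry."""
    parts = urlsplit(source_url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("source URL must include a scheme, a host, and a port")
    return f"{parts.hostname}:{parts.port}"


if __name__ == "__main__":
    # Example address for illustration only.
    print(whitelist_entry("http://192.168.0.10:9200"))
```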

Migrating Indexes Using the Reindex API

  1. In the destination cluster, create an index structure identical to that in the source cluster.
    1. Log in to the CSS management console.
    2. In the navigation pane on the left, choose Clusters > Elasticsearch.
    3. In the Elasticsearch cluster list, locate the destination cluster, and click Access Kibana in the Operation column to log in to the Kibana console.
    4. Click Dev Tools in the navigation tree on the left.
    5. Run the following command to create an index structure that is identical to that in the source cluster:
      PUT {index_name}
      {
      Index structure of the source cluster
      }

      index_name indicates the index name after the migration. For the index structure of the source cluster, see Obtaining Information About the Source Elasticsearch Cluster.

  2. Run the following command to migrate data using the reindex API:
    • Full migration: Migrate the full amount of index data in the source cluster to the destination cluster.

      On the Kibana console of the destination cluster, run the following command:

      POST _reindex?wait_for_completion=false
      {
        "source": {
          "remote": {
            "host": "http://xx.xx.xx.xx:9200",    //Address of the source cluster. If the source cluster uses HTTPS, use https://xx.xx.xx.xx:9200.
            "username": "xxx",    //Username for accessing the source cluster. It is needed for a security-mode cluster only.
            "password": "******"    //Password for accessing the source cluster. It is needed for a security-mode cluster only.
          },
          "index": "index_name",    //Index name in the source cluster
          "size": 3000
        },
        "dest": {
          "index": "index_name"    //Index name in the destination cluster
        }
      }
    • Incremental migration: Migrate new/changed index data from the source cluster to the destination cluster based on timestamps. This method can also be used to migrate an oversized index one chunk at a time.

      On the Kibana console of the destination cluster, run the following command:

      POST _reindex?wait_for_completion=false
      {
        "source": {
          "remote": {
            "host": "http://xx.xx.xx.xx:9200",    //Address of the source cluster. If the source cluster uses HTTPS, use https://xx.xx.xx.xx:9200.
            "username": "xxx",    //Username for accessing the source cluster. It is needed for a security-mode cluster only.
            "password": "******"    //Password for accessing the source cluster. It is needed for a security-mode cluster only.
          },
          "query": {
            "range" : {
              "timestamps" : {    //The time field
                "gte" : "xxx",    //Start time of the incremental data.
                "lte" : "xxx"    //End time of the incremental data.
              }
            }
          },
          "index": "index_name",    //Index name in the source cluster
          "size": 3000
        },
        "dest": {
          "index": "index_name"    //Index name in the destination cluster
        }
      }
    • Index reorganization in the same cluster: Use the reindex API to copy data from an existing index into a new index that has the required mappings, analyzers, or shard settings. No remote source is needed.

      On the Kibana console of the cluster, run the following command:

      POST _reindex?wait_for_completion=false
      {
        "source": {
          "index": "index_name",    //Name of the source index
          "size": 3000
        },
        "dest": {
          "index": "index_name"    //Name of the destination index. It must be different from the source index.
        }
      }
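
The full and incremental request bodies above differ only in the optional range query, so they can be generated from one helper. The following is a minimal sketch that builds the body as a Python dictionary and the path for checking the task afterwards. The function and parameter names are hypothetical. The sketch relies on standard Elasticsearch behavior where a request with wait_for_completion=false returns a task ID that can be queried with GET _tasks/<task_id>; verify this on your cluster version.

```python
import json


def build_remote_reindex_body(host, source_index, dest_index,
                              username=None, password=None,
                              time_field=None, start=None, end=None,
                              batch_size=3000):
    """Build a reindex-from-remote body; add a range query if time_field is set."""
    remote = {"host": host}
    if username is not None:
        # Credentials are needed for security-mode clusters only.
        remote["username"] = username
        remote["password"] = password
    source = {"remote": remote, "index": source_index, "size": batch_size}
    if time_field is not None:
        source["query"] = {"range": {time_field: {"gte": start, "lte": end}}}
    return {"source": source, "dest": {"index": dest_index}}


def task_status_path(task_id):
    """Build the path for GET _tasks/<task_id> to check task progress."""
    return f"_tasks/{task_id}"


if __name__ == "__main__":
    # Placeholder address, index names, and time range for illustration only.
    body = build_remote_reindex_body(
        "http://192.168.0.10:9200", "logs", "logs",
        time_field="timestamps", start="2024-11-01", end="2024-11-02",
    )
    print(json.dumps(body, indent=2))
    print(task_status_path("node_id:12345"))
```

Printing the body as JSON gives text that can be pasted after POST _reindex?wait_for_completion=false in Kibana Dev Tools.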

FAQ: What Do I Do If It Is Slow to Migrate an Oversized Index?

It may take a long time to migrate an oversized index. To speed up the migration, use the following methods:
