Updated on 2024-11-22 GMT+08:00

Migrating Data Between Elasticsearch Clusters Using the Reindex API

You can use the reindex API to migrate data between Elasticsearch clusters.

Scenario

As an open-source search engine, Elasticsearch provides the reindex API to support index migration between clusters. This API is also provided in CSS to support data migration between Elasticsearch clusters. Below are some scenarios where you might use the reindex API for data migration.
  • Cluster merging: The reindex API can be used to merge index data scattered across multiple Elasticsearch clusters into a single cluster for centralized data management and analysis.
  • Cloud migration: Migrate an on-premises Elasticsearch service to the cloud to enjoy the benefits of cloud services, such as scalability, ease-of-maintenance, and cost-effectiveness.
  • Changing the service provider: If you are using a third-party Elasticsearch service, and you want to switch to Huawei Cloud or another service provider for some reason (cost, performance, or other concerns), you can use the reindex API for data migration.
The reindex API supports the following:
  • Full migration: Migrate the full amount of index data between clusters. During the migration, all writes to the source cluster must be stopped, ensuring data consistency between the source and destination clusters.
  • Incremental migration: For indexes that have a timestamp field, the reindex API can migrate only the data added or updated within a given time range. A typical sequence is to complete the full migration first, stop all writes to the source cluster at the start of the workload switchover, run a quick incremental migration based on the latest update time, and then switch all services to the destination cluster.
  • Reorganizing indexes: The reindex API can be used to restructure indexes while migrating them, including changing mappings, analyzers, and sharding.

Overview

Figure 1 Migration procedure

Figure 1 shows how to migrate data between Elasticsearch clusters using the reindex API.

  1. Configure the reindex remote access whitelist in the destination cluster to connect the source and destination clusters.
  2. Use the reindex API to migrate indexes from the source cluster to the destination cluster.

Advantages

  • Easy to use: As a built-in function of Elasticsearch, the reindex API offers an easy way to migrate data without complex settings or additional tools.
  • Flexible data processing: Indexes can be restructured or rebuilt during migration, such as changing mappings, analyzers, and sharding.
  • Performance control: During the migration, you can tune the parameters of the scroll API to control the data migration speed for optimal cluster performance.

Impact on Performance

Using the reindex API for data migration between clusters relies on the scroll API. The scroll API can efficiently retrieve index data from the source cluster and synchronize the data to the destination cluster in batches. This process may impact the performance of the source cluster. The specific impact depends on how fast data is retrieved from the source cluster, and the data retrieval speed depends on the size and slice settings of the scroll API. For details, see the Reindex API document.

Reindex tasks are asynchronous in Elasticsearch clusters, so their impact on the performance of the source cluster is manageable when task concurrency is low. If the source cluster has a high resource usage, it is advisable to tune the size parameter of the scroll API to slow down the data retrieval speed or perform the migration during off-peak hours, reducing impact on the performance of the source cluster.
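To make the effect of the size parameter concrete, the following minimal sketch estimates how many scroll batches a reindex task needs for a given batch size. The function name and the document count are hypothetical. The only assumption taken from the text above is that size sets the number of documents retrieved per batch.

```python
import math


def estimate_batches(total_docs: int, batch_size: int) -> int:
    """Return how many scroll batches are needed to read total_docs documents."""
    if total_docs < 0:
        raise ValueError("total_docs must not be negative")
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(total_docs / batch_size)


if __name__ == "__main__":
    # Hypothetical index with 10 million documents.
    for size in (500, 1000, 3000):
        print(size, estimate_batches(10_000_000, size))
```

A smaller size produces more, lighter batches, which spreads the read load on the source cluster over a longer period.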

Constraints

  • During cluster migration, do not add, delete, or modify the index data of the source cluster. Otherwise, the data in the source cluster will be inconsistent with that in the destination cluster after the migration.
  • The source and destination clusters must use the same version.

Prerequisites

  • The source and destination Elasticsearch clusters are available.
  • The network between the clusters is connected.
    • If the source and destination clusters are in different VPCs, establish a VPC peering connection between them. For details, see VPC Peering Connection Overview.
    • To migrate an in-house built Elasticsearch cluster to Huawei Cloud, you can configure public network access for this cluster.
    • To migrate a third-party Elasticsearch cluster to Huawei Cloud, you need to establish a VPN or Direct Connect connection between the third party's internal data center and Huawei Cloud.
  • Ensure that _source has been enabled for indexes in the cluster.

    By default, _source is enabled. To check, run the GET {index}/_search command. If the hits in the response contain a _source field, it is enabled.
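The _source setting can also be read from the index mapping instead of being inferred from search hits. The sketch below assumes the response layout of GET {index}/_mapping in Elasticsearch versions without mapping types (7.x and later). The function name and sample responses are hypothetical.

```python
def source_enabled(mapping_response: dict, index: str) -> bool:
    """Return True unless the index mapping explicitly disables _source."""
    mappings = mapping_response.get(index, {}).get("mappings", {})
    # _source is enabled by default, so a missing entry means "enabled".
    return bool(mappings.get("_source", {}).get("enabled", True))


if __name__ == "__main__":
    # Hypothetical responses of GET my_index/_mapping.
    default_mapping = {
        "my_index": {"mappings": {"properties": {"title": {"type": "text"}}}}
    }
    disabled_mapping = {
        "my_index": {"mappings": {"_source": {"enabled": False}}}
    }
    print(source_enabled(default_mapping, "my_index"))
    print(source_enabled(disabled_mapping, "my_index"))
```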

Obtaining Information About the Source Elasticsearch Cluster

Before data migration, obtain necessary information about the source cluster for configuring a migration task.

Table 1 Required information about the source Elasticsearch cluster

Cluster type: Huawei Cloud Elasticsearch cluster
  Required information:
    • Access address of the source cluster
    • Username and password for accessing the source cluster (only for security-mode clusters)
    • Index structure
  How to obtain:
    • For details about how to obtain the cluster name and address, see step 3 of the procedure below.
    • Contact the service administrator to obtain the username and password.
    • For details about how to query the index structure, see step 6 of the procedure below.

Cluster type: In-house built Elasticsearch cluster
  Required information:
    • Public network address of the source cluster
    • Username and password for accessing the source cluster (only for security-mode clusters)
    • Index structure
  How to obtain:
    • Contact the service administrator to obtain the information.

Cluster type: Third-party Elasticsearch cluster
  Required information:
    • Access address of the source cluster
    • Username and password for accessing the source cluster (only for security-mode clusters)
    • Index structure
  How to obtain:
    • Contact the service administrator to obtain the information.

The method of obtaining the cluster information varies depending on the source cluster. This section describes how to obtain information about a Huawei Cloud Elasticsearch cluster.

  1. Log in to the CSS management console.
  2. In the navigation pane on the left, choose Clusters > Elasticsearch.
  3. In the Elasticsearch cluster list, obtain the cluster name and address.
    Figure 2 Obtaining cluster information
  4. Click Access Kibana in the Operation column to log in to the Kibana console.
  5. Click Dev Tools in the navigation tree on the left.
  6. Run the following command to query the index structure of the source cluster:
    GET {index_name}

    index_name indicates the name of the index to be migrated.
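The output of GET {index_name} includes settings that Elasticsearch generates itself (such as the index UUID and creation date) and that are rejected when passed back in a create-index request. The following minimal sketch strips those settings so that the remaining structure can be reused as a PUT {index_name} body. It assumes the response layout of recent Elasticsearch versions. The function name and sample data are hypothetical, and aliases are deliberately left out.

```python
import copy

# Settings generated by Elasticsearch that cannot be set in a create-index request.
READ_ONLY_SETTINGS = ("creation_date", "provided_name", "uuid", "version")


def to_create_body(get_index_response: dict, index: str) -> dict:
    """Build a PUT {index} body from the output of GET {index}."""
    # Work on a copy so the caller's response object is not modified.
    definition = copy.deepcopy(get_index_response[index])
    index_settings = definition.get("settings", {}).get("index", {})
    for key in READ_ONLY_SETTINGS:
        index_settings.pop(key, None)
    return {
        "settings": definition.get("settings", {}),
        "mappings": definition.get("mappings", {}),
    }


if __name__ == "__main__":
    # Hypothetical, shortened response of GET my_index.
    response = {
        "my_index": {
            "aliases": {},
            "mappings": {"properties": {"title": {"type": "text"}}},
            "settings": {"index": {
                "number_of_shards": "3",
                "number_of_replicas": "1",
                "uuid": "abc123",
                "creation_date": "1700000000000",
                "provided_name": "my_index",
                "version": {"created": "7100299"},
            }},
        }
    }
    print(to_create_body(response, "my_index"))
```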

Configuring the Reindex Remote Access Whitelist

In the destination Elasticsearch cluster, configure the reindex whitelist.
  1. Log in to the CSS management console.
  2. In the navigation pane on the left, choose Clusters > Elasticsearch.
  3. In the Elasticsearch cluster list, click the destination cluster. The cluster information page is displayed.
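  4. In the navigation pane on the left, click Parameter Configurations. Then click Edit, and expand Reindexing. Add the address of the source cluster to the remote access whitelist (in open-source Elasticsearch, this is the reindex.remote.whitelist setting, which takes host:port entries).

    If the source Elasticsearch cluster uses HTTPS, expand Customize, and add a custom parameter to skip SSL certificate verification.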

    • Parameter: reindex.ssl.verification_mode
    • Value: none
  5. Click Submit. In the displayed confirmation dialog box, confirm the parameter settings, select I understand that the modification will take effect after the cluster is restarted, then click OK.
  6. Return to the Elasticsearch cluster list, and locate the destination cluster. Choose More > Restart in the Operation column to restart the cluster and make the change take effect.
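In open-source Elasticsearch, remote whitelist entries are written as host:port without a scheme, whereas the host field of a reindex request is a full URL. The sketch below derives the whitelist entry from the source cluster URL so that the two stay consistent. The function name and the sample address are hypothetical, and IPv6 addresses are not handled.

```python
from urllib.parse import urlsplit


def whitelist_entry(source_url: str) -> str:
    """Convert a source cluster URL into a host:port whitelist entry."""
    parts = urlsplit(source_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an HTTP(S) URL: {source_url!r}")
    if parts.port is None:
        # The remote host of a reindex request must name the port explicitly.
        raise ValueError(f"URL has no port: {source_url!r}")
    return f"{parts.hostname}:{parts.port}"


if __name__ == "__main__":
    # Hypothetical source cluster address.
    print(whitelist_entry("https://192.168.0.10:9200"))
```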

Migrating Indexes Using the Reindex API

  1. In the destination cluster, create an index structure identical to that in the source cluster.
    1. Log in to the CSS management console.
    2. In the navigation pane on the left, choose Clusters > Elasticsearch.
    3. In the Elasticsearch cluster list, locate the destination cluster, and click Access Kibana in the Operation column to log in to the Kibana console.
    4. Click Dev Tools in the navigation tree on the left.
    5. Run the following command to create an index structure that is identical to that in the source cluster:
      PUT {index_name}
      {
      Index structure of the source cluster
      }

      index_name indicates the index name after the migration. For the index structure of the source cluster, see Obtaining Information About the Source Elasticsearch Cluster.

  2. Run the following command to migrate data using the reindex API:
    • Full migration: Migrate the full amount of index data in the source cluster to the destination cluster.

      On the Kibana console of the destination cluster, run the following command:

      POST _reindex?wait_for_completion=false
      {
        "source": {
          "remote": {
            "host": "http://xx.xx.xx.xx:9200",    //Address of the source cluster. If the source cluster uses HTTPS, use https://xx.xx.xx.xx:9200.
            "username": "xxx",    //Username for accessing the source cluster. It is needed for a security-mode cluster only.
            "password": "******"    //Password for accessing the source cluster. It is needed for a security-mode cluster only.
          },
          "index": "index_name",    //Index name in the source cluster
          "size": 3000
        },
        "dest": {
          "index": "index_name"    //Index name in the destination cluster
        }
      }
    • Incremental migration: Migrate new/changed index data from the source cluster to the destination cluster based on timestamps. This method can also be used to migrate an oversized index one chunk at a time.

      On the Kibana console of the destination cluster, run the following command:

      POST _reindex?wait_for_completion=false
      {
        "source": {
          "remote": {
            "host": "http://xx.xx.xx.xx:9200",    //Address of the source cluster. If the source cluster uses HTTPS, use https://xx.xx.xx.xx:9200.
            "username": "xxx",    //Username for accessing the source cluster. It is needed for a security-mode cluster only.
            "password": "******"    //Password for accessing the source cluster. It is needed for a security-mode cluster only.
          },
          "query": {
            "range" : {
              "timestamps" : {    //The time field
                "gte" : "xxx",    //Start time of the incremental data.
                "lte" : "xxx"    //End time of the incremental data.
              }
            }
          },
          "index": "index_name",    //Index name in the source cluster
          "size": 3000
        },
        "dest": {
          "index": "index_name"    //Index name in the destination cluster
        }
      }
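    • Index reorganization in the same cluster: Use the reindex API to copy an index into a new index that has the target structure (mappings, analyzers, or sharding). Create the new index first, and give it a name different from that of the source index, because an index cannot be reindexed into itself.

      On the Kibana console of the destination cluster, run the following command:

      POST _reindex?wait_for_completion=false
      {
        "source": {
          "index": "index_name",    //Name of the existing index
          "size": 3000
        },
        "dest": {
          "index": "new_index_name"    //Name of the new index. It must differ from the source index name.
        }
      }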
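The request bodies above differ only in a few fields, so they can be generated rather than edited by hand. The following minimal sketch builds the body for a full or incremental migration as plain JSON (without the // annotations, which are not valid JSON outside the Kibana console). The function name, parameter names, and sample values are hypothetical, and sending the request is left to whichever HTTP client you use.

```python
import json


def build_reindex_body(host, source_index, dest_index, batch_size=3000,
                       username=None, password=None,
                       time_field=None, start=None, end=None):
    """Build the body of POST _reindex for a remote source cluster."""
    remote = {"host": host}
    if username is not None:
        # Credentials are needed for security-mode clusters only.
        remote["username"] = username
        remote["password"] = password
    source = {"remote": remote, "index": source_index, "size": batch_size}
    if time_field is not None:
        # Incremental migration: restrict the copy to a time range.
        source["query"] = {"range": {time_field: {"gte": start, "lte": end}}}
    return {"source": source, "dest": {"index": dest_index}}


if __name__ == "__main__":
    # Hypothetical address, index names, and time range.
    full = build_reindex_body("http://192.168.0.10:9200", "logs", "logs")
    incremental = build_reindex_body(
        "http://192.168.0.10:9200", "logs", "logs",
        time_field="timestamps",
        start="2024-11-01T00:00:00Z", end="2024-11-02T00:00:00Z",
    )
    print(json.dumps(full, indent=2))
    print(json.dumps(incremental, indent=2))
```

Because the requests use wait_for_completion=false, Elasticsearch returns a task ID instead of waiting for the migration to finish. The task can then be checked with GET _tasks/{task_id}.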

FAQ: What Do I Do If It Is Slow to Migrate an Oversized Index?

Migrating an oversized index may take a long time. To speed up the migration, use the following methods: