Updated on 2023-06-20 GMT+08:00

Migration Solution Overview

You can migrate data to an Elasticsearch cluster from another Elasticsearch cluster, a user-built Elasticsearch cluster, or a third-party Elasticsearch cluster. This section describes the data migration solutions available for each type of source.

Scenarios

The migration solution varies depending on the data source.

  • Migration from an Elasticsearch cluster

    You can use Logstash, CDM, OBS backup and restoration, ESM, or cross-cluster replication plug-ins to migrate data from an Elasticsearch cluster.

    • Logstash: an official data cleaning tool that is part of the ELK ecosystem. It can migrate data between different data sources and Elasticsearch, and can clean and process the data during migration. For details, see Migrating Cluster Data Using Logstash.
    • CDM: a cloud migration tool that migrates clusters between different cloud services.
    • Backup and restoration: Elasticsearch provides backup and restoration capabilities. You can back up the data of a cluster to OBS, and restore the data to another cluster. For details, see Migrating Cluster Data Through Backup and Restoration.
  • Migration from Kafka/MQ
  • Migration from a database
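
The backup and restoration option listed above can be sketched as a sequence of snapshot API requests. The following Python snippet only builds the requests and does not send them. It assumes the standard Elasticsearch snapshot endpoints and an S3-compatible repository type (`s3`); the repository type used for an OBS bucket in your cluster may differ. The names `migration_repo`, `my-bucket`, and `snap_1`, and the `40mb` rate, are hypothetical placeholders.

```python
import json


def register_repository(repo, bucket, rate="40mb", readonly=False):
    """Build the request that registers a snapshot repository on a cluster."""
    body = {
        "type": "s3",  # assumption: an S3-compatible repository plug-in is available
        "settings": {
            "bucket": bucket,
            # Throttle settings that cap the snapshot and restore rates.
            "max_snapshot_bytes_per_sec": rate,
            "max_restore_bytes_per_sec": rate,
            "readonly": readonly,
        },
    }
    return ("PUT", f"/_snapshot/{repo}", body)


def create_snapshot(repo, snapshot, indices):
    """Build the request that backs up the given indexes on the source cluster."""
    body = {"indices": ",".join(indices), "include_global_state": False}
    return ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", body)


def restore_snapshot(repo, snapshot, indices):
    """Build the request that restores the given indexes on the target cluster."""
    body = {"indices": ",".join(indices), "include_global_state": False}
    return ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", body)


def migration_plan(repo, bucket, snapshot, indices):
    """Return the ordered (cluster, method, path, body) steps of the migration."""
    return [
        ("source", *register_repository(repo, bucket)),
        ("source", *create_snapshot(repo, snapshot, indices)),
        # The target registers the same bucket read-only so that it cannot
        # modify the snapshot written by the source.
        ("target", *register_repository(repo, bucket, readonly=True)),
        ("target", *restore_snapshot(repo, snapshot, indices)),
    ]


if __name__ == "__main__":
    steps = migration_plan("migration_repo", "my-bucket", "snap_1", ["logs-2023", "orders"])
    for cluster, method, path, body in steps:
        print(cluster, method, path, json.dumps(body))
```

Each tuple can be sent with any HTTP client to the cluster named in its first element. Stop writes to the source indexes before the snapshot step, because this approach does not synchronize incremental data.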

Solutions

CSS supports migration by backup and restoration, by using the Reindex API or Logstash+ESM, or by data source synchronization. For details, see Table 1.

Data source synchronization has fewer constraints and higher performance than the other three solutions. It also allows cutover at any time after the synchronization is complete, which makes it more convenient and flexible.

Table 1 Migration solutions

| Solution | Description | Constraints | Performance |
| --- | --- | --- | --- |
| Backup and restoration | Prepare shared storage that supports the S3 protocol, for example, an OBS bucket. Create a snapshot to back up the data of the source Elasticsearch cluster, synchronize the snapshot to the target cluster, and restore the data to the target cluster. | (1) Target Elasticsearch version ≥ source Elasticsearch version. (2) Number of candidate master nodes of the target Elasticsearch cluster > half of the number of candidate master nodes of the source Elasticsearch cluster. (3) Incremental data synchronization is not supported. You need to stop updates before backing up or restoring data. | The data migration rate is configurable. Ideally, the data migration rate is the same as the file copy rate. |
| Reindex API | Configure mutual trust between the source and target Elasticsearch clusters, and then migrate data using the Reindex API. | (1) _source must be enabled for indexes. (2) Real-time synchronization of incremental data is not supported. You need to stop updates and then call the API. | Batch read and write are supported, but concurrent slicing synchronization is not supported. |
| Logstash+ESM | Apply for an ECS, deploy and configure Logstash on it, and then start data migration. | (1) _source must be enabled for indexes. (2) Real-time synchronization of incremental data is not supported. You need to stop updates and then start Logstash. | Batch read and write are supported, and concurrent slicing synchronization is supported. |
| Data source synchronization | Inventory data is migrated using Logstash, and incremental data is automatically synchronized through traffic replication or data links. | None | The inventory migration rate is the same as that of Logstash. An existing tool is reused for incremental migration. |
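
The Reindex API solution in Table 1 can be illustrated with a request body for reindexing from a remote cluster. The Python sketch below only builds the JSON body for `POST /_reindex` on the target cluster and sends nothing. It assumes the standard Elasticsearch reindex-from-remote format, and that "mutual trust" includes listing the source address in the target cluster's `reindex.remote.whitelist` setting, which may not be the complete procedure for your environment. The host address, index names, and credentials are hypothetical.

```python
import json


def build_remote_reindex(source_host, index, dest_index=None,
                         batch_size=1000, username=None, password=None):
    """Build the body of a POST /_reindex request that pulls from a remote cluster."""
    remote = {"host": source_host}
    if username is not None:
        # Credentials are only included when the source cluster requires them.
        remote["username"] = username
        remote["password"] = password
    return {
        "source": {
            "remote": remote,
            "index": index,
            # Number of documents read per batch from the source cluster.
            "size": batch_size,
        },
        # Write to an index with the same name unless another one is given.
        "dest": {"index": dest_index or index},
    }


if __name__ == "__main__":
    body = build_remote_reindex("http://10.0.0.1:9200", "orders", batch_size=2000)
    print(json.dumps(body, indent=2))
```

The body contains a batch size but no slicing option, which matches the table: reads and writes are batched, while concurrent slicing is not available when reindexing from a remote cluster. Stop updates to the source index before sending the request.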