Migrating Data Between Huawei Cloud Elasticsearch Clusters Using Backup and Restoration

Data can be migrated between CSS Elasticsearch clusters by backing up and restoring cluster snapshots.

Scenarios

This option can be used only when both the source and destination clusters are CSS clusters that rely on Object Storage Service (OBS). Typical application scenarios include:

Cross-region or cross-account migration: Migrate the data of an Elasticsearch cluster in another region or under another account to the current cluster.
Cross-version migration: Migrate data from an Elasticsearch cluster of an earlier version to a cluster of a later version.
Cluster merge: Merge the index data of two Elasticsearch clusters.

Solution Architecture

Figure 1 Migration procedure

Figure 1 shows the process of migrating data between Huawei Cloud Elasticsearch clusters using backup and restoration.

Create a snapshot for the source Elasticsearch cluster and store the snapshot in an OBS bucket.
Restore the snapshot to the destination Elasticsearch cluster.

Advantages

Easy operation and management: The cluster snapshot function on the CSS console allows for simple, easy-to-manage, and automatic data backup and restoration.
Applicable to large-scale data migration: Snapshot backup is suitable for scenarios involving large amounts of data, especially when the data volume reaches GB, TB, or even PB levels.
Cross-region and cross-account migration: With the cross-region replication function of OBS, data can be migrated across different regions and accounts.
Controllable restoration process: During data restoration, you can restore specific indexes or all indexes and specify the cluster status to be restored.
Controllable migration duration: The data migration rate can be configured based on the migration duration evaluation formula. Ideally, the data migration rate matches the file replication rate.

Impact on Performance

This migration method copies data directly at the storage layer and does not rely on any external Elasticsearch APIs. As a result, the impact on the performance of the source cluster is significantly reduced. For latency-insensitive workloads, the impact is negligible.

Constraints

The version of the destination cluster must not be earlier than that of the source cluster. For details, see Snapshot version compatibility.
The number of nodes in the destination cluster must be greater than half of that in the source cluster, and cannot be less than the number of shard replicas in the source cluster.
The CPU, memory, and disk capacities of the destination cluster must not be lower than those of the source cluster.
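
The version and node-count constraints above can be checked before starting a migration. The following is a minimal sketch, not a CSS tool: the function names are hypothetical, and it assumes that versions are plain dotted numbers such as 7.10.2. The CPU, memory, and disk comparison is left out because it depends on how you obtain the cluster specifications.

```python
def parse_version(version: str) -> tuple:
    """Turn '7.10.2' into (7, 10, 2) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def check_destination(src_version, dst_version, src_nodes, dst_nodes, src_replicas):
    """Return the list of violated constraints (an empty list means all checks pass)."""
    problems = []
    # The destination version must not be earlier than the source version.
    if parse_version(dst_version) < parse_version(src_version):
        problems.append("destination version is earlier than source version")
    # The destination must have more than half as many nodes as the source.
    if dst_nodes * 2 <= src_nodes:
        problems.append("destination node count is not greater than half of the source node count")
    # The destination must have at least as many nodes as the source has shard replicas.
    if dst_nodes < src_replicas:
        problems.append("destination node count is less than the number of source shard replicas")
    return problems


# Hypothetical example: a 6-node 7.6.2 source and a 4-node 7.10.2 destination.
print(check_destination("7.6.2", "7.10.2", src_nodes=6, dst_nodes=4, src_replicas=1))
```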

Migration Duration

The number of nodes or index shards in the source and destination clusters determines how long the data migration will take. Data migration consists of two phases: data backup and restoration. The backup duration is determined by the source cluster and the restoration duration is determined by the destination cluster. The formula for calculating the total migration duration is as follows:

If the number of index shards is greater than the number of nodes:
Total duration (in seconds) = Size of migrated data (in GB) / 0.04 GB/s (40 MB/s) / (Number of nodes in the source cluster + Number of nodes in the destination cluster) x Number of indexes
If the number of index shards is smaller than the number of nodes:
Total duration (in seconds) = Size of migrated data (in GB) / 0.04 GB/s (40 MB/s) / (Number of index shards in the source cluster + Number of index shards in the destination cluster) x Number of indexes

The duration estimated using the formula is the shortest possible duration, which assumes that every node transmits data at the maximum rate of 40 MB/s. The actual duration also depends on factors such as network conditions and resource availability.
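
As a worked example, the formula can be written as a small function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the text does not say which cluster's shard and node counts are compared, so the sketch compares the totals across both clusters and divides by the smaller one.

```python
def estimate_migration_seconds(data_gb, src_nodes, dst_nodes,
                               src_shards, dst_shards, index_count,
                               rate_gb_per_s=0.04):
    """Estimate the minimum migration duration in seconds.

    rate_gb_per_s defaults to 0.04 GB/s (40 MB/s), the per-node maximum
    used in the formula.
    """
    total_nodes = src_nodes + dst_nodes
    total_shards = src_shards + dst_shards
    # Assumption: compare the totals of both clusters. More shards than
    # nodes means the node count limits parallelism; otherwise the shard
    # count does.
    parallelism = total_nodes if total_shards > total_nodes else total_shards
    return data_gb / rate_gb_per_s / parallelism * index_count


# Hypothetical example: 1000 GB in one index, 3 nodes and 10 shards per cluster.
seconds = estimate_migration_seconds(1000, 3, 3, 10, 10, index_count=1)
print(f"{seconds:.0f} s, about {seconds / 60:.0f} min")
```

With these example inputs the estimate is roughly 4,167 seconds, or about 69 minutes. Treat it as a lower bound, as noted above.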

Prerequisites

The destination cluster (Es-2) and source cluster (Es-1) are available. You are advised to migrate a cluster during off-peak hours.
Ensure that the destination cluster (Es-2) and source cluster (Es-1) are in the same region.
If the source and destination clusters are in different regions or under different accounts, first replicate the OBS bucket that stores the snapshots of the source cluster to the OBS bucket used by the destination cluster. For details, see Cross-Region Replication. Then, restore the snapshots in the destination cluster.
Ensure that an OBS bucket is available for storing snapshots. The OBS bucket must be in the same region as the Elasticsearch clusters, and the storage class must be Standard.

Procedure

Log in to the CSS management console.
In the navigation pane on the left, choose Clusters > Elasticsearch.
In the cluster list, click Es-1, the name of the source cluster. The cluster information page is displayed.

Select the Cluster Snapshots tab. Click Enable Snapshot. In the displayed dialog box, configure basic snapshot settings.

**Table 1** Enabling snapshots
Parameter	Description
OBS Bucket	From the drop-down list, select an OBS bucket for storing snapshots.
Backup Path	Snapshot storage path in the OBS bucket. You can retain the default value.
Maximum Backup Rate (per Second)	The parameter sets the maximum backup rate per node. When it is exceeded, flow control is triggered to prevent excessive resource usage and ensure system stability. The actual backup rate may not reach the configured value, as it depends on factors such as OBS performance and disk I/O. You can use the default value 40 MB.
Maximum Recovery Rate (per Second)	The parameter sets the maximum recovery rate per node. When it is exceeded, flow control is triggered to prevent excessive resource usage and ensure system stability. The actual recovery rate may not reach the configured value, as it depends on factors such as OBS performance and disk I/O. The recommended value is 40 MB. For Elasticsearch clusters later than 7.6.2, the recovery rate is also limited by the indices.recovery.max_bytes_per_sec parameter, and the smaller of the two values takes effect. NOTE: To check the value of indices.recovery.max_bytes_per_sec, run this command: GET _cluster/settings. To modify indices.recovery.max_bytes_per_sec, run this command: PUT _cluster/settings { "transient": { "indices.recovery.max_bytes_per_sec": "100mb" } }
IAM Agency	Select an IAM agency to grant the current account the permission to access and use OBS. To store snapshots to an OBS bucket, you must have the required OBS access permissions. You are advised to use the css_obs_agency agency created automatically. If an agency has been created automatically, you can click One-click Authorization to grant minimal permissions. WARNING: The agency name can contain only letters (case-sensitive), digits, underscores (_), and hyphens (-). Otherwise, the backup will fail.
Automatic Snapshot Creation	For a data migration task, you are advised not to enable automatic snapshot creation. This is to avoid occupying storage resources.
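
The interaction between Maximum Recovery Rate (per Second) and indices.recovery.max_bytes_per_sec described in Table 1 amounts to taking the smaller of the two values on clusters later than 7.6.2. The following sketch illustrates that rule only. The function names are hypothetical, and both rates are assumed to be expressed in MB per second.

```python
def parse_version(version: str) -> tuple:
    """Turn '7.10.2' into (7, 10, 2) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def effective_recovery_rate_mb(console_rate_mb, indices_recovery_mb, cluster_version):
    """Return the per-node recovery rate limit (MB/s) that takes effect."""
    if parse_version(cluster_version) > (7, 6, 2):
        # On clusters later than 7.6.2, the smaller of the two limits applies.
        return min(console_rate_mb, indices_recovery_mb)
    # On 7.6.2 and earlier, only the console setting is considered here.
    return console_rate_mb


# Hypothetical example: console rate 200 MB/s, cluster setting 100mb.
print(effective_recovery_rate_mb(200, 100, "7.10.2"))
```

If the console value is raised for a faster restoration, the cluster setting may therefore need to be raised as well.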

Click OK to enable cluster snapshots.

Under Cluster Snapshot Tasks, click Manually Create Snapshot. In the displayed dialog box, configure the snapshot policy.

**Table 2** Parameters for manually creating a snapshot
Parameter	Description
Snapshot Name	Set the snapshot name. You can retain the default value.
Index	Specify the name of the index to be backed up. You can back up a specified index. To specify multiple indexes, use commas (,) to separate them, for example, index1,index2,index3. You can use an asterisk (*) to match multiple indexes. For example, index* indicates that all indexes with the prefix index will be backed up. If you do not specify this parameter, all indexes in the cluster are backed up by default. The index name cannot contain spaces, uppercase letters, or the following special characters: "\<|>/?
Snapshot Description	Add a snapshot description.
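
The naming rules for the Index parameter in Table 2 can be checked before you submit the dialog box. This is a minimal sketch with a hypothetical function name. It reads the forbidden set as the double quotation mark, backslash, <, |, >, /, and ?, which is an interpretation of the character list above.

```python
# Assumption: the forbidden special characters are " \ < | > / ?
FORBIDDEN_CHARS = set('"\\<|>/?')


def invalid_index_names(index_field: str) -> list:
    """Return the entries of a comma-separated Index value that break the rules."""
    if index_field == "":
        # A blank value is allowed: all indexes are backed up.
        return []
    invalid = []
    for name in index_field.split(","):
        has_bad_char = any(
            ch.isspace() or ch.isupper() or ch in FORBIDDEN_CHARS for ch in name
        )
        if not name or has_bad_char:
            invalid.append(name)
    return invalid


# Hypothetical example: the second, third, and fourth entries are rejected.
print(invalid_index_names("index1,Index2,my index,a/b,index*"))
```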

Click OK to start creating a snapshot for the source cluster.
In the cluster snapshot task list, if Snapshot Status changes to Available, the snapshot has been created.
Check whether data is successfully backed up.
In the cluster snapshot task list, click the snapshot name. The View Details dialog box is displayed. Check the shards and indexes that have been backed up to see if the backup is successful.

Figure 2 View Details

In the cluster snapshot task list, select a snapshot, and click Restore in the Operation column. In the displayed dialog box, configure necessary settings.

Retaining the original index name when restoring an index
Specify Index for the index to be replaced.
Renaming an index when restoring it
To rename an index upon restoring it, specify Index, Rename Pattern, and Rename Replacement.

**Table 3** Snapshot restoration parameters
Parameter	Description
Index	Specify the name of the index you want to restore. The value is a string of 0 to 1024 characters that cannot contain uppercase letters, spaces, or the following special characters: "\<|>/?. When restoring an index whose name is prefixed with .kibana, the index name must be specified. The .opendistro_security index cannot be restored. You can use an asterisk (*) to match multiple indexes. For example, index* indicates that all indexes with the prefix index will be restored. When an asterisk (*) is used for index matching, the .opendistro_security* index and any system indexes whose name is prefixed with .kibana are filtered out by default. You can restore indexes by specifying their names, for example, index1,index2,index3. By default, this parameter is left blank. That is, no index name is specified, and all indexes will be restored.
Rename Pattern	Index name matching rule. Enter a regular expression. Indexes that match the regular expression will be restored. Rename Pattern and Rename Replacement take effect only when they are both configured at the same time. The value is a string of 0 to 1024 characters that cannot contain uppercase letters, spaces, or the following special characters: "\<|>/?. For example, index_(.+) indicates that all indexes whose name starts with index_ will be renamed upon restoration.
Rename Replacement	Index renaming rule. Upon restoration, matching indexes are renamed according to the defined rule. Rename Pattern and Rename Replacement take effect only when they are both configured at the same time. The value is a string of 0 to 1024 characters that cannot contain uppercase letters, spaces, or the following special characters: "\<|>/?. For example, restored_index_$1 indicates that restored_ will prefix the name of all restored indexes.
Cluster	Select a destination cluster, for example, Es-2.
Overwrite same-name indexes in destination cluster	Whether to overwrite same-name indexes in the destination cluster. We recommend keeping it unselected.
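
To preview how Rename Pattern and Rename Replacement from Table 3 would rename indexes, the same substitution can be tried locally. The sketch below uses Python's re module and a hypothetical function name. It assumes that $1 refers to the first capture group, and it converts that form to Python's \g<1> syntax. The console's regular expression engine may differ from Python's in edge cases.

```python
import re


def preview_rename(index_names, rename_pattern, rename_replacement):
    """Map each index name to the name it would have after restoration."""
    # Convert $1-style group references to Python's \g<1> form.
    python_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return {
        name: re.sub(rename_pattern, python_replacement, name)
        for name in index_names
    }


# Values taken from the examples in Table 3; the index names are hypothetical.
renamed = preview_rename(["index_logs", "metrics"], "index_(.+)", "restored_index_$1")
print(renamed)
```

In this example, index_logs becomes restored_index_logs, while metrics does not match the pattern and keeps its name.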

Click OK to restore data to the destination cluster Es-2.
In the snapshot list, when Task Status changes to Restoration succeeded, the data migration is complete.
After the data migration is complete, check the data consistency between the destination Elasticsearch cluster Es-2 and source Elasticsearch cluster Es-1. For example, run the _cat/indices command in the source and destination clusters, separately, to check whether their indexes are consistent.
(Optional) After the migration, promptly delete the OBS bucket used for storing snapshots if these snapshots are no longer needed. This is to prevent ongoing storage costs.
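
The consistency check in the procedure can be partly automated by comparing the _cat/indices output of both clusters. The sketch below has a hypothetical function name. It assumes that each cluster was queried with GET _cat/indices?format=json, that the parsed rows are passed in as lists of dictionaries, and that all listed indexes are open so that docs.count is present.

```python
def compare_indices(src_rows, dst_rows):
    """Compare index names and document counts between two clusters.

    Returns (missing, mismatched): the source indexes absent from the
    destination, and a dict of index name to (source, destination) counts.
    """
    src_counts = {row["index"]: int(row["docs.count"]) for row in src_rows}
    dst_counts = {row["index"]: int(row["docs.count"]) for row in dst_rows}
    missing = sorted(set(src_counts) - set(dst_counts))
    mismatched = {
        name: (src_counts[name], dst_counts[name])
        for name in src_counts
        if name in dst_counts and src_counts[name] != dst_counts[name]
    }
    return missing, mismatched


# Hypothetical rows as returned for Es-1 (source) and Es-2 (destination).
es1 = [{"index": "index1", "docs.count": "10"}, {"index": "index2", "docs.count": "5"}]
es2 = [{"index": "index1", "docs.count": "10"}]
print(compare_indices(es1, es2))
```

Matching names and document counts do not prove that the data is identical, so a spot check of sample documents is still worthwhile.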