Updated on 2024-10-12 GMT+08:00

Upgrading the Version of an Elasticsearch Cluster

Elasticsearch clusters support same-version upgrade, cross-version upgrade, and cross-engine upgrade.

Scenario

Upgrade scenarios

  • A same-version upgrade updates the kernel patches of a cluster to fix problems or optimize performance. The cluster version number does not change.
  • A cross-version upgrade moves the cluster to a later version to gain new functionality or to consolidate clusters onto a single version.
  • A cross-engine upgrade converts an Elasticsearch cluster into an OpenSearch cluster.

Principle

Nodes in the cluster are upgraded one by one so that services are not interrupted. The upgrade process is as follows: bring a node offline, migrate its data to other nodes, create a new node of the target version, and attach the NIC port of the offline node to the new node so that the node IP address is retained. After the new node joins the cluster, the remaining nodes are upgraded in the same way, one after another. If a cluster holds a large amount of data, the upgrade duration depends mainly on how long the data migration takes.
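
The node-by-node replacement described above can be pictured with a small simulation. This is only an illustrative sketch, not the service's implementation: the Node class, the rolling_upgrade function, and the round-robin placement of migrated shards are all assumptions made for the example. Real shard placement is decided by the engine.

```python
from dataclasses import dataclass


@dataclass
class Node:
    ip: str        # IP address, retained because the NIC port is reattached
    version: str   # engine version running on the node
    shards: list   # names of the shards stored on the node (hypothetical)


def rolling_upgrade(nodes, target_version):
    """Replace nodes one by one in place and return a log of the steps."""
    log = []
    for i, old in enumerate(nodes):
        others = [n for j, n in enumerate(nodes) if j != i]
        # Bring the node offline and migrate its data to the other nodes.
        # Round-robin placement is an assumption made for this sketch.
        for k, shard in enumerate(old.shards):
            others[k % len(others)].shards.append(shard)
        log.append(f"offline {old.ip}: migrated {len(old.shards)} shard(s)")
        # Create a node of the target version that reuses the old IP address.
        nodes[i] = Node(ip=old.ip, version=target_version, shards=[])
        log.append(f"online {old.ip}: running {target_version}")
    return log


cluster = [
    Node("10.0.0.1", "7.6.2", ["a", "b"]),
    Node("10.0.0.2", "7.6.2", ["c"]),
    Node("10.0.0.3", "7.6.2", ["d"]),
]
steps = rolling_upgrade(cluster, "7.10.2")
```

In this model, each node keeps its IP address and no shard is dropped, and the more shards a node holds, the more work its migration step performs. This matches the remark that migration time dominates the upgrade duration.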

Process

  1. Perform the pre-upgrade check: Pre-Upgrade Check.

    The pre-upgrade check is mostly automated. A few of the items need to be checked manually.

  2. Create a snapshot to back up the full index data: Manually Creating a Snapshot.

    During upgrade configuration, you can choose to check whether the full index data has been backed up using snapshots. This helps to prevent data loss in case of an upgrade failure.

  3. Create an upgrade task and start the upgrade: Creating an Upgrade Task.

Version upgrade paths

The supported target version varies depending on the current version. For details, see Table 1.
Table 1 Version upgrade paths

Current Version           Target Version
Elasticsearch: 6.2.3      Elasticsearch: 6.5.4 or 6.8.23
Elasticsearch: 6.5.4      Elasticsearch: 6.8.23
Elasticsearch: 6.8.23     Elasticsearch: 7.6.2 or 7.10.2
Elasticsearch: 7.1.1      Elasticsearch: 7.6.2 or 7.10.2
Elasticsearch: 7.6.2      Elasticsearch: 7.10.2
Elasticsearch: 7.9.3      Elasticsearch: 7.10.2
Elasticsearch: 7.10.2     OpenSearch: 1.3.6

Note:

  • Elasticsearch 7.6.2 and 7.10.2 are the mainstream cluster versions. You are advised to upgrade your clusters to one of these two versions. The target versions available for a cluster are displayed in the Target Image drop-down list.
  • Elasticsearch clusters of version 5.X cannot be upgraded across versions. Elasticsearch clusters of versions 6.2.3 and 6.5.4 can be upgraded to 6.8.23 and then to 7.X.X.
  • Currently, only Elasticsearch clusters of version 7.10.2 can be upgraded to OpenSearch clusters of version 1.3.6 across engines.
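
Because some upgrades take more than one hop (for example, 6.2.3 to 6.8.23 and then to 7.X.X), it can help to treat Table 1 as a directed graph. The sketch below transcribes the table into a dictionary and uses a breadth-first search to find the shortest chain of upgrades. The names UPGRADE_PATHS and shortest_upgrade_path are made up for this example, and the drop-down list of Target Image remains the authoritative source for what a given cluster can actually select.

```python
from collections import deque

# Direct upgrade targets, transcribed from Table 1.
UPGRADE_PATHS = {
    "Elasticsearch 6.2.3": ["Elasticsearch 6.5.4", "Elasticsearch 6.8.23"],
    "Elasticsearch 6.5.4": ["Elasticsearch 6.8.23"],
    "Elasticsearch 6.8.23": ["Elasticsearch 7.6.2", "Elasticsearch 7.10.2"],
    "Elasticsearch 7.1.1": ["Elasticsearch 7.6.2", "Elasticsearch 7.10.2"],
    "Elasticsearch 7.6.2": ["Elasticsearch 7.10.2"],
    "Elasticsearch 7.9.3": ["Elasticsearch 7.10.2"],
    "Elasticsearch 7.10.2": ["OpenSearch 1.3.6"],
}


def shortest_upgrade_path(current, target):
    """Return the shortest list of versions from current to target, or None."""
    queue = deque([[current]])
    seen = {current}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in UPGRADE_PATHS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of upgrades exists in the table


path = shortest_upgrade_path("Elasticsearch 6.2.3", "Elasticsearch 7.10.2")
```

A version that has no row in the table, such as any 5.X version, yields None, which is consistent with the note that 5.X clusters cannot be upgraded across versions.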

Constraints

  • A maximum of 20 clusters can be upgraded at the same time. You are advised to perform the upgrade during off-peak hours.
  • Clusters that have ongoing tasks cannot be upgraded.
  • Once started, an upgrade task cannot be stopped until it succeeds or fails.
  • During the upgrade, nodes are replaced one by one. Requests sent to a node that is being replaced may fail. In this case, you are advised to access the cluster through the VPC Endpoint service or a dedicated load balancer.
  • During the upgrade, the Kibana and Cerebro components are rebuilt and cannot be accessed. Because different Kibana versions are incompatible with each other, access to Kibana may also fail during the upgrade. You can access them again after the cluster is successfully upgraded.

Pre-Upgrade Check

To ensure a successful upgrade, you must check the items listed in the following table before performing an upgrade.

Table 2 Pre-upgrade checklist

Check Item

Check Method

Description

Normal Status

Cluster status

System check

After an upgrade task is started, the system automatically checks the cluster status. Clusters whose status is green or yellow can provide services properly and have no unallocated primary shards.

The cluster status is Available.

Node quantity

System check

After an upgrade task is started, the system automatically checks the number of nodes. To ensure service continuity, the total number of data nodes and cold data nodes in a cluster must be greater than or equal to 3.

The total number of data nodes and cold data nodes in a cluster must be greater than or equal to 3.

Disk capacity

System check

After an upgrade task is started, the system automatically checks the disk capacity. During the upgrade, nodes are brought offline one by one and then new nodes are created. Ensure that the remaining nodes have enough disk capacity to hold all data of the node that has been brought offline.

After a node is brought offline, the remaining nodes can contain all data of the cluster.

Data replicas

System check

Check whether the maximum number of primary and replica shards of the indexes in a cluster can be allocated to the remaining data nodes and cold data nodes. This prevents replica allocation failures after a node is brought offline during the upgrade.

The maximum number of primary and replica shards of an index plus 1 must be less than or equal to the total number of data nodes and cold data nodes before the upgrade.

Data backup

System check

Before the upgrade, back up data to prevent data loss caused by upgrade faults. When submitting an upgrade task, you can choose whether the system checks that all indexes have been backed up.

Data has been backed up.

Resources

System check

After an upgrade task is started, the system automatically checks resources. Resources will be created during the upgrade. Ensure that resources are available.

Resources are available and sufficient.

Custom plugins

System and manual check

Perform this check only when custom plugins are installed in the source cluster. Before the upgrade, upload the plugin packages of the target version for all custom plugins on the plugin management page. During the upgrade, the custom plugins are installed on the new nodes. If the packages are not uploaded, the custom plugins are lost after the cluster is upgraded. After an upgrade task is started, the system automatically checks whether the custom plugin packages have been uploaded, but you must verify that the uploaded packages are correct.

NOTE:

If the uploaded plugin package is incorrect or incompatible, the plugin package cannot be automatically installed during the upgrade. As a result, the upgrade task fails. To restore a cluster, you can terminate the upgrade task and restore the node that fails to be upgraded by Replacing a Specified Node.

After the upgrade is complete, the status of the custom plugin is reset to Uploaded.

The plugin package of the cluster to be upgraded has been uploaded to the plugin list.

Custom configurations

System check

During the upgrade, the system automatically synchronizes the content of the cluster configuration file elasticsearch.yml.

Clusters' custom configurations are not lost after the upgrade.

Non-standard operations

Manual check

Check whether non-standard operations have been performed in the cluster. Non-standard operations are manual operations that are not recorded, for example, modifying the kibana.yml configuration file, the system configuration, or return routes. Such changes cannot be carried over automatically during the upgrade.

Some non-standard operations are compatible. For example, the modification of a security plugin can be retained through metadata, and the modification of system configuration can be retained using images. Some non-standard operations, such as the modification of the kibana.yml file, cannot be retained, and you must back up the file in advance.

Compatibility check

System and manual check

After a cross-version upgrade task is started, the system automatically checks whether the source and target versions have incompatible configurations. If a custom plugin is installed for a cluster, the version compatibility of the custom plugin needs to be manually checked.

Configurations before and after the cross-version upgrade are compatible.

Cluster load

System and manual check

If the cluster is heavily loaded, there is a high probability that the upgrade will get stuck or fail. You are advised to check the cluster load before the upgrade and perform the upgrade only during off-peak hours.

You can also choose to check the cluster load while configuring upgrade information.

  • nodes.thread_pool.search.queue < 1000: Check whether the maximum number of search queues is less than 1000.
  • nodes.thread_pool.write.queue < 200: Check whether the maximum number of write queues is less than 200.
  • nodes.process.cpu.percent < 90: Check whether the maximum CPU usage is less than 90%.
  • nodes.os.cpu.load_average/Number of CPU cores < 80%: Check whether the ratio of the maximum load to the number of CPU cores is less than 80%.
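
Several rows of the checklist reduce to simple arithmetic, so a rough self-check before submitting a task might look like the sketch below. It is a sketch under stated assumptions, not the system's actual check: the function names and the dictionary keys (disk_used_gb, disk_total_gb, search_queue, write_queue, cpu_percent, load_average, cpu_cores) are hypothetical, the disk check ignores any headroom the engine reserves, and the source does not say which load average sample the service uses.

```python
def structural_issues(nodes, max_shard_copies):
    """Check node quantity, shard copies, and disk capacity.

    nodes: one dict per data or cold data node with the hypothetical keys
           'disk_used_gb' and 'disk_total_gb'.
    max_shard_copies: maximum number of primary and replica shards of an index.
    """
    issues = []
    if len(nodes) < 3:
        issues.append("fewer than 3 data and cold data nodes")
    if max_shard_copies + 1 > len(nodes):
        issues.append("shard copies cannot be allocated with one node offline")
    for i, node in enumerate(nodes):
        # Free space on every node except the one being replaced.
        free_elsewhere = sum(
            n["disk_total_gb"] - n["disk_used_gb"]
            for j, n in enumerate(nodes) if j != i
        )
        if node["disk_used_gb"] > free_elsewhere:
            issues.append(f"data of node {i} does not fit on the remaining nodes")
    return issues


def load_issues(nodes):
    """Compare the maximum of each load metric across nodes with its limit."""
    issues = []
    if max(n["search_queue"] for n in nodes) >= 1000:
        issues.append("search queue is not below 1000")
    if max(n["write_queue"] for n in nodes) >= 200:
        issues.append("write queue is not below 200")
    if max(n["cpu_percent"] for n in nodes) >= 90:
        issues.append("CPU usage is not below 90%")
    if max(n["load_average"] / n["cpu_cores"] for n in nodes) >= 0.8:
        issues.append("load average per CPU core is not below 80%")
    return issues
```

An empty list from both functions means none of the modeled checks failed. The remaining checklist items, such as custom plugins and non-standard operations, still need the system and manual checks described in the table.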

Creating an Upgrade Task

  1. Log in to the CSS management console.
  2. In the navigation pane on the left, choose Clusters. On the cluster list page that is displayed, click the name of a cluster.
  3. On the displayed basic cluster information page, click Version Upgrade.
  4. On the displayed page, set upgrade parameters.
    Table 3 Upgrade parameters

    Parameter

    Description

    Upgrade Type

    • Same-version upgrade: Upgrade the kernel patch of the cluster. The cluster version number remains unchanged.
    • Cross-version upgrade: Upgrade the cluster version.
    • Cross-engine upgrade: Upgrade an Elasticsearch cluster to an OpenSearch cluster. Currently, only Elasticsearch clusters of version 7.10.2 can be upgraded to OpenSearch clusters of version 1.3.6.

    Target Image

    Image of the target version. When you select an image, the image name and target version details are displayed.

    The supported target versions are displayed in the drop-down list of Target Image. If the target image cannot be selected, the possible causes are as follows:

    • The current cluster is of the latest version.
    • The current cluster was created before 2023 and has vector indexes.
    • Images of the new version have not been added in the current region.

    Agency

    When a node is deleted, NICs are released. This means you need to have VPC permissions. Select an IAM agency to grant the current account the permission to access and use VPC.

    • This parameter is available only when the new IAM plane is connected.
    • If you are configuring an agency for the first time, click Automatically Create IAM Agency to create css-upgrade-agency.
    • If an IAM agency was automatically created earlier, you can click One-click authorization to delete the VPC Administrator role or the VPC FullAccess system policy and add the following custom policy actions instead, for finer-grained permissions control.
      "vpc:subnets:get",
      "vpc:ports:*"
    • To use Automatically Create IAM Agency and One-click authorization, the following minimum permissions are needed:
      "iam:agencies:listAgencies",
      "iam:roles:listRoles",
      "iam:agencies:getAgency",
      "iam:agencies:createAgency",
      "iam:permissions:listRolesForAgency",
      "iam:permissions:grantRoleToAgency",
      "iam:permissions:listRolesForAgencyOnProject",
      "iam:permissions:revokeRoleFromAgency",
      "iam:roles:createRole"
    • To use an IAM agency, the following minimum permissions are needed:
      "iam:agencies:listAgencies",
      "iam:agencies:getAgency",
      "iam:permissions:listRolesForAgencyOnProject",
      "iam:permissions:listRolesForAgency"
  5. After setting the parameters, click Submit. Decide whether to enable Check full index snapshot and Perform cluster load detection, and then click OK.

    If a cluster is overloaded, the upgrade task may get stuck or fail. Enabling Cluster load detection helps avoid such failures.

    If any of the following checks fails during the detection, wait for the load to drop or reduce the load. If you urgently need to upgrade the version and you understand the risk of upgrade failure, you can disable the Cluster load detection function. The cluster load detection items are as follows:

    • nodes.thread_pool.search.queue < 1000: Check whether the maximum number of search queues is less than 1000.
    • nodes.thread_pool.write.queue < 200: Check whether the maximum number of write queues is less than 200.
    • nodes.process.cpu.percent < 90: Check whether the maximum CPU usage is less than 90%.
    • nodes.os.cpu.load_average/Number of CPU cores < 80%: Check whether the ratio of the maximum load to the number of CPU cores is less than 80%.
  6. View the upgrade task in the task list. If the task status is Running, you can expand the task list and click View Progress to view the upgrade progress.

    If the task status is Failed, you can retry or terminate the task.

    • Retry the task: Click Retry in the Operation column.
    • Terminate the task: Click Terminate in the Operation column.
      • Same-version upgrade: If the upgrade task status is Failed, you can terminate the upgrade task.
      • Cross-version upgrade: You can terminate an upgrade task only when the task status is Failed and no node has been upgraded.

      After an upgrade task is terminated, the Task Status of the cluster is rolled back to the status before the upgrade, and other tasks in the cluster are not affected.
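
The retry and terminate rules in step 6 can be summarized as a small decision function. This is a sketch of the rules as written above, not the console's logic: the function name, the type strings, and the upgraded_nodes parameter are hypothetical. Cross-engine tasks are not covered by the rules above, so the sketch rejects them rather than guessing.

```python
def allowed_actions(upgrade_type, task_status, upgraded_nodes):
    """Return the operations available for an upgrade task.

    upgrade_type: "same-version" or "cross-version" (hypothetical labels).
    task_status: task status as shown in the task list, such as "Failed".
    upgraded_nodes: number of nodes already replaced by the task.
    """
    if upgrade_type not in ("same-version", "cross-version"):
        raise ValueError(f"rules for {upgrade_type!r} are not described")
    if task_status != "Failed":
        # Once started, a task cannot be stopped until it succeeds or fails.
        return []
    actions = ["Retry"]
    # A failed cross-version task can be terminated only if no node
    # has been upgraded; a failed same-version task can always be terminated.
    if upgrade_type == "same-version" or upgraded_nodes == 0:
        actions.append("Terminate")
    return actions
```

For example, a failed cross-version task that has already replaced one node can only be retried, while the same task with no upgraded nodes can be either retried or terminated.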