A Cluster is Unavailable Due to Improper Shard Allocation

Updated on 2024-11-20 GMT+08:00

Symptom

The cluster status is Unavailable.

On the Dev Tools page of Kibana, run the GET _cluster/health command to check the cluster health status: in the output, status is red and unassigned_shards is not 0. Alternatively, on the Cerebro page, click overview to view the index shard allocation on each data node: the cluster status is red and unassigned shards exist. Both indicate that some index shards in the cluster cannot be allocated.
Figure 1 Cluster health status
Figure 2 Cerebro page
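A representative health response is sketched below (the values are illustrative and only the fields relevant to this fault are shown; the key indicators are status and unassigned_shards):

GET _cluster/health

{
  "cluster_name" : "css-example-cluster",
  "status" : "red",
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 45,
  "active_shards" : 90,
  "unassigned_shards" : 6
}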

Possible Causes

Some index shards in the cluster are not properly allocated.

Procedure

Step 1: Determine the reason why the cluster is unavailable.
  1. Use Kibana to access the faulty cluster. On the Dev Tools page of Kibana, run the GET /_recovery?active_only=true command to check whether the cluster is restoring the backup.
    • If output like {"index_name":{"shards":[{"id":25,"type":"... is returned, indexes are being restored from a backup. Wait until the restoration is complete. If the cluster status is still Unavailable, go to the next step.
    • If { } is returned, the cluster is not performing backup restoration. Go to the next step.
  2. Run the GET _cluster/allocation/explain?pretty command and check why some index shards are not allocated based on the returned information (a trimmed sample response is sketched after Table 2).
    Table 1 Parameter description

    Parameter              Description
    index                  Index name
    shard                  Shard ID
    current_state          Shard status
    allocate_explanation   Shard allocation explanation
    explanation            Detailed reason why the shard cannot be allocated

    Table 2 Fault description

    • Symptom: explanation: no allocations are allowed due to cluster setting [cluster.routing.allocation.enable=none]
      Cause: The cluster allocation policy forbids the allocation of all shards.
      Solution: See cluster.routing.allocation.enable under Incorrect shard allocation policy.

    • Symptom: explanation: too many shards [3] allocated to this node for index [write08], index setting [index.routing.allocation.total_shards_per_node=3]
      Cause: The number of shards of a single index that can be allocated to each data node is too small to hold all shards of the index.
      Solution: See index.routing.allocation.total_shards_per_node under Incorrect shard allocation policy.

    • Symptom: explanation: too many shards [31] allocated to this node, cluster setting [cluster.routing.allocation.total_shards_per_node=30]
      Cause: The number of shards that can be allocated to each data node in the cluster is too small.
      Solution: See cluster.routing.allocation.total_shards_per_node under Incorrect shard allocation policy.

    • Symptom: explanation: node does not match index setting [index.routing.allocation.include] filters [box_type:"hot"]
      Cause: Index shards can only be allocated to data nodes labeled hot. If the cluster has no nodes with this label, the shards cannot be allocated.
      Solution: See index.routing.allocation.include under Incorrect shard allocation policy.

    • Symptom: explanation: node does not match index setting [index.routing.allocation.require] filters [box_type:"xl"]
      Cause: Index shards can only be allocated to data nodes with the specified label. If the cluster has no such nodes, the shards cannot be allocated.
      Solution: See index.routing.allocation.require under Incorrect shard allocation policy.

    • Symptom: explanation: [failed to obtain in-memory shard lock]
      Cause: This problem usually occurs when a node briefly leaves the cluster and then rejoins while a long-running operation, such as a bulk write or scroll request, is still holding a shard. When the node rejoins, the master node cannot allocate the shard because the shard lock has not been released.
      Solution: See Shard lock error.

    • Symptom: explanation: node does not match index setting [index.routing.allocation.include] filters [_tier_preference:"data_hot OR data_warm OR data_cold"]
      Cause: An index setting does not match the cluster version.
      Solution: See Inconsistent index setting and node version.

    • Symptom: explanation: cannot allocate because all found copies of the shard are either stale or corrupt
      Cause: The data on the index shards is damaged.
      Solution: See Damaged primary shard data.

    • Symptom: explanation: the node is above the high watermark cluster setting [cluster.routing.allocation.disk.watermark.high=90%], using more disk space than the maximum allowed [90.0%], actual free: [6.976380997419324%]
      Cause: The node disk usage has reached the upper limit.
      Solution: See Excessive disk usage.
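    For reference, a trimmed allocation explanation is sketched below (the index name, node name, and values are illustrative; match the explanation text against Table 2 to find the corresponding fix):

    GET _cluster/allocation/explain?pretty

    {
      "index" : "example_index",
      "shard" : 0,
      "primary" : true,
      "current_state" : "unassigned",
      "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
      "node_allocation_decisions" : [
        {
          "node_name" : "css-example-node-1",
          "node_decision" : "no",
          "deciders" : [
            {
              "decider" : "enable",
              "decision" : "NO",
              "explanation" : "no allocations are allowed due to cluster setting [cluster.routing.allocation.enable=none]"
            }
          ]
        }
      ]
    }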

Step 2: Rectify the fault.

  • Incorrect shard allocation policy
    • cluster.routing.allocation.enable
      1. If the value of explanation in the output is as follows, the current allocation policy of the cluster forbids the allocation of all shards.
        Figure 3 Incorrect configuration of allocation.enable
      2. On the Dev Tools page of Kibana, run the following command to set enable to all to allow all shards to be allocated:
        PUT _cluster/settings
        {
          "persistent": {
            "cluster": {
              "routing": {
                "allocation.enable": "all"
              }
            }
          }
        }
        NOTE:

        The index-level setting (index.routing.allocation.enable) overrides the cluster-level setting (a sketch for clearing it follows this procedure). The values of allocation.enable are described as follows:

        • all: Default value. All types of shards can be allocated.
        • primaries: Only the primary shards can be allocated.
        • new_primaries: Only the primary shards of the newly created index can be allocated.
        • none: No shards can be allocated.
      3. Run the POST _cluster/reroute?retry_failed=true command to manually allocate shards. Wait until all index shards are allocated and the cluster status changes to Available.
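        Because the index-level setting overrides the cluster-level one, an index whose index.routing.allocation.enable is still set to none remains unallocated even after the cluster-level change. A minimal sketch for clearing the index-level setting, assuming index_name is the affected index:

        PUT index_name/_settings
        {
          "index.routing.allocation.enable": null
        }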
    • index.routing.allocation.total_shards_per_node
      1. If the value of explanation in the output is as follows, the value of index.routing.allocation.total_shards_per_node is too small and does not meet the index shard allocation requirements.
        Figure 4 Incorrect configuration of index total_shards_per_node
      2. On the Dev Tools page of Kibana, run the following command to change the number of index shards that can be allocated to each node:
        PUT index_name/_settings
        {
          "index": {
            "routing": {
              "allocation.total_shards_per_node": 3
            }
          }
        }
        NOTE:

        Value of index.routing.allocation.total_shards_per_node = Number of shards of index_name/(Number of data nodes - 1), rounded up

        Set this parameter to a relatively large value. For example, assume a cluster has 10 nodes (five data nodes, two client nodes, and three master nodes) and an index has 30 shards. If total_shards_per_node is set to 4, at most 4 x 5 = 20 shards can be allocated, so not all shards of the index can be allocated. To allocate all 30 shards, each data node must be allowed at least six shards, and at least eight shards if one data node fails (a worked sketch follows this procedure).

      3. Run the POST _cluster/reroute?retry_failed=true command to manually allocate shards. Wait until the index shards are allocated and the cluster status changes to Available.
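        Applied to the hypothetical five-data-node, 30-shard cluster in the note above, the formula gives 30/(5 - 1) = 7.5, so a value of 8 also tolerates the failure of one data node. A sketch of the corresponding request, with index_name as a placeholder:

        PUT index_name/_settings
        {
          "index": {
            "routing": {
              "allocation.total_shards_per_node": 8
            }
          }
        }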
    • cluster.routing.allocation.total_shards_per_node
      1. If the value of explanation in the output is as follows, the number of shards that can be allocated to each data node in the cluster is too small.
        Figure 5 Incorrect configuration of cluster total_shards_per_node
      2. The value of cluster.routing.allocation.total_shards_per_node indicates the maximum number of shards that can be allocated to each data node in a cluster. The default value of this parameter is 1000. On the Dev Tools page of Kibana, run the following command to specify the cluster.routing.allocation.total_shards_per_node parameter:
        PUT _cluster/settings
        {
          "persistent": {
            "cluster": {
              "routing": {
                "allocation.total_shards_per_node": 1000
              }
            }
          }
        }
      3. In most cases, this problem occurs because the index-level parameter index.routing.allocation.total_shards_per_node was mistakenly configured as the cluster-level parameter cluster.routing.allocation.total_shards_per_node. Run the following command to specify the index-level parameter instead:
        PUT index_name/_settings
        {
          "index": {
            "routing": {
              "allocation.total_shards_per_node": 30
            }
          }
        }
        NOTE:
        Both of the following parameters are used to limit the maximum number of shards that can be allocated to a single data node:
        • cluster.routing.allocation.total_shards_per_node is used to limit shard allocation at the cluster level.
        • index.routing.allocation.total_shards_per_node is used to limit shard allocation at the index level.
      4. Run the POST _cluster/reroute?retry_failed=true command to manually allocate shards. Wait until the index shards are allocated and the cluster status changes to Available.
    • index.routing.allocation.include
      1. If the value of explanation in the output is as follows, index shards can only be allocated to data nodes with the hot label. If no data nodes in the cluster are labeled with hot, shards cannot be allocated.
        Figure 6 Incorrect configuration of include
      2. On the Dev Tools page of Kibana, run the following command to cancel the configuration:
        PUT index_name/_settings
        {
          "index.routing.allocation.include.box_type": null
        }
      3. Run the POST _cluster/reroute?retry_failed=true command to manually allocate shards. Wait until the index shards are allocated and the cluster status changes to Available.
    • index.routing.allocation.require
      1. If the value of explanation in the output is as follows, shards can only be allocated to data nodes with specified labels. If no nodes in the cluster have such labels, the shards cannot be allocated.
        Figure 7 Incorrect configuration of require
      2. On the Dev Tools page of Kibana, run the following command to cancel the configuration:
        PUT index_name/_settings
        {
          "index.routing.allocation.require.box_type": null
        }
      3. Run the POST _cluster/reroute?retry_failed=true command to manually allocate shards. Wait until the index shards are allocated and the cluster status changes to Available.
  • Shard lock error
    1. In the output, explanation contains [failed to obtain in-memory shard lock]. This problem usually occurs when a node briefly leaves the cluster and then rejoins while a long-running operation, such as a bulk write or scroll request, is still holding a shard. When the node rejoins the cluster, the master node cannot allocate the shard because the shard lock has not been released.
    2. This problem does not cause shard data loss. You only need to allocate the shard again. On the Dev Tools page of Kibana, run the POST /_cluster/reroute?retry_failed=true command to manually allocate the unassigned shard. Wait until the index shards are allocated and the cluster status changes to Available.
  • Inconsistent index setting and node version
    1. The values of index and explanation in the output are as follows, indicating that an index setting does not match the node version.
      Figure 8 Inconsistent index configuration
    2. Run the GET index_name/_settings command to check the index configuration. In the output, check whether the index features match the node version (a trimmed sample is sketched after this procedure).
      Figure 9 Index configuration

      For example, assume the cluster version is 7.9.3. The index feature index.routing.allocation.include._tier_preference is supported only by clusters of version 7.10 or later. If this feature is used in a cluster earlier than 7.10, index shards cannot be allocated, and the cluster becomes unavailable.

    3. Determine whether the inapplicable feature is mandatory for the cluster.
      • If yes, create a cluster of the required version and restore the data of the old cluster to the new cluster using the backup.
      • If no, go to the next step.
    4. Run the following command to remove the inapplicable index feature:
      PUT /index_name/_settings
      {
        "index.routing.allocation.include._tier_preference": null
      }
    5. Run the POST /_cluster/reroute?retry_failed=true command to manually allocate the unallocated shards. Wait until the index shards are allocated and the cluster status changes to Available.
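      For reference, the inapplicable setting appears in the GET index_name/_settings output roughly as in the following trimmed sketch (the index name and other values are illustrative):

      GET index_name/_settings

      {
        "index_name" : {
          "settings" : {
            "index" : {
              "routing" : {
                "allocation" : {
                  "include" : {
                    "_tier_preference" : "data_hot"
                  }
                }
              },
              "number_of_shards" : "1",
              "number_of_replicas" : "1"
            }
          }
        }
      }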
  • Damaged primary shard data
    1. The values of index, shard, allocate_explanation, and store_exception in the output are as follows, indicating that the data of a shard in an index is damaged.
      Figure 10 Damaged primary shard data
    2. When the index data is damaged or the primary shard copy is lost, run the following command to allocate an empty primary shard to a specified node (the _cat queries sketched after this procedure can help identify the shard and a target node):
      POST /_cluster/reroute
      {
          "commands" : [
                  {
                  "allocate_empty_primary" : {
                      "index" : "index_name", 
                      "shard" : 2,
                      "node" : "node_name",
                      "accept_data_loss":true
                  }
              }
          ]
      }
      NOTICE:

      Data in the corresponding shard will be completely cleared. Exercise caution when performing this operation.

    3. After index shards are reallocated, the cluster status becomes Available.
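    The index, shard, and node values used in allocate_empty_primary can be looked up with the _cat APIs. A minimal sketch, with the column selection chosen for illustration:

      GET _cat/shards?v&h=index,shard,prirep,state,node,unassigned.reason
      GET _cat/nodes?v&h=name,ip,node.role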
  • Excessive disk usage
    1. The output is as follows. The value of allocate_explanation indicates that shards of an index cannot be allocated to any data node, and the value of explanation indicates that node disk usage reaches the upper limit.
      Figure 11 Query result
      NOTE:
      • If the disk usage of a node exceeds 85%, new shards will not be allocated to this node.
      • If the disk usage of a node exceeds 90%, the cluster attempts to migrate the shards of this node to other data nodes with low disk usage. If data cannot be migrated, the system forcibly sets the read_only_allow_delete attribute for each index in the cluster. In this case, data cannot be written to indexes, and the indexes can only be read or deleted.
      • A node may be disconnected due to high disk usage. Even after the node recovers automatically, an overloaded cluster may fail to respond when you call the Elasticsearch API to query the cluster status. If the cluster status cannot be updated in time, the cluster is displayed as Unavailable.
    2. Increase the available disk capacity of the cluster (a command for checking per-node disk usage is sketched at the end of this section).
      • On the Dev Tools page of Kibana, run the DELETE index_name command to clear invalid data in the cluster to release disk space.
      • Temporarily reduce the number of index replicas. After the disk or node capacity is expanded, change the number of index replicas back to the original value.
        1. On the Dev Tools page of Kibana, run the following command to temporarily reduce the number of index replicas:
          PUT index_name/_settings
          {
            "number_of_replicas": 1
          }
          The output is as follows.
          Figure 12 Index status of read-only-allow-delete

          This output indicates that the disk usage has exceeded the allowed maximum and the system has forcibly set the read_only_allow_delete attribute for all indexes in the cluster. Run the following command to set the attribute to null, and then run the command in 2.a again to reduce the number of index replicas:

          PUT /_settings
          {
            "index.blocks.read_only_allow_delete": null
          }
        2. Increase the number of nodes or node storage capacity of the cluster by referring to Scaling Out a Cluster.
        3. After the scale-out is complete, run the command in 2.a to change the number of index replicas back. After all index shards are allocated, the cluster status changes to Available.
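    To check per-node disk usage before and after these operations, the _cat allocation API can be used. A minimal sketch (the v parameter adds column headers):

      GET _cat/allocation?v

    In the output, the disk.percent column shows each data node's disk usage, which should fall back below the 85% and 90% watermarks described above.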
