Updated on 2024-08-27 GMT+08:00

How Can I Troubleshoot a Cluster With an Abnormally Heavy Load?

Symptom

Tasks in the cluster are rejected for a long time, and a large number of tasks are suspended. The cluster load increases suddenly.

Possible Causes

Possible causes are as follows:

  • Query threads run slowly because they retrieve a large amount of data.
  • Threads are suspended because of high read pressure.

Troubleshooting Procedure

Method 1: Using Cerebro

  1. Log in to the CSS management console.
  2. In the navigation pane, choose Clusters > Elasticsearch.
  3. Locate the cluster whose load increases sharply and click Access Cerebro in the Operation column.
  4. Check the CPU and heap metrics. If both values are high, the cluster is overloaded. In this case, reduce the number of requests sent by clients and wait for the cluster load to decrease.
  5. Check the number and size of shards. The recommended shard size is 20 GB to 40 GB, and a shard should not exceed 50 GB. On a single node, an index should have no more than five shards.
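
The shard checks in step 5 can also be scripted. The following is a minimal sketch, not part of the CSS console: it assumes you have already fetched shard information as JSON with a request such as GET _cat/shards?format=json&bytes=b, and that the guideline means no more than five shards of one index on a single node. The function name check_shards and the sample rows are hypothetical.

```python
from collections import Counter

GB = 1024 ** 3
MAX_SHARD_BYTES = 50 * GB               # upper bound on shard size from step 5
MAX_SHARDS_PER_INDEX_PER_NODE = 5       # assumed reading of the per-node guideline


def check_shards(rows):
    """Flag oversized shards and (node, index) pairs that hold too many shards.

    rows: list of dicts shaped like _cat/shards JSON output, with the
    "store" value given in bytes (bytes=b).
    """
    oversized = []
    per_node = Counter()
    for row in rows:
        node = row.get("node")
        if not node:
            # Unassigned shards have no node and no store size; skip them.
            continue
        per_node[(node, row["index"])] += 1
        if int(row.get("store") or 0) > MAX_SHARD_BYTES:
            oversized.append((row["index"], row["shard"], node))
    crowded = {
        key: count
        for key, count in per_node.items()
        if count > MAX_SHARDS_PER_INDEX_PER_NODE
    }
    return oversized, crowded


# Hypothetical sample rows, not taken from a real cluster.
sample = [
    {"index": "logs", "shard": "0", "store": str(60 * GB), "node": "node-1"},
    {"index": "logs", "shard": "1", "store": str(30 * GB), "node": "node-2"},
    {"index": "logs", "shard": "2", "store": None, "node": None},
]
oversized, crowded = check_shards(sample)
print("Shards over 50 GB:", oversized)
print("Index/node pairs with more than five shards:", crowded)
```

Any shard listed in the first result is a candidate for reindexing into more shards; any entry in the second suggests rebalancing that index across nodes.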

Method 2: Using Kibana

  1. Log in to the CSS management console.
  2. In the navigation pane, choose Clusters > Elasticsearch.
  3. Locate the cluster whose load increases sharply and click Access Kibana in the Operation column. Click Dev Tools.
  4. Run the GET _cat/thread_pool? command to see which thread pools have stacked tasks, and locate the cause of the increased cluster load.
  5. Run the GET /_nodes/hot_threads command to see which threads consume a large amount of CPU and take a long time to execute, and locate the cause of the task stacking.
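
When a cluster has many nodes, the thread pool output from step 4 can be long. The following is a minimal sketch of how it might be filtered, assuming the output was fetched as JSON with a request such as GET _cat/thread_pool?format=json&h=node_name,name,active,queue,rejected. The function name busy_pools, the min_queue threshold, and the sample rows are hypothetical.

```python
def busy_pools(rows, min_queue=1):
    """Return thread pools that have queued or rejected tasks, busiest first.

    rows: list of dicts shaped like _cat/thread_pool JSON output. The
    numeric columns arrive as strings, so they are converted with int().
    """
    findings = []
    for row in rows:
        queue = int(row["queue"])
        rejected = int(row["rejected"])
        if queue >= min_queue or rejected > 0:
            findings.append((row["node_name"], row["name"], queue, rejected))
    # Sort by queue length, then by rejected count, in descending order.
    return sorted(findings, key=lambda f: (f[2], f[3]), reverse=True)


# Hypothetical sample rows, not taken from a real cluster.
sample = [
    {"node_name": "node-2", "name": "search", "active": "3", "queue": "4", "rejected": "0"},
    {"node_name": "node-1", "name": "write", "active": "0", "queue": "0", "rejected": "0"},
    {"node_name": "node-1", "name": "search", "active": "13", "queue": "120", "rejected": "35"},
]
for node, pool, queue, rejected in busy_pools(sample):
    print(f"{node} {pool}: queue={queue} rejected={rejected}")
```

A pool with a large queue or a nonzero rejected count points to the kind of request (for example, search or write) that is overloading the cluster; the hot threads output from step 5 can then show what those threads are doing.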