Updated on 2024-11-20 GMT+08:00

How Can I Troubleshoot a Cluster With an Abnormally Heavy Load?

Symptom: The cluster has been rejecting tasks for a long time, a large number of tasks are piling up, and the cluster load has increased sharply.

Use either of the following methods to locate the cause:

Method 1: Using Cerebro

Log in to the CSS management console.
In the navigation pane, choose Clusters > Elasticsearch.
Locate the cluster whose load increases sharply and click Access Cerebro in the Operation column.
Check the CPU and heap metrics. If both values are persistently high, the cluster is overloaded. In this case, reduce the number of requests sent by the client and wait until the cluster load decreases.
Check the number and size of shards. A shard size of 20 GB to 40 GB is recommended, and a shard should not exceed 50 GB. A single node should hold no more than five shards of the same index.
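
The shard guidelines in the last step can also be checked with a script. The following is a minimal sketch, not an official tool. It assumes the shard list was exported as JSON, for example with GET _cat/shards?format=json&bytes=b&h=index,shard,prirep,store,node, so that store is a byte count. The function name check_shards and the sample rows are hypothetical.

```python
from collections import Counter

GB = 1024 ** 3
MAX_SHARD_BYTES = 50 * GB            # upper limit stated in this article
MAX_SHARDS_PER_INDEX_PER_NODE = 5    # per-node limit stated in this article


def check_shards(rows):
    """Return (oversized, crowded) for rows shaped like _cat/shards JSON output.

    oversized: list of (index, shard, node) for shards larger than 50 GB.
    crowded:   dict mapping (node, index) to the shard count when it exceeds 5.
    """
    oversized = []
    per_node = Counter()
    for row in rows:
        # Unassigned shards have no node and no store size; skip them.
        if row.get("node") is None:
            continue
        size = int(row.get("store") or 0)
        if size > MAX_SHARD_BYTES:
            oversized.append((row["index"], row["shard"], row["node"]))
        per_node[(row["node"], row["index"])] += 1
    crowded = {
        key: count
        for key, count in per_node.items()
        if count > MAX_SHARDS_PER_INDEX_PER_NODE
    }
    return oversized, crowded


# Hypothetical sample data: six shards of "logs" on node-1, one of them 60 GB.
sample = []
for i in range(6):
    size_gb = 60 if i == 0 else 25
    sample.append({"index": "logs", "shard": str(i), "prirep": "p",
                   "store": str(size_gb * GB), "node": "node-1"})
sample.append({"index": "metrics", "shard": "0", "prirep": "p",
               "store": str(30 * GB), "node": "node-2"})
sample.append({"index": "metrics", "shard": "0", "prirep": "r",
               "store": None, "node": None})  # unassigned replica

oversized, crowded = check_shards(sample)
print("Shards over 50 GB:", oversized)
print("Node/index pairs with more than 5 shards:", crowded)
```

Any entry in either result points to an index whose shard count or shard size may need to be adjusted.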

Method 2: Using Kibana

Log in to the CSS management console.
In the navigation pane, choose Clusters > Elasticsearch.
Locate the cluster whose load increases sharply and click Access Kibana in the Operation column. Click Dev Tools.
Run the GET _cat/thread_pool? command to see which thread pools have tasks piling up, and use this to locate the cause of the increased cluster load.
Run the GET /_nodes/hot_threads command to see which threads consume the most CPU and take a long time to execute, and use this to locate the cause of the task delays.
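
When the thread pool listing is long, it can help to filter it programmatically. The following is a minimal sketch under stated assumptions: the output was requested as JSON with the node_name, name, active, queue, and rejected columns (for example, by adding format=json&h=node_name,name,active,queue,rejected to the request), and the function name find_backlogged_pools and the sample rows are hypothetical.

```python
def find_backlogged_pools(rows):
    """Return thread pools with queued or rejected tasks, worst first.

    rows: list of dicts shaped like _cat/thread_pool JSON output, where
    numeric columns arrive as strings.
    """
    flagged = []
    for row in rows:
        queue = int(row.get("queue") or 0)
        rejected = int(row.get("rejected") or 0)
        if queue > 0 or rejected > 0:
            flagged.append({
                "node": row["node_name"],
                "pool": row["name"],
                "active": int(row.get("active") or 0),
                "queue": queue,
                "rejected": rejected,
            })
    # Sort by rejected count first, then by queue length, both descending.
    flagged.sort(key=lambda r: (r["rejected"], r["queue"]), reverse=True)
    return flagged


# Hypothetical sample data for illustration only.
sample = [
    {"node_name": "node-1", "name": "search", "active": "7",
     "queue": "12", "rejected": "0"},
    {"node_name": "node-1", "name": "write", "active": "4",
     "queue": "200", "rejected": "1500"},
    {"node_name": "node-2", "name": "search", "active": "1",
     "queue": "0", "rejected": "0"},
]

for pool in find_backlogged_pools(sample):
    print(pool["node"], pool["pool"],
          "queue=%d rejected=%d" % (pool["queue"], pool["rejected"]))
```

A pool with a growing rejected count shows which type of request (for example, write or search) is overloading the cluster, which narrows down which client traffic to reduce.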