ALM-43209 Total Number of Elasticsearch Instance Shards Exceeds the Threshold
Alarm Description
The system checks the total number of Elasticsearch instance shards every 60 seconds and compares it with the threshold. This alarm is generated when the number exceeds the threshold for several consecutive checks (three by default).
The threshold can be changed by choosing O&M > Alarm > Thresholds > Name of the desired cluster > Elasticsearch > Shard > Number Of Shard.
If Trigger Count is set to 1, this alarm is cleared when the total number of Elasticsearch instance shards is less than or equal to the threshold. If Trigger Count is greater than 1, this alarm is cleared when the total number of Elasticsearch instance shards is less than or equal to 90% of the threshold.
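The trigger and clear rules above amount to a consecutive-count check with hysteresis. The following is a minimal sketch of that logic, not the FusionInsight Manager implementation: the class name ShardAlarm, its fields, and the observe method are hypothetical, and each call to observe stands for one 60-second check.

```python
from dataclasses import dataclass


@dataclass
class ShardAlarm:
    """Hypothetical model of the trigger/clear rule (illustration only)."""
    threshold: int
    trigger_count: int = 3  # consecutive over-threshold checks needed to raise the alarm
    active: bool = False
    streak: int = 0         # current run of consecutive over-threshold checks

    def observe(self, total_shards: int) -> bool:
        """Feed one periodic sample and return whether the alarm is active."""
        if total_shards > self.threshold:
            self.streak += 1
            if self.streak >= self.trigger_count:
                self.active = True
            return self.active

        self.streak = 0
        if self.trigger_count == 1:
            # Trigger Count = 1: clear at or below the threshold itself.
            cleared = total_shards <= self.threshold
        else:
            # Trigger Count > 1: clear at or below 90% of the threshold
            # (integer arithmetic avoids floating-point rounding).
            cleared = total_shards * 10 <= self.threshold * 9
        if self.active and cleared:
            self.active = False
        return self.active
```

With a threshold of 400 and the default trigger count of 3, three consecutive samples above 400 raise the alarm, and it stays raised until a sample of 360 or lower arrives.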
Alarm Attributes
Alarm ID | Alarm Severity | Alarm Type | Service Type | Auto Cleared |
---|---|---|---|---|
43209 | Major (default threshold: 400); Critical (default threshold: 500) | Quality of service | Elasticsearch | Yes |
Alarm Parameters
Type | Parameter | Description |
---|---|---|
Location Information | Source | Specifies the cluster for which the alarm is generated. |
Location Information | ServiceName | Specifies the service for which the alarm is generated. |
Location Information | RoleName | Specifies the role for which the alarm is generated. |
Location Information | HostName | Specifies the host for which the alarm is generated. |
Additional Information | Trigger Condition | Specifies the threshold for triggering the alarm. |
Impact on the System
If the total number of Elasticsearch shards is too large, index read/write performance may degrade, and shard recovery may be slow when the Elasticsearch process is restarted.
Possible Causes
The number of shards configured for Elasticsearch indexes is inappropriate.
Handling Procedure
Check the total number of Elasticsearch instance shards.
1. Check whether the Elasticsearch cluster is in security mode.
   On FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > Elasticsearch. On the displayed page, click Configurations. Search for ELASTICSEARCH_SECURITY_ENABLE, and check whether the parameter exists and its value is true.
2. If the security mode is used, configure the permission for running the curl command.
3. Log in to any node where Elasticsearch resides as user root.
4. Run the curl -XGET --tlsv1.2 --negotiate -k -v -u : 'https://ip:httpport/_cat/allocation?v' command to query the total number of instance shards in the cluster.
   - In this command, replace ip with the IP address of any node in the cluster.
   - Replace httpport with the HTTP port number of the Elasticsearch instance, which is specified by SERVER_PORT. To obtain the parameter value, on FusionInsight Manager, choose Cluster > Services > Elasticsearch. On the displayed page, choose Configurations > All Configurations and search for SERVER_PORT.
   - In normal mode, delete the security authentication parameters --tlsv1.2 --negotiate -k -v -u : and change https to http.
5. Use either of the following methods:
   - Method 1: Delete the indexes that are no longer used in the cluster.
   - Method 2: Change the threshold of the total number of instance shards.
     If you raise the threshold above 500, also modify cluster.routing.allocation.total_shards_per_node. The modification takes effect immediately without restarting the Elasticsearch service.
6. Five minutes later, check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 7.
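The curl query in the procedure can also be scripted. The sketch below assembles the argument list for security mode and normal mode, then reads the shard count of each node from the text output. It is an illustration under assumptions, not a supported tool: the function names build_curl_args and shards_per_node are hypothetical, and the parser assumes the default _cat/allocation?v layout, in which shards is the first column and the node name is the last.

```python
from typing import Dict, List


def build_curl_args(ip: str, http_port: int, security_mode: bool) -> List[str]:
    """Assemble the curl argument list for the _cat/allocation query."""
    path = "/_cat/allocation?v"
    if security_mode:
        # Flags copied from the command in the procedure.
        return ["curl", "-XGET", "--tlsv1.2", "--negotiate", "-k", "-v",
                "-u", ":", f"https://{ip}:{http_port}{path}"]
    # Normal mode: no authentication flags, plain http.
    return ["curl", "-XGET", f"http://{ip}:{http_port}{path}"]


def shards_per_node(cat_allocation_output: str) -> Dict[str, int]:
    """Map each node name to its shard count.

    Assumes the first column is 'shards' and the last column is the node
    name (an UNASSIGNED row, if present, is kept under that name).
    """
    lines = [ln for ln in cat_allocation_output.splitlines() if ln.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    if header[0] != "shards":
        raise ValueError("unexpected column layout: 'shards' is not first")
    counts: Dict[str, int] = {}
    for line in lines[1:]:
        fields = line.split()
        counts[fields[-1]] = int(fields[0])
    return counts
```

The argument list can be passed to subprocess.run with capture_output=True and text=True, and the resulting stdout passed to shards_per_node. Summing the returned values gives the cluster-wide total.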
Collect fault information.
7. On FusionInsight Manager, choose O&M > Log > Download.
8. Select Elasticsearch in the required cluster for Service.
9. Click the icon in the upper right corner. In the displayed dialog box, set Start Date and End Date to 10 minutes before and 10 minutes after the alarm generation time, respectively, and click OK. Then, click Download.
10. Contact the O&M engineers and send the collected logs.
Alarm Clearance
This alarm will be automatically cleared after the fault is rectified.
Related Information
None.