ALM-43205 Elasticsearch Stored Shard Data Volume Exceeds the Threshold

Alarm Description

The system checks the volume of shard data stored in Elasticsearch every 60 seconds and compares it with the threshold. This alarm is generated when the volume exceeds the threshold for a number of consecutive checks (three by default).

The threshold can be changed by choosing O&M > Alarm > Thresholds > Name of the desired cluster > Elasticsearch > Shard > Elasticsearch Shard Data Volume (EsMaster).

If Trigger Count is set to 1, and the volume of shard data stored in Elasticsearch is less than or equal to the threshold, this alarm is cleared. If Trigger Count is greater than 1, and the volume of shard data stored in Elasticsearch is less than or equal to 90% of the threshold, this alarm is cleared.
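
The clearing rule above can be sketched as a small shell function. This is only an illustration of the rule as stated in this section; the function name clear_level_kb is hypothetical and is not part of the product.

```shell
# Print the value (in KB) at or below which the alarm is cleared.
# $1: alarm threshold in KB, $2: Trigger Count
# clear_level_kb is a hypothetical helper name used for illustration only.
clear_level_kb() {
  local threshold_kb=$1 trigger_count=$2
  if [ "$trigger_count" -eq 1 ]; then
    # Trigger Count = 1: cleared at or below the threshold itself
    echo "$threshold_kb"
  else
    # Trigger Count > 1: cleared at or below 90% of the threshold
    echo $(( threshold_kb * 90 / 100 ))
  fi
}
```

For example, with the default Major threshold of 41943040 KB and a Trigger Count of 3, clear_level_kb 41943040 3 prints 37748736.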

Alarm Attributes

Alarm ID	Alarm Severity	Alarm Type	Service Type	Auto Cleared
43205	Major (default threshold: 41943040 KB); Critical (default threshold: 83886080 KB)	Quality of service	Elasticsearch	Yes
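
For readability, the default thresholds can be converted to GB. The sketch below assumes 1 GB = 1024 x 1024 KB; this section does not state which unit convention the product uses, and kb_to_gb is a hypothetical helper name.

```shell
# Convert a value in KB to whole GB, assuming 1 GB = 1024 * 1024 KB.
# kb_to_gb is a hypothetical helper name used for illustration only.
kb_to_gb() {
  echo $(( $1 / 1024 / 1024 ))
}

kb_to_gb 41943040   # default Major threshold, prints 40
kb_to_gb 83886080   # default Critical threshold, prints 80
```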

Alarm Parameters

Type	Parameter	Description
Location Information	Source	Specifies the cluster for which the alarm is generated.
	ServiceName	Specifies the service for which the alarm is generated.
	RoleName	Specifies the role for which the alarm is generated.
	HostName	Specifies the host for which the alarm is generated.
Additional Information	Trigger Condition	Specifies the threshold for triggering the alarm.

Impact on the System

A large amount of data stored in Elasticsearch shards may degrade the read and write performance of index data. In addition, when the Elasticsearch process is restarted, restoring a large amount of data takes longer.

Possible Causes

The number of index shards is incorrectly configured. As a result, the stored shard data volume exceeds the threshold.

Handling Procedure

Check the stored shard data volume

1. Check whether the Elasticsearch cluster is in security mode.

On FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > Elasticsearch. On the displayed page, click Configurations. In the upper right corner of the configuration page, search for ELASTICSEARCH_SECURITY_ENABLE and check whether the parameter exists and its value is true.
- If yes, go to 2.
- If no, go to 3.
2. If security mode is used, configure the permission for running the curl command.
3. Log in to any node where Elasticsearch resides as user root.
4. Run the following command to query the stored shard data volume of the current cluster, sorted by store size in descending order:

curl -XGET --tlsv1.2 --negotiate -k -v -u : 'https://ip:httpport/_cat/shards?v&s=store:desc'

- Replace ip with the IP address of any node in the cluster.
- Replace httpport with the HTTP port number of the Elasticsearch instance, which is specified by SERVER_PORT. To obtain the parameter value, on FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > Elasticsearch. On the displayed page, choose Configurations > All Configurations and search for SERVER_PORT.
- In normal mode, delete the security authentication parameters --tlsv1.2 --negotiate -k -v -u : and change https to http.
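
The difference between the two modes, and one way to read the result, can be sketched as follows. This is an illustration under assumptions, not a product tool: the function names shards_cmd and oversized_shards are hypothetical, and the filter assumes the query was issued with the standard Elasticsearch cat parameters h=index,shard,prirep,store and bytes=kb (without v), so that each line holds four columns and the store size is a plain number in KB.

```shell
# Print (do not run) the query command for the given mode.
# $1: "security" or "normal", $2: node IP address, $3: value of SERVER_PORT
# shards_cmd is a hypothetical helper name used for illustration only.
shards_cmd() {
  local mode=$1 ip=$2 port=$3
  if [ "$mode" = "security" ]; then
    printf "curl -XGET --tlsv1.2 --negotiate -k -v -u : 'https://%s:%s/_cat/shards?v&s=store:desc'\n" "$ip" "$port"
  else
    # Normal mode: no authentication parameters, http instead of https
    printf "curl -XGET 'http://%s:%s/_cat/shards?v&s=store:desc'\n" "$ip" "$port"
  fi
}

# Read "index shard prirep store_kb" lines on stdin and print the shards
# whose store size exceeds the threshold in KB given as $1.
# Lines without a numeric store size (for example unassigned shards) are skipped.
oversized_shards() {
  awk -v limit="$1" '$4 ~ /^[0-9]+$/ && $4 + 0 > limit + 0 { print $1, $2, $3, $4 }'
}
```

For example, shards_cmd normal 192.0.2.10 9200 prints the normal-mode command (the address and port are placeholders), and piping suitably formatted query output into oversized_shards 41943040 lists the shards above the default Major threshold.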

5. In the command output, identify the indexes with a large amount of shard data. You are advised to replan these indexes using either of the following methods:
- Method 1: Stop writing data to the index and plan a new index to store the written data.
- Method 2: Migrate the data from the index whose stored shard data volume exceeds the threshold to a newly planned index, and then delete the old index.
  
  Deleting a file or folder is a high-risk operation. Ensure that the file or folder is no longer required before performing this operation.
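
Method 2 can be sketched with the standard Elasticsearch reindex API (POST _reindex) followed by a delete request. Whether this is the intended migration path for this product is an assumption, as are the function name migrate_plan and the index names. The function only prints the commands so that they can be reviewed before anything is run; it is written for normal mode, and in security mode the authentication parameters from the query command and https would be needed. The new index is assumed to have been created beforehand with the planned number of shards.

```shell
# Print (do not run) the requests for migrating an index and deleting the old one.
# $1: base URL such as http://ip:httpport, $2: old index name, $3: new index name
# migrate_plan and the index names are hypothetical and used for illustration only.
migrate_plan() {
  local base=$1 old_index=$2 new_index=$3
  # Step 1: copy all documents from the old index to the new index
  printf "curl -XPOST '%s/_reindex' -H 'Content-Type: application/json' -d '{\"source\":{\"index\":\"%s\"},\"dest\":{\"index\":\"%s\"}}'\n" "$base" "$old_index" "$new_index"
  # Step 2: delete the old index (high-risk; run only after verifying the copy)
  printf "curl -XDELETE '%s/%s'\n" "$base" "$old_index"
}
```

For example, migrate_plan http://192.0.2.10:9200 logs_old logs_new prints the two commands in order. Verify the document count in the new index before running the delete request.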

6. Five minutes after the index planning is complete, check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 7.

Collect fault information.

7. On FusionInsight Manager, choose O&M > Log > Download.
8. Select Elasticsearch in the required cluster for Service.
9. Click the icon in the upper right corner. In the displayed dialog box, set Start Date and End Date to 10 minutes before and 10 minutes after the alarm generation time, respectively, and click OK. Then, click Download.
10. Contact the O&M engineers and send the collected logs.
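
The Start Date and End Date for the log download can be worked out from the alarm generation time. The sketch below is a convenience only; log_window is a hypothetical helper name, and the function assumes GNU date is available on the node.

```shell
# Print the log collection window: 10 minutes before and 10 minutes after
# the alarm generation time given as $1 (for example "2024-05-01 12:00:00").
# Requires GNU date. log_window is a hypothetical helper name.
log_window() {
  local alarm_epoch
  # Convert the alarm time to seconds since the epoch
  alarm_epoch=$(date -d "$1" +%s) || return 1
  date -d "@$(( alarm_epoch - 600 ))" '+%Y-%m-%d %H:%M:%S'   # Start Date
  date -d "@$(( alarm_epoch + 600 ))" '+%Y-%m-%d %H:%M:%S'   # End Date
}
```

The first output line is the Start Date and the second is the End Date, both in the local time zone of the node.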