Updated on 2024-11-29 GMT+08:00

ALM-43209 Total Number of Elasticsearch Instance Shards Exceeds the Threshold

Alarm Description

The system checks the total number of Elasticsearch instance shards every 60 seconds and compares it with the threshold. This alarm is generated when the number exceeds the threshold in multiple consecutive checks (three by default).

The threshold can be changed by choosing O&M > Alarm > Thresholds > Name of the desired cluster > Elasticsearch > Shard > Number Of Shard.

If Trigger Count is set to 1, this alarm is cleared when the total number of Elasticsearch instance shards is less than or equal to the threshold. If Trigger Count is greater than 1, this alarm is cleared only when the total number is less than or equal to 90% of the threshold.
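The trigger and clearance rules above can be modelled as a small state machine. The following Python sketch is a hypothetical illustration, not the monitoring code used by FusionInsight Manager. It assumes that "exceeds" means strictly greater than the threshold, that each call to check() stands for one 60-second sample, and that a single sample at or below the clearance bound is enough to clear the alarm.

```python
from dataclasses import dataclass, field


@dataclass
class ShardAlarmMonitor:
    """Hypothetical model of the ALM-43209 trigger and clearance rules."""

    threshold: int
    trigger_count: int = 3  # consecutive exceedances needed (default: 3)
    active: bool = field(default=False, init=False)
    streak: int = field(default=0, init=False)

    def below_clear_bound(self, shards: int) -> bool:
        if self.trigger_count == 1:
            # Trigger Count 1: clear at or below the threshold itself.
            return shards <= self.threshold
        # Trigger Count > 1: clear at or below 90% of the threshold
        # (integer form of shards <= 0.9 * threshold, avoiding float rounding).
        return shards * 10 <= self.threshold * 9

    def check(self, shards: int) -> bool:
        """Process one sample and return whether the alarm is active afterwards."""
        if not self.active:
            # Assumption: "exceeds" means strictly greater than the threshold.
            self.streak = self.streak + 1 if shards > self.threshold else 0
            if self.streak >= self.trigger_count:
                self.active = True
                self.streak = 0
        elif self.below_clear_bound(shards):
            self.active = False
        return self.active


monitor = ShardAlarmMonitor(threshold=500)
states = [monitor.check(n) for n in (510, 520, 530, 480, 460, 450)]
print(states)  # [False, False, True, True, True, False]
```

With the default Trigger Count of 3, the 90% clearance bound means a count hovering just below the threshold (480 or 460 here) keeps the alarm active instead of letting it flap on and off.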

Alarm Attributes

  • Alarm ID: 43209
  • Alarm Severity: Major (default threshold: 400); Critical (default threshold: 500)
  • Alarm Type: Quality of service
  • Service Type: Elasticsearch
  • Auto Cleared: Yes
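As a rough illustration of how the two default thresholds relate to severity, the hypothetical Python function below maps a shard count to the level it would reach. It assumes "exceeds" means strictly greater than the threshold and ignores Trigger Count, which controls how many consecutive checks are needed before the alarm is actually raised.

```python
from typing import Optional, Sequence, Tuple

# Default thresholds from the attributes above; both can be changed on
# FusionInsight Manager, so pass your own values if they differ.
DEFAULT_LEVELS: Sequence[Tuple[str, int]] = (("Major", 400), ("Critical", 500))


def severity(shards: int,
             levels: Sequence[Tuple[str, int]] = DEFAULT_LEVELS) -> Optional[str]:
    """Return the most severe level whose threshold is exceeded, or None."""
    # Check the highest threshold first so that Critical wins over Major.
    for name, threshold in sorted(levels, key=lambda level: level[1], reverse=True):
        if shards > threshold:  # assumption: strictly greater than
            return name
    return None


for count in (350, 450, 650):
    print(count, severity(count))  # prints: 350 None / 450 Major / 650 Critical
```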

Alarm Parameters

  • Location Information
    • Source: Specifies the cluster for which the alarm is generated.
    • ServiceName: Specifies the service for which the alarm is generated.
    • RoleName: Specifies the role for which the alarm is generated.
    • HostName: Specifies the host for which the alarm is generated.
  • Additional Information
    • Trigger Condition: Specifies the threshold for triggering the alarm.

Impact on the System

If the total number of Elasticsearch shards is too large, index read and write performance may degrade, and shard recovery may be slow after the Elasticsearch process restarts.

Possible Causes

The number of shards configured for Elasticsearch indexes is inappropriate.

Handling Procedure

Check the total number of Elasticsearch instance shards.

  1. Check whether the Elasticsearch cluster is in security mode.

    On FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > Elasticsearch. On the displayed page, click Configurations. Search for ELASTICSEARCH_SECURITY_ENABLE and check whether the parameter exists and its value is true.

    • If yes, go to 2.
    • If no, go to 3.

  2. If the cluster is in security mode, configure the permissions required to run the curl command.
  3. Log in to any node where Elasticsearch resides as user root.
  4. Run the curl -XGET --tlsv1.2 --negotiate -k -v -u : 'https://ip:httpport/_cat/allocation?v' command to query the shards allocated in the cluster. The shards column of the output shows the number of shards on each node.

    • In this command, replace ip with the IP address of any node in the cluster.
    • Replace httpport with the HTTP port number of the Elasticsearch instance, which is specified by SERVER_PORT. To obtain the value, on FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > Elasticsearch. On the displayed page, choose Configurations > All Configurations and search for SERVER_PORT.
    • In normal mode, remove the security authentication parameters --tlsv1.2 --negotiate -k -v -u : and change https to http.

  5. Use either of the following methods:

    • Method 1: Delete indexes that are no longer used in the cluster.
    • Method 2: Change the threshold for the total number of instance shards.

    If you raise the threshold above 500, also modify cluster.routing.allocation.total_shards_per_node. The modification takes effect immediately; the Elasticsearch service does not need to be restarted.

  6. Five minutes later, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 7.
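Steps 4 and 5 can be partly scripted. The Python sketch below is hypothetical: it only builds curl argument lists and parses text, so you still run the commands yourself. The IP address 192.0.2.10, port 9200, node names, and sample outputs are placeholders (use a real node IP and the SERVER_PORT value). The parser assumes `_cat/allocation?v` output has shards as its first column and the node name as its last. To estimate what Method 1 would free, it reads `_cat/indices?h=index,pri,rep` (standard cat API column selection) and counts each index as pri × (1 + rep) shard copies. The last helper builds a body for the standard Elasticsearch cluster update settings API (`PUT _cluster/settings`); this page does not say how MRS expects cluster.routing.allocation.total_shards_per_node to be changed, so treat that route as an assumption to confirm first.

```python
import json
import shlex
from typing import Dict, List, Optional

# Security-mode options copied verbatim from step 4; normal mode omits them.
SECURITY_OPTS = ["--tlsv1.2", "--negotiate", "-k", "-v", "-u", ":"]


def curl_command(method: str, ip: str, port: int, path: str,
                 security_mode: bool, body: Optional[str] = None) -> List[str]:
    """Build a curl argument list; https in security mode, http otherwise."""
    cmd = ["curl", f"-X{method}"]
    if security_mode:
        cmd += SECURITY_OPTS
    if body is not None:
        cmd += ["-H", "Content-Type: application/json", "-d", body]
    scheme = "https" if security_mode else "http"
    cmd.append(f"{scheme}://{ip}:{port}{path}")
    return cmd


def shards_per_node(cat_allocation: str) -> Dict[str, int]:
    """Parse `_cat/allocation?v` text into {node name: shard count}."""
    rows = [line.split() for line in cat_allocation.splitlines() if line.strip()]
    shards_col = rows[0].index("shards")
    # Assumption: shards is the first column and the node name the last, so an
    # UNASSIGNED summary row (only those two fields filled) also parses.
    return {fields[-1]: int(fields[shards_col]) for fields in rows[1:]}


def shard_copies(cat_indices: str) -> Dict[str, int]:
    """Parse `_cat/indices?h=index,pri,rep` text into {index: pri * (1 + rep)}."""
    copies: Dict[str, int] = {}
    for line in cat_indices.splitlines():
        fields = line.split()
        if len(fields) == 3:  # skip blank or malformed lines
            name, pri, rep = fields
            copies[name] = int(pri) * (1 + int(rep))
    return copies


def total_shards_per_node_body(limit: int) -> str:
    """JSON body for PUT _cluster/settings; Elasticsearch treats -1 as unlimited."""
    if limit != -1 and limit < 1:
        raise ValueError("limit must be a positive integer or -1")
    return json.dumps(
        {"persistent": {"cluster.routing.allocation.total_shards_per_node": limit}})


# Hypothetical sample output and placeholder address/port, for illustration only.
ALLOCATION = """\
shards disk.indices disk.used disk.avail disk.total disk.percent host       ip         node
   260       12.3gb    40.1gb    159.9gb      200gb           20 192.0.2.10 192.0.2.10 EsNode1
   255       11.9gb    39.8gb    160.2gb      200gb           19 192.0.2.11 192.0.2.11 EsNode2
"""
INDICES = "logs-old 5 1\nlogs-new 5 1\nmetrics 3 0\n"

print(shlex.join(curl_command("GET", "192.0.2.10", 9200, "/_cat/allocation?v", True)))
# curl -XGET --tlsv1.2 --negotiate -k -v -u : 'https://192.0.2.10:9200/_cat/allocation?v'
per_node = shards_per_node(ALLOCATION)
print(per_node, sum(per_node.values()))  # {'EsNode1': 260, 'EsNode2': 255} 515
print(shard_copies(INDICES)["logs-old"])  # 10 shard copies freed by deleting logs-old
```

Counting pri × (1 + rep) gives configured copies; replicas that are unassigned do not occupy any node, so the real reduction on a given instance can be smaller.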

Collect fault information.

  7. On FusionInsight Manager, choose O&M > Log > Download.
  8. For Service, select Elasticsearch in the required cluster.
  9. Click the icon in the upper right corner. In the displayed dialog box, set Start Date and End Date to 10 minutes before and 10 minutes after the alarm generation time, respectively, and click OK. Then click Download.
  10. Contact the O&M engineers and send them the collected logs.
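The log collection step asks for a window of 10 minutes on either side of the alarm generation time. The Python sketch below shows that arithmetic with a hypothetical alarm time; using datetime rather than adjusting the minute field by hand also handles windows that cross midnight.

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_window(alarm_time: datetime,
               margin: timedelta = timedelta(minutes=10)) -> Tuple[datetime, datetime]:
    """Return (Start Date, End Date) centred on the alarm generation time."""
    return alarm_time - margin, alarm_time + margin


start, end = log_window(datetime(2024, 11, 29, 14, 5))  # hypothetical alarm time
print(start, end)  # 2024-11-29 13:55:00 2024-11-29 14:15:00
```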

Alarm Clearance

This alarm is automatically cleared after the fault is rectified.

Related Information

None.