ALM-43201 Heap Memory Usage of Elasticsearch Exceeds the Threshold

Alarm Description

The system checks the Elasticsearch heap memory usage every 60 seconds. This alarm is generated when the heap memory usage exceeds the threshold.

When the number of smoothing times is 1, this alarm is cleared when the Elasticsearch heap memory usage is less than or equal to the threshold. When the number of smoothing times is greater than 1, this alarm is cleared when the Elasticsearch heap memory usage is less than or equal to 90% of the threshold.
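
The clearing rule can be sketched as a small shell function. This is an illustration of the arithmetic only, not the system's implementation; the function name clear_level and its arguments are hypothetical.

```shell
# clear_level THRESHOLD SMOOTHING_TIMES
# Prints the heap usage (in percent) at or below which the alarm is cleared.
# Hypothetical helper for illustration; not part of FusionInsight Manager.
clear_level() {
  awk -v t="$1" -v n="$2" 'BEGIN {
    if (n > 1) printf "%.1f\n", t * 90 / 100   # smoothing > 1: 90% of the threshold
    else       printf "%.1f\n", t              # smoothing = 1: the threshold itself
  }'
}
```

For example, clear_level 90 3 prints 81.0: with the default Major threshold of 90% and three smoothing times, the alarm is cleared once heap usage drops to 81% or lower.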

Alarm Attributes

Alarm ID	Alarm Severity	Alarm Type	Service Type	Auto Cleared
43201	Major (default threshold: 90%); Critical (default threshold: 95%)	Quality of service	Elasticsearch	Yes

Alarm Parameters

Type	Parameter	Description
Location Information	Source	Specifies the cluster for which the alarm is generated.
	ServiceName	Specifies the service for which the alarm is generated.
	RoleName	Specifies the role for which the alarm is generated.
	HostName	Specifies the host for which the alarm is generated.
Additional Information	Trigger Condition	Specifies the threshold for triggering the alarm.

Impact on the System

If the Elasticsearch heap memory usage is too high, the read and write performance of Elasticsearch index data may be affected. In serious cases, the process may restart.

Possible Causes

The heap memory allocated to the Elasticsearch instance is insufficient.

Handling Procedure

Delete invalid indexes.

1. Check whether the Elasticsearch cluster is in security mode.
On FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > Elasticsearch. On the displayed page, click Configurations, search for ELASTICSEARCH_SECURITY_ENABLE, and check whether the parameter exists and its value is true.
- If yes, go to 2.
- If no, go to 3.
2. If the security mode is used, configure the permission for running the curl command.
3. Log in to a host where Elasticsearch resides as user root.
4. Run the curl -XGET --tlsv1.2 --negotiate -k -v -u : 'https://ip:httpport/_cat/indices?v' command to query the index details in the current cluster.
- In this command, replace ip with the IP address of any node in the cluster.
- Replace httpport with the HTTP port number of the Elasticsearch instance, which is specified by SERVER_PORT. To obtain the parameter value, on FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > Elasticsearch. On the displayed page, choose Configurations > All Configurations and search for SERVER_PORT.
- In common (non-security) mode, delete the security authentication parameters --tlsv1.2 --negotiate -k -v -u : and change https to http.
- These rules also apply to the following curl commands.
5. Run the curl -XDELETE --tlsv1.2 --negotiate -k -v -u : 'https://ip:httpport/indexname' command to delete unnecessary indexes. Replace indexname with the name of the index to be deleted.

Deleting an index is a high-risk operation. Ensure that the index is no longer required before performing this operation.
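
The two curl commands and the security-mode rules can be wrapped in a few shell functions, sketched below. The function names and the ES_IP, ES_PORT, and SECURITY_MODE variables are hypothetical; the defaults ip and httpport are the same placeholders used in the commands above and must be replaced with real values. The curl options are exactly those given in this procedure.

```shell
# Placeholders: replace with a node IP address and the SERVER_PORT value.
ES_IP="${ES_IP:-ip}"
ES_PORT="${ES_PORT:-httpport}"
# Set to "false" when ELASTICSEARCH_SECURITY_ENABLE is not true.
SECURITY_MODE="${SECURITY_MODE:-true}"

# es_url PATH: prints the request URL (https in security mode, http otherwise).
es_url() {
  if [ "$SECURITY_MODE" = "true" ]; then
    printf 'https://%s:%s/%s\n' "$ES_IP" "$ES_PORT" "$1"
  else
    printf 'http://%s:%s/%s\n' "$ES_IP" "$ES_PORT" "$1"
  fi
}

# es_auth_args: prints the authentication options, or nothing in common mode.
es_auth_args() {
  if [ "$SECURITY_MODE" = "true" ]; then
    printf '%s\n' '--tlsv1.2 --negotiate -k -v -u :'
  fi
}

# list_indices: queries the index details of the cluster.
list_indices() {
  # $(es_auth_args) is left unquoted on purpose so it splits into options.
  curl -XGET $(es_auth_args) "$(es_url '_cat/indices?v')"
}

# delete_index NAME: deletes one index; refuses to run without a name.
delete_index() {
  if [ -z "$1" ]; then
    echo "usage: delete_index INDEXNAME" >&2
    return 1
  fi
  curl -XDELETE $(es_auth_args) "$(es_url "$1")"
}
```

Run list_indices first, review the output, and call delete_index only for indexes that are confirmed to be unnecessary. The empty-name guard is there because a DELETE request without an index name would not target a single index.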

Check the JVM memory usage and adjust system configurations.

6. On FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > Elasticsearch > Configurations > All Configurations.
7. In the upper right corner of the Configurations page, enter GC_OPTS in the search box and click the search icon. The GC_OPTS parameters of all instances are displayed.
8. Select the instance whose GC_OPTS value needs to be changed, and check whether the differentiated configuration icon is displayed next to the value box of the instance.
- If yes, go to 9.
- If no, go to 10.
9. Click the differentiated configuration icon. In the displayed dialog box, click the icon in the right pane and click OK to save the settings.
10. Adjust the values of -Xms and -Xmx in the GC_OPTS parameter by referring to the following suggestions.
Suggestions on configuring the GC parameter of Elasticsearch:
- It is recommended that 50% of the machine memory be reserved for the Lucene cache and 50% be allocated to Elasticsearch. On machines with large memory, you are advised to allocate 30 GB (no more than 31 GB) to each instance. Confirm that the JVM Compressed Oops function is enabled. You can run the following command to check:
  java -server -Xms28G -Xmx28G -XX:+UseConcMarkSweepGC -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version

  If the returned value for Compressed Oops mode is Zero based, zero-based Compressed Oops is in effect and you can increase the allocated memory. Change 28 GB to 29 GB and run the check again. Repeat until you find the largest value at which the mode is still Zero based.

  If the returned value for Compressed Oops mode is Non-zero based, zero-based Compressed Oops is not in effect and you need to decrease the allocated memory. Change 28 GB to 27 GB and run the check again. Repeat until the mode is Zero based, and use that value as the maximum.
- It is recommended that -Xms and -Xmx be set to the same value so that dynamic heap resizing by the JVM does not affect performance.
- If half of the machine memory is less than the number of instances multiplied by 30 GB, allocate the memory as follows:
  Instance memory = (Machine memory x 0.5)/Number of instances on the machine

  For example, if a machine has 128 GB of memory and runs three Elasticsearch instances, the memory for each instance is 128 GB x 0.5/3 = 21 GB (rounded down). Set -Xms and -Xmx to this value, and confirm that the JVM Compressed Oops function is enabled.
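
The sizing rule and the Compressed Oops check can be sketched in shell as follows. The function names are hypothetical, and rounding down to whole gigabytes is an assumption based on the 21 GB example. The java options are copied from the command in the suggestions and may not be accepted by every JDK release, so treat print_oops_mode as a convenience wrapper rather than a guaranteed check.

```shell
# instance_heap_gb MACHINE_MEM_GB INSTANCES
# Prints a suggested -Xms/-Xmx size in whole GB: half of the machine memory
# divided among the instances, capped at 30 GB per instance.
instance_heap_gb() {
  awk -v mem="$1" -v n="$2" 'BEGIN {
    heap = int(mem * 0.5 / n)      # round down (assumption, matches 128 GB / 3 -> 21)
    if (heap > 30) heap = 30       # recommended upper bound per instance
    print heap
  }'
}

# print_oops_mode HEAP_GB
# Runs the Compressed Oops check from the suggestions for the given heap size.
print_oops_mode() {
  java -server "-Xms${1}G" "-Xmx${1}G" -XX:+UseConcMarkSweepGC \
    -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version
}
```

A possible workflow is to compute a starting value with instance_heap_gb 128 3, run print_oops_mode with that value, and then adjust by 1 GB at a time as described above.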

11. After modifying the Elasticsearch memory parameters, click Save, and then click OK.
12. On FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > Elasticsearch. On the displayed page, click Instance, select the instances whose Configuration Status is Expired, and restart the instances.
13. Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 14.
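
In addition to checking the alarm on FusionInsight Manager, per-node heap usage can be read from the Elasticsearch _cat/nodes API. The following is a minimal sketch, assuming the same curl options and the ip and httpport placeholders as in the index query earlier in this procedure; nodes_over_heap is a hypothetical helper that only filters the output.

```shell
# nodes_over_heap LIMIT
# Reads "_cat/nodes?v&h=name,heap.percent" output on stdin and prints the
# lines of nodes whose heap.percent (last column) is greater than LIMIT.
nodes_over_heap() {
  awk -v limit="$1" 'NR > 1 && $NF + 0 > limit + 0'   # NR > 1 skips the header row
}

# Example call in security mode (replace ip and httpport before running):
# curl -XGET --tlsv1.2 --negotiate -k -u : \
#   'https://ip:httpport/_cat/nodes?v&h=name,heap.percent' | nodes_over_heap 90
```

If the filter prints nothing for the alarm threshold, no node is currently above it. The alarm itself may still take one or more check periods to clear.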

Collect fault information.

14. On FusionInsight Manager, choose O&M > Log > Download.
15. Select Elasticsearch in the required cluster from the Service list.
16. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and after the alarm generation time, respectively. Then, click Download.
17. Contact the O&M engineers and send the collected logs.