ALM-43204 GC Duration of the Elasticsearch Process Exceeds the Threshold
Alarm Description
The system checks the garbage collection (GC) duration of the Elasticsearch process every 60s. This alarm is generated when the GC duration exceeds the threshold.
If Trigger Count is set to 1, this alarm is cleared when the GC duration of the Elasticsearch process is less than or equal to the threshold. If Trigger Count is greater than 1, this alarm is cleared when the GC duration of the Elasticsearch process is less than or equal to 90% of the threshold.
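For illustration only, the following minimal sketch captures the clearing rule described above; the function name and inputs are hypothetical and do not reflect the product's internal implementation.

```python
# Illustrative sketch of the clearing rule above; names and inputs are
# hypothetical and not the product's actual implementation.
def alarm_cleared(gc_duration_ms: float, threshold_ms: float, trigger_count: int) -> bool:
    """Return True if ALM-43204 would be cleared for the current check."""
    if trigger_count == 1:
        # Trigger Count = 1: clear once the GC duration is at or below the threshold.
        return gc_duration_ms <= threshold_ms
    # Trigger Count > 1: clear only once the GC duration drops to 90% of the threshold.
    return gc_duration_ms <= 0.9 * threshold_ms

print(alarm_cleared(25000, 30000, trigger_count=1))  # True: at or below the threshold
print(alarm_cleared(28000, 30000, trigger_count=3))  # False: 28000 ms > 27000 ms
```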
Alarm Attributes
| Alarm ID | Alarm Severity | Alarm Type | Service Type | Auto Cleared |
| --- | --- | --- | --- | --- |
| 43204 | Major (default threshold: 30000 ms); Critical (default threshold: 60000 ms) | Quality of service | Elasticsearch | Yes |
Alarm Parameters
| Type | Parameter | Description |
| --- | --- | --- |
| Location Information | Source | Specifies the cluster for which the alarm is generated. |
| Location Information | ServiceName | Specifies the service for which the alarm is generated. |
| Location Information | RoleName | Specifies the role for which the alarm is generated. |
| Location Information | HostName | Specifies the host for which the alarm is generated. |
| Additional Information | Trigger Condition | Specifies the threshold for triggering the alarm. |
Impact on the System
If the GC time of the Elasticsearch instance process is too long, the index data read/write performance of Elasticsearch may be affected, and requests may time out.
Possible Causes
The service load of the Elasticsearch instance on the node is high, or the heap memory is not properly configured. As a result, GC occurs frequently.
Handling Procedure
Check the configured heap memory.
1. Log in to FusionInsight Manager, choose O&M > Alarm > Alarms, view the location information of this alarm, and check the IP address of the instance for which the alarm is generated.
2. On the FusionInsight Manager home page, choose Cluster > Name of the desired cluster > Services > Elasticsearch. On the displayed page, click Instance, click the drop-down list in the Chart area, choose Customize > Clear All > Garbage Collection > EsMaster GC Time Stats, and click OK. Check whether the GC duration is greater than the threshold. (The same GC statistics can also be retrieved over the REST API; see the script sketch after 10.)
3. Choose Cluster > Name of the desired cluster > Services > Elasticsearch. On the displayed page, click Configurations.
4. In the upper right corner of the Configurations page, enter GC_OPTS in the search box and click the search icon. The GC_OPTS parameter values of all instances are displayed.
5. Select the instance whose GC_OPTS value needs to be changed, and check whether the differentiated configuration icon is displayed next to the instance value configuration box.
6. If the icon is displayed, click it. In the displayed dialog box, click the delete icon in the right pane and click OK to save the settings.
7. Adjust the values of -Xms and -Xmx in the GC_OPTS parameter by referring to the following note.
Suggestions on configuring the GC parameter of Elasticsearch:
- It is recommended that 50% of the memory be reserved for the Lucene cache and 50% for the Elasticsearch heap. On machines with large memory, you are advised to allocate 30 GB (no more than 31 GB) of heap memory. Confirm that the JVM Compressed Oops function has been enabled. You can run the following command to check:
java -server -Xms28G -Xmx28G -XX:+UseConcMarkSweepGC -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode -version
If the returned value of Compressed Oops mode is Zero based, the JVM Compressed Oops function is enabled and you can increase the allocated memory. Change 28 GB to 29 GB and check again whether the function is still enabled. Repeat this until you find the maximum memory size at which the Compressed Oops function remains enabled.
If the returned value of Compressed Oops mode is Non-zero based, the JVM Compressed Oops function is disabled and you need to decrease the allocated memory. Change 28 GB to 27 GB and check again whether the function is enabled. Repeat this until you find the maximum memory size at which the Compressed Oops function remains enabled.
- It is recommended that -Xms and -Xmx be set to the same value to prevent dynamic adjustment of heap memory size by JVM from affecting the performance.
- If half of the computer memory is less than the number of instances multiplied by 30 GB, allocate the memory by referring to the following:
Instance memory = (Computer memory x 0.5)/Number of instances on the computer
For example, if a computer has 128 GB of memory and runs three Elasticsearch instances, the heap memory of each instance is 128 GB x 0.5/3 ≈ 21 GB, and you need to confirm that the JVM Compressed Oops function is still enabled at this size. (A short sketch after 10 works through this calculation.)
8. After the modification, click Save in the upper left corner. In the Save Configuration dialog box displayed, click OK.
9. On FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > Elasticsearch. On the displayed page, click Instance, select the instances whose Configuration Status is Expired, and restart the instances.
10. Five minutes later, check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 11.
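As a complement to checking the EsMaster GC Time Stats chart in 2, the following sketch pulls per-node heap usage and cumulative GC statistics from the Elasticsearch nodes stats REST API. The endpoint address and the use of unauthenticated HTTP are assumptions; FusionInsight clusters typically require authentication and TLS, so adapt the connection details to your environment.

```python
import requests

# Assumed endpoint; replace with the HTTP(S) address, credentials, and
# certificates required by your cluster.
ES_URL = "http://127.0.0.1:9200"

# _nodes/stats/jvm returns heap usage and cumulative GC statistics per node.
resp = requests.get(f"{ES_URL}/_nodes/stats/jvm", timeout=10)
resp.raise_for_status()

for node_id, node in resp.json()["nodes"].items():
    mem = node["jvm"]["mem"]
    print(f"{node['name']}: heap {mem['heap_used_in_bytes'] / 2**30:.1f} GB "
          f"of {mem['heap_max_in_bytes'] / 2**30:.1f} GB used")
    for collector, stats in node["jvm"]["gc"]["collectors"].items():
        count = stats["collection_count"]
        total_ms = stats["collection_time_in_millis"]
        avg_ms = total_ms / count if count else 0.0
        print(f"  {collector} GC: {count} collections, "
              f"{total_ms} ms total, {avg_ms:.1f} ms on average")
```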
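The sizing rule in the note attached to 7 can also be worked through in a few lines. The helper below only illustrates the arithmetic (half of the node's memory split evenly across instances, capped below the roughly 31 GB Compressed Oops limit); it is not a tuning tool, and the function name is hypothetical.

```python
# Illustrative only: the heap sizing rule from the note above.
COMPRESSED_OOPS_LIMIT_GB = 31  # stay below this so Compressed Oops remains enabled

def heap_per_instance_gb(node_memory_gb: float, instance_count: int) -> int:
    """Half of the node's memory, split evenly, capped below the Compressed Oops limit."""
    share = (node_memory_gb * 0.5) / instance_count
    return int(min(share, COMPRESSED_OOPS_LIMIT_GB - 1))

# Example from the note: a 128 GB node running three Elasticsearch instances.
heap = heap_per_instance_gb(128, 3)
print(f"-Xms{heap}G -Xmx{heap}G")  # -Xms21G -Xmx21G
```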
Collect fault information.
11. On FusionInsight Manager, choose O&M > Log > Download.
12. Select Elasticsearch in the required cluster for Service.
13. Click the edit icon in the upper right corner. In the displayed dialog box, set Start Date and End Date to 10 minutes before and after the alarm generation time, respectively, and click OK. Then, click Download.
14. Contact O&M engineers and send the collected logs.
Alarm Clearance
After the fault is rectified, the system automatically clears this alarm.
Related Information
None.