Updated on 2024-04-11 GMT+08:00

ALM-43009 JobHistory GC Time Exceeds the Threshold (For MRS 2.x or Earlier)

Description

The system checks the GC time of the JobHistory process every 60 seconds. This alarm is generated when the detected GC time exceeds the threshold (12 seconds by default) three consecutive times. You can change the threshold by choosing System > Threshold Configuration > Service > Spark > JobHistory GC Time > Total JobHistory GC Time. This alarm is cleared when the JobHistory GC time is shorter than or equal to the threshold.
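The raise and clear rule above can be modeled as a small state machine. This is a minimal sketch, assuming one GC-time sample (in seconds) per 60-second check and the default 12-second threshold. The class name GcTimeAlarm and its fields are hypothetical; the sketch illustrates the documented rule and is not the MRS implementation.

```python
from dataclasses import dataclass, field


@dataclass
class GcTimeAlarm:
    """Hypothetical model of the ALM-43009 raise/clear rule described above.

    Assumption: one GC-time sample (in seconds) arrives per 60-second check.
    """

    threshold_s: float = 12.0      # default threshold stated in the description
    consecutive_required: int = 3  # exceedances in a row that raise the alarm
    _exceed_streak: int = field(default=0, init=False)
    active: bool = field(default=False, init=False)

    def observe(self, gc_time_s: float) -> bool:
        """Record one sample and return whether the alarm is currently active."""
        if gc_time_s > self.threshold_s:
            self._exceed_streak += 1
            if self._exceed_streak >= self.consecutive_required:
                self.active = True
        else:
            # A sample at or below the threshold breaks the streak and clears the alarm.
            self._exceed_streak = 0
            self.active = False
        return self.active


if __name__ == "__main__":
    alarm = GcTimeAlarm()
    for sample in [13.0, 14.5, 12.8, 9.0]:
        print(sample, alarm.observe(sample))
```

With the samples 13.0, 14.5, 12.8, and 9.0 seconds, the alarm is raised on the third sample and cleared on the fourth. A sample of exactly 12.0 seconds does not count as an exceedance, matching the "shorter than or equal to the threshold" clearing condition.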

Attribute

Alarm ID    Alarm Severity    Automatically Cleared
43009       Major             Yes

Parameters

Parameter      Description
ServiceName    Specifies the service for which the alarm is generated.
RoleName       Specifies the role for which the alarm is generated.
HostName       Specifies the host for which the alarm is generated.

Impact on the System

If the GC time exceeds the threshold, JobHistory performance may degrade.

Possible Causes

The heap memory of the JobHistory process is overused or inappropriately allocated, causing frequent GC.

Procedure

  1. Check the GC time.

    a. Go to the cluster details page and choose Alarms.
    b. Select the alarm whose Alarm ID is 43009 and view the IP address and role name of the instance in Location.
    c. Choose Components > Spark > Instance > JobHistory (IP address of the instance for which the alarm is generated) > Customize > GC Time of the JobHistory Process, and click OK to view the GC time.
    d. Check whether the GC time of the JobHistory process is longer than the threshold (12 seconds by default).
      • If yes, go to 1.e.
      • If no, go to 2.
    e. Choose Components > Spark > Service Configuration. Set Type to All and choose JobHistory > Default. Increase the value of the SPARK_DAEMON_MEMORY parameter as required.
    f. Check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, go to 2.

  2. Collect fault information.

    a. On MRS Manager, choose System > Export Log.
    b. Contact the O&M engineers and send the collected logs.
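When increasing SPARK_DAEMON_MEMORY in the procedure above, it can help to compute the new value rather than edit the string by hand. This is a minimal sketch, assuming the parameter takes a JVM-style size string such as 4g or 1536m. The function names and the default 1.5x growth factor are hypothetical choices for illustration, not MRS recommendations; choose the actual value based on the memory available on the host.

```python
import re

# Binary multipliers for JVM-style size suffixes (assumed value format).
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_jvm_size(value: str) -> int:
    """Convert a JVM-style size such as '512m' or '4g' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([kmgtKMGT]?)\s*", value)
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    # No suffix means plain bytes.
    return int(number) * _UNITS.get(unit.lower(), 1)


def format_jvm_size(num_bytes: int) -> str:
    """Render bytes using the largest suffix that divides the value evenly."""
    for suffix in ("t", "g", "m", "k"):
        size = _UNITS[suffix]
        if num_bytes % size == 0:
            return f"{num_bytes // size}{suffix}"
    return str(num_bytes)


def scaled_daemon_memory(current: str, factor: float = 1.5) -> str:
    """Suggest a larger SPARK_DAEMON_MEMORY value (hypothetical helper and policy)."""
    if factor <= 1:
        raise ValueError("factor must be greater than 1 to increase memory")
    new_bytes = int(parse_jvm_size(current) * factor)
    # Round up to a whole mebibyte so the result stays a clean JVM size.
    mib = _UNITS["m"]
    new_bytes = -(-new_bytes // mib) * mib
    return format_jvm_size(new_bytes)
```

For example, scaled_daemon_memory("4g") returns "6g", and scaled_daemon_memory("1g") returns "1536m". Rounding to whole mebibytes keeps the suggested value in a form the JVM accepts while avoiding odd byte counts.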

Reference

None