Updated on 2024-09-23 GMT+08:00

ALM-43008 Direct Memory Usage of the JobHistory Process Exceeds the Threshold (For MRS 2.x or Earlier)

Description

The system checks the JobHistory process status every 30 seconds. The alarm is generated when the direct memory usage of the JobHistory process exceeds the threshold (90% of the maximum direct memory).
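The threshold comparison described above can be illustrated with a minimal sketch. The function name, the constant, and the sample values are hypothetical and are not part of MRS Manager; only the 90% ratio comes from this page.

```python
# Hypothetical helper for illustration only; MRS Manager performs this
# check internally and does not expose this function.
THRESHOLD_RATIO = 0.90  # alarm threshold stated in the description

MIB = 1024 * 1024


def direct_memory_alarm(used_bytes: int, max_bytes: int,
                        threshold: float = THRESHOLD_RATIO) -> bool:
    """Return True when direct memory usage exceeds the threshold."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return used_bytes / max_bytes > threshold


# Assumed sample: 470 MiB used of a 512 MiB limit (about 91.8%).
print(direct_memory_alarm(470 * MIB, 512 * MIB))  # True
# Assumed sample: 400 MiB used of a 512 MiB limit (about 78.1%).
print(direct_memory_alarm(400 * MIB, 512 * MIB))  # False
```

In this sketch the comparison is strict, so usage at exactly 90% does not trigger the alarm; that choice follows the word "exceeds" and is an interpretation, not a documented behavior.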

Attribute

| Alarm ID | Alarm Severity | Automatically Cleared |
| --- | --- | --- |
| 43008 | Major | Yes |

Parameters

| Parameter | Description |
| --- | --- |
| ServiceName | Specifies the service for which the alarm is generated. |
| RoleName | Specifies the role for which the alarm is generated. |
| HostName | Specifies the host for which the alarm is generated. |

Impact on the System

If the direct memory available to the JobHistory process is insufficient, a memory overflow occurs and the service breaks down.

Possible Causes

The JobHistory process overuses its direct memory, or the direct memory is inappropriately allocated.

Procedure

  1. Check the direct memory usage.

    1. Go to the cluster details page and choose Alarms.
    2. Select the alarm whose Alarm ID is 43008 and view the IP address and role name of the instance in Location.
    3. Choose Components > Spark > Instance > JobHistory (IP address of the instance for which the alarm is generated) > Customize > Direct Memory Statistics of the JobHistory Process. Click OK to view the direct memory usage.
    4. Check whether the direct memory usage of the JobHistory process has reached the threshold (90% of the maximum direct memory).
      • If yes, go to 1.5.
      • If no, go to 2.
    5. Choose Components > Spark > Service Configuration. Set Type to All and choose JobHistory > Default. Increase the value of -XX:MaxDirectMemorySize in SPARK_DAEMON_JAVA_OPTS as required.
    6. Check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, go to 2.

  2. Collect fault information.

    1. On MRS Manager, choose System > Export Log.
    2. Contact the O&M engineers and send the collected logs.
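The change to -XX:MaxDirectMemorySize in step 1 amounts to editing one token in the SPARK_DAEMON_JAVA_OPTS string. The sketch below shows that edit on a sample string; the helper name and the sample options (-Xmx4G, -XX:+UseG1GC, 512M) are assumptions for illustration, and the actual value on a cluster will differ. The change itself is made on the Service Configuration page, not by a script.

```python
import re

# Matches an existing -XX:MaxDirectMemorySize=<size> token, e.g. 512M or 1g.
_MAX_DIRECT = re.compile(r"-XX:MaxDirectMemorySize=\S+")


def set_max_direct_memory(java_opts: str, new_size: str) -> str:
    """Return java_opts with -XX:MaxDirectMemorySize set to new_size."""
    # JVM size values are digits with an optional k, m, g, or t suffix.
    if not re.fullmatch(r"\d+[kKmMgGtT]?", new_size):
        raise ValueError(f"unsupported size value: {new_size!r}")
    option = f"-XX:MaxDirectMemorySize={new_size}"
    if _MAX_DIRECT.search(java_opts):
        # Option present: replace its value and leave other options untouched.
        return _MAX_DIRECT.sub(option, java_opts)
    # Option absent: append it to the existing options.
    return f"{java_opts} {option}".strip()


# Hypothetical current value; the real string depends on the cluster.
current = "-Xmx4G -XX:MaxDirectMemorySize=512M -XX:+UseG1GC"
print(set_max_direct_memory(current, "1G"))
# -Xmx4G -XX:MaxDirectMemorySize=1G -XX:+UseG1GC
```

Replacing only the matching token keeps the other JVM options unchanged, which mirrors editing the single value in the configuration field.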

Reference

None