ALM-43001 Spark Service Unavailable

Updated at: Mar 31, 2020 GMT+08:00

Description

The system checks the Spark service status every 60 seconds. This alarm is generated when the Spark service is unavailable and is cleared when the Spark service recovers.
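The raise-and-clear behavior described above can be sketched as a small state transition. This is only an illustrative model, not the system's actual implementation: the names AlarmState and on_check are hypothetical, and the availability result is assumed to be supplied by whatever performs the 60-second check.

```python
from dataclasses import dataclass

# Check period stated in the description above.
CHECK_INTERVAL_SECONDS = 60


@dataclass
class AlarmState:
    """Hypothetical holder for whether ALM-43001 is currently active."""
    active: bool = False


def on_check(state: AlarmState, service_available: bool) -> str:
    """Apply the result of one periodic check.

    Returns "raised" when the alarm is generated, "cleared" when it is
    cleared, and "unchanged" otherwise.
    """
    if not service_available and not state.active:
        state.active = True
        return "raised"
    if service_available and state.active:
        state.active = False
        return "cleared"
    return "unchanged"
```

For example, feeding the check results available, unavailable, unavailable, available into on_check in that order yields "unchanged", "raised", "unchanged", "cleared": the alarm is generated once and is cleared automatically on recovery.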

Attribute

Alarm ID    Alarm Severity    Automatically Cleared
43001       Critical          Yes

Parameters

Parameter      Description
ServiceName    Specifies the service for which the alarm is generated.
RoleName       Specifies the role for which the alarm is generated.
HostName       Specifies the host for which the alarm is generated.

Impact on the System

Tasks submitted by users fail to execute.

Possible Causes

  • The KrbServer service is abnormal.
  • The LdapServer service is abnormal.
  • The ZooKeeper service is abnormal.
  • The HDFS service is abnormal.
  • The Yarn service is abnormal.
  • The corresponding Hive service is abnormal.
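Each cause above corresponds to one of the service unavailability alarms listed under Procedure. The following sketch shows one way to match a list of active alarm IDs against them. It assumes the IDs have been copied by hand from the alarm list; no MRS API is called, and the function names abnormal_dependencies and next_step are hypothetical.

```python
# Alarm IDs and service names as listed in step 1 of the procedure.
DEPENDENCY_ALARMS = {
    "ALM-25500": "KrbServer",
    "ALM-25000": "LdapServer",
    "ALM-13000": "ZooKeeper",
    "ALM-14000": "HDFS",
    "ALM-18000": "Yarn",
    "ALM-16004": "Hive",
}


def abnormal_dependencies(active_alarm_ids):
    """Return the dependency services whose unavailability alarm is active.

    Services are returned in the order in which this page lists them.
    """
    active = set(active_alarm_ids)
    return [
        service
        for alarm_id, service in DEPENDENCY_ALARMS.items()
        if alarm_id in active
    ]


def next_step(active_alarm_ids):
    """Suggest the next action in the procedure (illustrative wording only)."""
    services = abnormal_dependencies(active_alarm_ids)
    if services:
        return "Handle the unavailability alarms for: " + ", ".join(services)
    return "Collect fault information"
```

If the alarm list contains only ALM-43001 and ALM-14000, abnormal_dependencies returns ["HDFS"], which points to the HDFS alarm as the first one to handle.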

Procedure

  1. Check whether any service on which Spark depends has reported a service unavailability alarm.

    1. On the MRS cluster details page, click Alarms.

      For MRS 1.8.10 or earlier, log in to MRS Manager and click Alarms.

    2. Check whether the following alarms exist in the alarm list:
      • ALM-25500 KrbServer Service Unavailable
      • ALM-25000 LdapServer Service Unavailable
      • ALM-13000 ZooKeeper Service Unavailable
      • ALM-14000 HDFS Service Unavailable
      • ALM-18000 Yarn Service Unavailable
      • ALM-16004 Hive Service Unavailable

      • If yes, go to 1.3.
      • If no, go to 2.

    3. Handle the service unavailability alarms based on the troubleshooting methods provided in the alarm help.

      After all the service unavailability alarms are cleared, wait a few minutes and check whether this alarm is cleared.

      • If yes, no further action is required.
      • If no, go to 2.

  2. Collect fault information.

    1. On MRS Manager, choose System > Export Log.
    2. From the Service drop-down list, select the following services (Hive here is the specific Hive service identified by ServiceName in the alarm location information), and then click OK:
      • KrbServer
      • LdapServer
      • ZooKeeper
      • HDFS
      • Yarn
      • Hive
    3. Set Start Time for log collection to 10 minutes before the alarm generation time and End Time to 10 minutes after it, and then click Download.
    4. Contact O&M personnel and send them the collected logs.
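The log collection window in step 2 is the alarm generation time plus and minus 10 minutes. As a minimal sketch of that arithmetic (the function name log_window is hypothetical, and the alarm time is assumed to be read manually from the alarm list):

```python
from datetime import datetime, timedelta


def log_window(alarm_time: datetime, margin_minutes: int = 10):
    """Return (start_time, end_time) for log collection.

    The window spans margin_minutes before and after the alarm
    generation time, as described in step 2 of the procedure.
    """
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin
```

For an alarm generated at 2020-03-31 14:05:00, log_window returns 13:55:00 as Start Time and 14:15:00 as End Time on the same day.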

Related Information

N/A
