ALM-12101 AZ Unhealthy

Updated at: Mar 25, 2021 GMT+08:00

Description

After the AZ DR function is enabled, the system checks the AZ health status every 5 minutes. This alarm is generated when the system detects that the AZ is subhealthy or unhealthy. This alarm is cleared when the AZ becomes healthy.
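The generate-and-clear behavior described above can be sketched as a small state machine. This is only an illustration, not FusionInsight Manager code: the function name `alarm_events` and the status strings are hypothetical, and it assumes the alarm is not generated again while it is already active, which the description does not state.

```python
from typing import Iterable, List, Tuple

# Check period stated in the description (every 5 minutes).
CHECK_INTERVAL_SECONDS = 5 * 60


def alarm_events(statuses: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (check_index, event) pairs for a series of periodic AZ health checks.

    Each status is one of "healthy", "subhealthy", or "unhealthy"
    (hypothetical labels chosen for this sketch).
    """
    events: List[Tuple[int, str]] = []
    alarm_active = False
    for index, status in enumerate(statuses):
        degraded = status in ("subhealthy", "unhealthy")
        if degraded and not alarm_active:
            # First degraded check: the alarm is generated.
            events.append((index, "generate"))
            alarm_active = True
        elif status == "healthy" and alarm_active:
            # AZ is healthy again: the alarm is cleared automatically.
            events.append((index, "clear"))
            alarm_active = False
    return events


if __name__ == "__main__":
    checks = ["healthy", "subhealthy", "unhealthy", "healthy"]
    for index, event in alarm_events(checks):
        print(f"t={index * CHECK_INTERVAL_SECONDS}s: {event} ALM-12101")
```

In this sketch a change from subhealthy to unhealthy does not produce a second event, because the alarm is already active.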

Attribute

Alarm ID    Alarm Severity    Auto Clear
12101       Major             Yes

Parameters

Parameter      Meaning
Source         Specifies the cluster for which the alarm is generated.
ServiceName    Specifies the service for which the alarm is generated.
AZName         Specifies the AZ for which the alarm is generated.
HostName       Specifies the host for which the alarm is generated.

Impact on the System

The health status of an AZ depends on the health status of its computing resources (Yarn) and storage resources (HDFS). An AZ is healthy only when both its computing resources (Yarn) and its storage resources (HDFS) are healthy. Tasks submitted to a healthy AZ can use the storage resources in that AZ directly.

An AZ is subhealthy when:

  • The computing resources (Yarn) are unhealthy, but the storage resources (HDFS) are healthy. Tasks cannot be submitted to the local AZ, but data can still be read and written in the local AZ.
  • The computing resources (Yarn) are healthy, but some storage resources (HDFS) are unhealthy. Tasks can be submitted to the local AZ, and some data can be read and written in the local AZ. Which data can be accessed depends on the data locality detected by Spark/Hive scheduling.

An AZ is unhealthy when:

  • The computing resources (Yarn) are healthy, but the storage resources (HDFS) are unhealthy. Although tasks can be submitted to the local AZ, data cannot be read or written in the local AZ. As a result, the tasks submitted to the local AZ are invalid.
  • The computing resources (Yarn) and storage resources (HDFS) are unhealthy. Tasks cannot be submitted to the local AZ, and data cannot be read or written in the local AZ.
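The health rules above can be summarized as a small decision function. This is a sketch only: the name `classify_az_health` and the `hdfs_state` labels are hypothetical and are not part of any FusionInsight Manager API. The combination of unhealthy Yarn with partially unhealthy HDFS is not covered by the rules above, so the sketch returns "undefined" for it rather than guessing.

```python
def classify_az_health(yarn_healthy: bool, hdfs_state: str) -> str:
    """Map Yarn and HDFS health to the AZ health status described above.

    hdfs_state is one of "healthy", "partial" (some storage resources
    unhealthy), or "unhealthy". These labels are assumptions for this sketch.
    """
    if hdfs_state not in ("healthy", "partial", "unhealthy"):
        raise ValueError(f"unknown HDFS state: {hdfs_state!r}")

    if hdfs_state == "unhealthy":
        # Data cannot be read or written, whether or not Yarn is healthy.
        return "unhealthy"
    if hdfs_state == "healthy":
        # Storage is fine; the result depends on computing resources only.
        return "healthy" if yarn_healthy else "subhealthy"
    if yarn_healthy:
        # Some storage resources are unhealthy, but tasks can be submitted.
        return "subhealthy"
    # Yarn unhealthy and HDFS partially unhealthy: not defined in this document.
    return "undefined"


if __name__ == "__main__":
    for yarn in (True, False):
        for hdfs in ("healthy", "partial", "unhealthy"):
            print(yarn, hdfs, "->", classify_az_health(yarn, hdfs))
```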

Possible Causes

  • The computing resources (Yarn) are unhealthy.
  • The storage resources (HDFS) are unhealthy.
  • Some storage resources (HDFS) are unhealthy.

Procedure

Disable the DR drill.

  1. On FusionInsight Manager, choose Cluster > Name of the desired cluster > Single-Cluster DR. The cluster DR page is displayed.
  2. In the AZ DR list, locate the AZ whose health status is Unhealthy and check whether Perform DR Drill in its Operation column is grayed out.

    • If yes, go to 4.
    • If no, go to 3.

  3. Click Restore in the Operation column of the target AZ. Wait 2 minutes and refresh the page to view the health status of the AZ. Check whether the health status is normal.

    • If yes, no further action is required.
    • If no, go to 4.

Collect the fault information.

  4. Click the drop-down list of the target AZ, expand the AZ details, view AZ Unhealthy Cause, and obtain the log file storage path.
  5. Log in to the active management node as user root.
  6. On the node that you have logged in to, open the path obtained in 4 and view the log file details.
  7. Contact O&M personnel and provide detailed log file information.
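When passing log details to O&M personnel, it can help to extract only the most recent lines of the log file. The following is a minimal sketch, assuming the log is a plain-text file; the function name `tail_log` and the default of 200 lines are hypothetical, and the actual path must be the one shown under AZ Unhealthy Cause, which this document does not specify.

```python
from collections import deque
from pathlib import Path


def tail_log(log_path: str, max_lines: int = 200) -> str:
    """Return the last max_lines lines of a text log file as one string."""
    # errors="replace" keeps the sketch from failing on unexpected bytes.
    with Path(log_path).open("r", encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the most recent lines in memory.
        return "".join(deque(handle, maxlen=max_lines))


if __name__ == "__main__":
    import sys

    # Usage: python tail_log.py <log file path obtained from the AZ details>
    print(tail_log(sys.argv[1]), end="")
```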

Alarm Clearing

After the fault is rectified, the system automatically clears this alarm.

Related Information

None
