ALM-50226 Unavailable BE Instances

Alarm Description

The system checks the BE process status every 30 seconds. This alarm is generated when the detected status value is greater than 0 (0 indicates that the BE process is normal, and 1 indicates that it is abnormal).

This alarm is cleared when the system detects that the BE process becomes normal.

Alarm Attributes

Alarm ID	Alarm Severity	Alarm Type	Service Type	Auto Cleared
50226	Critical	Error handling	Doris	Yes

Alarm Parameters

Type	Parameter	Description
Location Information	Source	Specifies the cluster or system for which the alarm is generated.
	ServiceName	Specifies the service for which the alarm is generated.
	RoleName	Specifies the role for which the alarm is generated.
	HostName	Specifies the host for which the alarm is generated.
Additional Information	Detail	Specifies the alarm triggering condition.

Impact on the System

The BE instance is unavailable and cannot serve data read or write requests.

Possible Causes

The BE instance is faulty or has been restarted.
The BE node disks are abnormal.
The local disk space of BE nodes is insufficient.

Handling Procedure

View the BE instance status.

1. Log in to FusionInsight Manager and choose O&M > Alarm > Alarms. In the alarm list, find the alarm whose ID is 50226, then view the role name and obtain the IP address of the instance from Location.
2. Choose Cluster > Services > Doris > Instances, click the BE instance for which the alarm is generated, and check whether Running Status of the instance is Unknown or Restoring.
   - If yes, go to 3.
   - If no, go to 5.
3. Return to the Instances page, select the BE instance, and choose More > Restart Instance.
4. After the BE instance is restarted, choose O&M > Alarm > Alarms. In the alarm list, check whether the "Unavailable BE Instances" alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 5.
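
As an optional cross-check from the command line, the BE state can also be read with the Doris statement SHOW BACKENDS through the MySQL client. The following is a minimal sketch and not part of the Manager procedure. The helper name unavailable_backends and the variables FE_HOST, FE_QUERY_PORT, and DORIS_USER are hypothetical. It assumes that the output has a Host (or IP) column and an Alive column whose value is true for a healthy BE; column names may differ between Doris versions.

```shell
# Print the host of every backend whose Alive column is not "true".
# Reads tab-separated SHOW BACKENDS output (including the header row) on stdin.
unavailable_backends() {
  awk -F '\t' '
    NR == 1 {
      # Locate the columns by header name instead of by fixed position.
      for (i = 1; i <= NF; i++) {
        if ($i == "Host" || $i == "IP") host = i
        if ($i == "Alive") alive = i
      }
      next
    }
    host && alive && $alive != "true" { print $host }
  '
}

# Example invocation (assumed connection variables; -B requests tab-separated output):
# mysql -h "$FE_HOST" -P "$FE_QUERY_PORT" -u "$DORIS_USER" -p -B -e 'SHOW BACKENDS' | unavailable_backends
```

If the helper prints the IP address obtained in 1, the BE is also reported as not alive by the FE, which is consistent with the alarm.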

Check BE node disks.

5. In the alarm list, check whether the BE instance identified in 1 reports the "Disk Status of a Specified Data Directory on BE Is Abnormal" alarm.
   - If yes, go to 6.
   - If no, go to 8.
6. Contact O&M engineers to repair the disk.
7. In the alarm list, check whether the "Unavailable BE Instances" alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 8.

Check the local disk space of BE nodes.

8. In the alarm list, check whether the BE instance identified in 1 reports the "BE Data Disk Usage Exceeds the Threshold" alarm.
   - If yes, go to 9.
   - If no, go to 11.
9. Increase the BE disk space using one or more of the following methods:
   - Check the value of storage_root_path in the ${BIGDATA_HOME}/FusionInsight_Doris_*/*_*_BE/etc/be.conf file and mount more disks to the configured directories as needed.
   - Delete data from table partitions that are no longer used, based on service requirements.
   - On FusionInsight Manager, choose Cluster > Services > Doris > Instances > Add Instance, and add BE nodes as needed.
   - After connecting to Doris using the MySQL client, run the following command to reduce the number of table replicas based on service requirements:
     alter table tblName set ("replication_allocation" = "tag.location.default: xxx");
10. In the alarm list, check whether the "Unavailable BE Instances" alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 11.
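
The storage_root_path check above can be sketched as a small shell helper that lists the configured data directories so that their usage can be inspected with df. This is an illustration under stated assumptions, not an official tool. The function name list_storage_paths and the variable BE_CONF are hypothetical. It assumes that storage_root_path holds entries separated by semicolons and that any attributes (such as medium or capacity) follow the path after a comma.

```shell
# Print each data directory configured in storage_root_path, one per line.
# Usage: list_storage_paths /path/to/be.conf
list_storage_paths() {
  local conf="$1"
  # Use the last uncommented storage_root_path assignment in the file.
  grep -E '^[[:space:]]*storage_root_path[[:space:]]*=' "$conf" \
    | tail -n 1 \
    | cut -d '=' -f 2- \
    | tr ';' '\n' \
    | sed -e 's/,.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -v '^$'
}

# Report file system usage for every configured data directory.
# BE_CONF is an assumed variable; set it to the actual be.conf path on the BE node.
if [ -n "${BE_CONF:-}" ]; then
  list_storage_paths "$BE_CONF" | while read -r dir; do
    df -h "$dir"
  done
fi
```

Directories whose file systems are nearly full are the candidates for mounting more disks or for data cleanup.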

Collect fault information.

11. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
12. Expand the Service drop-down list, select Doris for the target cluster, and click OK.
13. Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
14. Contact O&M engineers and provide the collected logs.
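
The log collection window can be worked out from the alarm generation time with a short shell helper. This is a minimal sketch that assumes GNU date is available. The function name log_window is hypothetical, and the timestamp format is only an example; Manager may display times differently.

```shell
# Print the start and end of the log collection window, one per line:
# 10 minutes (600 seconds) before and after the given alarm time.
# Usage: log_window '2024-05-01 12:00:00'
log_window() {
  local alarm_time="$1"
  local epoch
  # Convert the alarm time to seconds since the epoch (GNU date syntax).
  epoch=$(date -d "$alarm_time" +%s) || return 1
  date -d "@$((epoch - 600))" '+%Y-%m-%d %H:%M:%S'
  date -d "@$((epoch + 600))" '+%Y-%m-%d %H:%M:%S'
}
```

The first line of output is the value for Start Date and the second is the value for End Date.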