ALM-50225 Unavailable FE Instances

Alarm Description

The system checks the FE process status every 30 seconds. This alarm is generated when the detected status value is greater than 0 (0 indicates that the FE process is normal, and 1 indicates that it is abnormal).

This alarm is cleared when the system detects that the FE process becomes normal.

Alarm Attributes

Alarm ID	Alarm Severity	Alarm Type	Service Type	Auto Cleared
50225	Critical	Error handling	Doris	Yes

Alarm Parameters

Type	Parameter	Description
Location Information	Source	Specifies the cluster or system for which the alarm is generated.
	ServiceName	Specifies the service for which the alarm is generated.
	RoleName	Specifies the role for which the alarm is generated.
	HostName	Specifies the host for which the alarm is generated.
Additional Information	Detail	Specifies the alarm triggering condition.

Impact on the System

The FE instance is unavailable and cannot respond to client requests.

Possible Causes

The FE instance is faulty or has been restarted.
The local disk space of the FE node is insufficient.
The memory of the FE node is insufficient.

Handling Procedure

View the FE instance status.

1. Log in to FusionInsight Manager and choose O&M > Alarm > Alarms. In the alarm list, locate the alarm whose ID is 50225, view the role name, and obtain the IP address of the instance from Location.
2. Choose Cluster > Services > Doris > Instances, click the FE instance for which the alarm is generated, and check whether Running Status of the instance is Unknown or Restoring.
- If yes, go to 3.
- If no, go to 5.
3. Return to the Instances page, select the FE instance, and choose More > Restart Instance.
4. After the FE instance is restarted, choose O&M > Alarm > Alarms. In the alarm list, check whether the "Unavailable FE Instances" alarm is cleared.
- If yes, no further action is required.
- If no, go to 5.
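
As a command-line cross-check of the instance status, Doris reports frontend liveness through the SHOW FRONTENDS statement. The following is a minimal sketch and not part of the documented procedure: `dead_frontends` is a hypothetical helper name, and the host, port, and user in the commented invocation are placeholders that depend on your cluster. It assumes a MySQL-compatible client can reach a healthy FE node.

```shell
# Print the Name of every frontend whose Alive field is not "true".
# Reads the vertical (\G) output of SHOW FRONTENDS on stdin.
dead_frontends() {
  awk '
    $1 == "Name:"                  { name = $2 }   # remember the current row
    $1 == "Alive:" && $2 != "true" { print name }  # report rows that are down
  '
}

# Possible invocation (FE_HOST, FE_QUERY_PORT, and DORIS_USER are placeholders):
#   mysql -h "$FE_HOST" -P "$FE_QUERY_PORT" -u "$DORIS_USER" -p \
#     -e 'SHOW FRONTENDS\G' | dead_frontends
```

An empty result suggests that every frontend known to the queried node reports itself as alive.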

View the local disk space of FE.

5. Log in to the node where the FE instance queried in 1 is deployed and check the value of meta_dir in the ${BIGDATA_HOME}/FusionInsight_Doris_*/*_*_FE/etc/fe.conf file.

For example, the value of meta_dir is as follows:
```
meta_dir = /srv/BigData/doris_fe/doris-meta
```
6. Run the following command to check whether the disk usage of meta_dir reaches 100%:

```
df -h /srv/BigData/doris_fe/doris-meta
```

For example, the following command output indicates that the disk usage of meta_dir is 40%:
```
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda2        98G   37G   57G  40% /
```
- If yes, go to 7.
- If no, go to 8.
7. Delete unnecessary information from the directory to ensure that over 80% of the disk space is available. Wait for several minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 8.
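
The two lookups in this check (reading meta_dir from fe.conf, then reading the disk usage for that path) can be scripted. The following is a minimal sketch under stated assumptions: `meta_dir_from_conf` and `disk_usage_pct` are hypothetical helper names, and the parser assumes a plain `meta_dir = <path>` line with no variable expansion.

```shell
# Print the last uncommented meta_dir value found in fe.conf text on stdin.
meta_dir_from_conf() {
  awk '
    /^[ \t]*meta_dir[ \t]*=/ {
      v = $0
      sub(/^[ \t]*meta_dir[ \t]*=[ \t]*/, "", v)  # drop the key and "="
      sub(/[ \t]+$/, "", v)                       # drop trailing whitespace
      val = v
    }
    END { if (val != "") print val }
  '
}

# Print the usage percentage (digits only) of the filesystem holding path $1.
# df -P forces the POSIX one-line-per-filesystem layout, so column 5 is Use%.
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { gsub("%", "", $5); print $5 }'
}

# Possible use on the FE node (the glob comes from the path given above):
#   dir=$(cat "${BIGDATA_HOME}"/FusionInsight_Doris_*/*_*_FE/etc/fe.conf | meta_dir_from_conf)
#   [ "$(disk_usage_pct "$dir")" -ge 100 ] && echo "meta_dir disk is full"
```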

View the FE node memory.

8. Log in to FusionInsight Manager and choose Cluster > Services > Doris > Instances. Click the FE instance for which the alarm is generated, click Chart, select CPU and Memory from the chart category, and check whether the FE memory usage reaches 100%.
- If yes, go to 9.
- If no, go to 10.
9. If the FE memory usage is too high, the processes connected to the FE service will be stopped and the occupied resources will be released. Wait for several minutes and check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 10.
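
For a quick look at memory pressure on the FE node itself, the kernel's /proc/meminfo can be read directly. The following is a minimal sketch, assuming a Linux node whose kernel reports MemAvailable; `mem_used_pct` is a hypothetical helper name. It measures node-level memory, which is not necessarily the same metric as the FE memory usage chart in FusionInsight Manager.

```shell
# Print node memory usage as an integer percentage, computed as
# (MemTotal - MemAvailable) / MemTotal from meminfo-style text on stdin.
mem_used_pct() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }
  '
}

# Possible use on the FE node:
#   mem_used_pct < /proc/meminfo
```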

Collect fault information.

10. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
11. Expand the Service drop-down list, select Doris for the target cluster, and click OK.
12. Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
13. Contact O&M engineers and provide the collected logs.
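
The Start Date and End Date for log collection are simple offsets from the alarm generation time. The following is a minimal sketch of that arithmetic, assuming GNU date is available; `log_window` is a hypothetical helper name, and the timestamp is read and printed as the same wall-clock time without any time zone conversion.

```shell
# Print the log collection window for an alarm time given as
# "YYYY-MM-DD HH:MM:SS": line 1 is 10 minutes before, line 2 is 10 minutes after.
log_window() {
  local alarm_time="$1" epoch
  # -u keeps parsing and printing in one zone, so the offsets are exact.
  epoch=$(date -u -d "$alarm_time" +%s) || return 1
  date -u -d "@$((epoch - 600))" '+%Y-%m-%d %H:%M:%S'
  date -u -d "@$((epoch + 600))" '+%Y-%m-%d %H:%M:%S'
}

# Possible use:
#   log_window '2024-05-01 12:00:00'
```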