Updated on 2024-11-29 GMT+08:00

ALM-27006 Data Directory Disk Usage Exceeds the Threshold

Alarm Description

The system checks the disk space usage of the data directory on the active DBServer node every 30 seconds and compares it with the threshold. This alarm is generated when the usage exceeds the threshold in five consecutive checks (the number of checks is configurable; the default is five).

The condition for clearing the alarm depends on the trigger count. If the trigger count is 1, the alarm is cleared when the data directory disk usage is less than or equal to the threshold. If the trigger count is greater than 1, the alarm is cleared when the data directory disk usage is less than 90% of the threshold.
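
The raise and clear rules above can be sketched as two small shell functions. This is only an illustration of the described logic, not the code that FusionInsight Manager runs; the function names should_raise and should_clear are hypothetical, and usage values are assumed to be whole-number percentages.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; function names are hypothetical.

# should_raise THRESHOLD TRIGGER_COUNT SAMPLE...
# Prints "raise" if the last TRIGGER_COUNT samples all exceed THRESHOLD,
# otherwise prints "hold". Samples are integer percentages, oldest first.
should_raise() {
  local threshold=$1 trigger_count=$2
  shift 2
  local samples=("$@")
  local n=${#samples[@]} i
  if (( n < trigger_count )); then echo hold; return; fi
  for (( i = n - trigger_count; i < n; i++ )); do
    if (( samples[i] <= threshold )); then echo hold; return; fi
  done
  echo raise
}

# should_clear THRESHOLD TRIGGER_COUNT USAGE
# Trigger count 1: clear when USAGE <= THRESHOLD.
# Trigger count > 1: clear when USAGE < 90% of THRESHOLD.
should_clear() {
  local threshold=$1 trigger_count=$2 usage=$3
  if (( trigger_count == 1 )); then
    if (( usage <= threshold )); then echo clear; else echo keep; fi
  else
    # usage < 0.9 * threshold, written with integers only
    if (( usage * 10 < threshold * 9 )); then echo clear; else echo keep; fi
  fi
}
```

For example, with a threshold of 80% and a trigger count of 5, should_clear 80 5 75 prints keep, because 75% is still above 72% (90% of the threshold), while should_clear 80 5 71 prints clear.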

Alarm Attributes

  • Alarm ID: 27006
  • Alarm Severity: Critical (default threshold: 85%), Major (default threshold: 80%)
  • Alarm Type: Quality of service
  • Service Type: FusionInsight Manager
  • Auto Cleared: Yes

Alarm Parameters

  • Location Information

    • Source: Specifies the cluster for which the alarm is generated.
    • ServiceName: Specifies the service for which the alarm is generated.
    • RoleName: Specifies the role for which the alarm is generated.
    • HostName: Specifies the host for which the alarm is generated.
    • PartitionName: Specifies the disk partition for which the alarm is generated.

  • Additional Information

    • Trigger Condition: Specifies the condition for triggering the alarm.

Impact on the System

  • The DBService service process cannot provide the API for data writing.
  • When the disk space usage of the data directory exceeds 90%, the database enters read-only mode and the "Database Enters the Read-Only Mode" alarm is generated. As a result, service data cannot be written to the database.

Possible Causes

  • The alarm threshold is improperly configured.
  • The database data volume is too large or the disk configuration cannot meet service requirements. As a result, the disk usage reaches the upper limit.

Handling Procedure

Check whether the threshold is set properly.

  1. On FusionInsight Manager, click O&M and choose Alarm > Thresholds in the navigation pane on the left. Click the name of the desired cluster, choose DBService > Database > Data Directory Disk Usage, and check whether the alarm threshold is 80%.

    • If yes, go to 3.
    • If no, go to 2.

  2. Modify the alarm threshold based on service requirements.
  3. Click Cluster and choose the name of the desired cluster > Service > DBService. On the Dashboard page, view the Data Directory Disk Usage chart to check whether the disk usage of the data directory is less than the threshold.

    • If yes, go to 4.
    • If no, go to 5.

  4. Wait 2 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 5.
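
As a command-line cross-check of the chart in step 3, the usage of the filesystem that holds the data directory can be read with df on the active DBServer node. The following is a minimal sketch, assuming that DBSERVICE_DATA_DIR is set after sourcing the DBService profile (as in the find command later in this procedure). The function names are hypothetical, and df reports usage for the whole partition, so the value may differ slightly from the chart.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; function names are hypothetical.

# data_dir_usage_percent DIR
# Prints the usage (integer percent) of the filesystem that holds DIR.
# Assumes the device name and mount point contain no spaces.
data_dir_usage_percent() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# compare_with_threshold USAGE THRESHOLD
# Prints "below" if USAGE is less than THRESHOLD, else "at-or-above".
compare_with_threshold() {
  if (( $1 < $2 )); then echo below; else echo at-or-above; fi
}

# Example (run as user omm on the active DBServer node):
#   source $DBSERVER_HOME/.dbservice_profile
#   compare_with_threshold "$(data_dir_usage_percent "$DBSERVICE_DATA_DIR")" 80
```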

Check whether large files are incorrectly written to the disk.

  5. Log in to the active management node as user omm.
  6. Run the following commands to check whether the disk that holds the data directory contains files larger than 500 MB:

    source $DBSERVER_HOME/.dbservice_profile

    find "$DBSERVICE_DATA_DIR"/../ -type f -size +500M

    • If yes, go to 7.
    • If no, go to 8.

  7. Handle the incorrectly written files, wait 2 minutes, and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 8.
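
When the find command returns many files, it can help to see them ordered by the space they occupy. The following is a sketch of one way to do that with standard tools; the function name list_large_files and its optional size argument are hypothetical, and the default of 500M simply mirrors the command above.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the function name is hypothetical.

# list_large_files DIR [MIN_SIZE]
# Lists regular files under DIR larger than MIN_SIZE (find -size syntax,
# default 500M), largest first, with the disk space used in KB.
list_large_files() {
  local dir=$1 min_size=${2:-500M}
  find "$dir" -type f -size "+${min_size}" -exec du -k {} + 2>/dev/null | sort -rn
}

# Example (after sourcing the DBService profile):
#   list_large_files "$DBSERVICE_DATA_DIR"/../
```

Only remove or move a file after confirming that it was written by mistake and is not part of the database.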

Collect fault information.

  8. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
  9. Expand the Service drop-down list and select DBService for the target cluster.
  10. (Optional) In Hosts, specify the hosts from which logs are to be collected. By default, all hosts are selected.
  11. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
  12. Contact O&M engineers and provide the collected logs.
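
The Start Date and End Date for the log collection window can be worked out from the alarm generation time as follows. This is a convenience sketch that assumes GNU date; the function name log_window is hypothetical, and the timestamps are interpreted in the local time zone of the machine where it runs.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the function name is hypothetical. Requires GNU date.

# log_window "YYYY-MM-DD HH:MM:SS" [MINUTES]
# Prints the window MINUTES (default 10) before and after the alarm time.
log_window() {
  local alarm_time=$1 minutes=${2:-10}
  local fmt='+%Y-%m-%d %H:%M:%S'
  local epoch
  epoch=$(date -d "$alarm_time" +%s) || return 1
  echo "Start Date: $(date -d "@$((epoch - minutes * 60))" "$fmt")"
  echo "End Date:   $(date -d "@$((epoch + minutes * 60))" "$fmt")"
}

# Example:
#   log_window "2024-11-29 10:00:00"
```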

Alarm Clearance

This alarm is automatically cleared after the fault is rectified.

Related Information

None.