Updated on 2024-11-29 GMT+08:00

ALM-38008 Abnormal Kafka Data Directory Status

Alarm Description

The system checks the Kafka data directory status every 60 seconds. This alarm is generated when the system detects that the status of a data directory is abnormal.

Trigger Count is set to 1. This alarm is cleared when the data directory status becomes normal.

Alarm Attributes

Alarm ID    Alarm Severity    Alarm Type            Service Type    Auto Cleared
38008       Major             Quality of service    Kafka           Yes

Alarm Parameters

Type                     Parameter           Description
Location Information     Source              Specifies the cluster for which the alarm is generated.
                         ServiceName         Specifies the service for which the alarm is generated.
                         RoleName            Specifies the role for which the alarm is generated.
                         HostName            Specifies the host name for which the alarm is generated.
                         DirName             Specifies the directory name for which the alarm is generated.
Additional Information   Trigger Condition   Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.

Impact on the System

If the status of a Kafka data directory is abnormal, the replicas of all partitions in that directory are brought offline. If the data directories on multiple nodes are abnormal at the same time, some partitions may become unavailable.

Possible Causes

  • The permission on the data directory has been modified.
  • The disk where the data directory is located is faulty.

Handling Procedure

Check the permission on the faulty data directory.

  1. Obtain the host name from the alarm information and log in to that host.
  2. Check whether the data directory indicated in the alarm information and its subdirectories are owned by omm:wheel.

    • If yes, record the host name of the node and go to 4.
    • If no, go to 3.

  3. Restore the owner and group of the data directory and its subdirectories to omm:wheel.

Check whether the disk where the data directory is located is faulty.

  4. In the upper-level directory of the data directory, create and delete files as user omm to check whether data read/write on the disk is normal.

    • If yes, go to 6.
    • If no, go to 5.

  5. Replace or repair the disk where the data directory is located to ensure that data read/write on the disk is normal.
  6. On the FusionInsight Manager home page, choose Cluster > Name of the desired cluster > Services > Kafka > Instance. On the Kafka instance page that is displayed, restart the Broker instance on the host recorded in 2.
  7. After Broker is started, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 8.
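
The ownership check and the read/write test described above can be scripted. The following is a minimal sketch, not part of FusionInsight Manager or Kafka tooling. The function names resolve_ids, find_wrong_owner, and write_probe are hypothetical, and the sketch assumes it is run as user omm on the host named in the alarm.

```python
import grp
import os
import pwd
import tempfile


def resolve_ids(user="omm", group="wheel"):
    """Look up the numeric UID and GID for the expected owner (assumed omm:wheel)."""
    return pwd.getpwnam(user).pw_uid, grp.getgrnam(group).gr_gid


def find_wrong_owner(data_dir, uid, gid):
    """Return the data directory and any subdirectories not owned by uid:gid."""
    bad = []
    for root, _dirs, _files in os.walk(data_dir):
        st = os.stat(root)
        if st.st_uid != uid or st.st_gid != gid:
            bad.append(root)
    return bad


def write_probe(directory):
    """Create, write, read back, and delete a small file in the given directory.

    Returns True only if every step succeeds. Any OSError (missing directory,
    permission denied, I/O error) is reported as False.
    """
    try:
        # The temporary file is removed automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory, prefix=".probe-") as f:
            f.write(b"probe")
            f.flush()
            os.fsync(f.fileno())
            f.seek(0)
            return f.read() == b"probe"
    except OSError:
        return False
```

For example, with a placeholder path standing in for the DirName value from the alarm, find_wrong_owner("/path/to/data_dir", *resolve_ids()) lists directories whose ownership needs to be restored, and write_probe("/path/to") tests the upper-level directory. A False result only suggests a problem with the disk or its permissions; it does not identify the cause.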

Collect fault information.

  8. On FusionInsight Manager, choose O&M > Log > Download.
  9. In the Service area, select Kafka in the required cluster.
  10. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
  11. Contact the O&M engineers and send the collected logs.
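
The Start Date and End Date for log collection can be worked out from the alarm generation time. The following is a small sketch of that arithmetic. The function name log_window and the sample timestamp are illustrative assumptions, not values from the product.

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin_minutes=10):
    """Return (start, end) covering margin_minutes before and after alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, used only for illustration.
start, end = log_window(datetime(2024, 11, 29, 10, 5, 0))
print(start.isoformat(sep=" "), "to", end.isoformat(sep=" "))
```

With the sample time above, the window runs from 09:55:00 to 10:15:00 on the same day.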

Alarm Clearance

After the fault is rectified, the system automatically clears this alarm.

Related Information

None.