ALM-38001 Insufficient Kafka Disk Capacity (For MRS 2.x or Earlier)

Description

The system checks the Kafka disk usage every 60 seconds and compares it with the threshold. This alarm is generated if the disk usage exceeds the threshold.

To change the threshold, choose System > Threshold Configuration on MRS Manager.

This alarm is cleared if the Kafka disk usage is lower than or equal to the threshold.
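The generate/clear rule described above can be reproduced by hand on a node when you want to confirm what the monitor sees. The following is a minimal sketch, not the MRS implementation: the function names and the threshold value of 80 are assumptions for illustration, and the real threshold is whatever is configured on MRS Manager.

```shell
#!/bin/sh
# Print the usage percentage (without the % sign) of the filesystem that holds a path.
disk_usage_percent() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit status 0) only if usage is strictly greater than the threshold,
# mirroring "generated if usage exceeds the threshold, cleared if lower than or equal".
exceeds_threshold() {
  [ "$1" -gt "$2" ]
}

# Hypothetical usage: check the root filesystem against an assumed threshold of 80.
usage="$(disk_usage_percent /)"
if exceeds_threshold "$usage" 80; then
  echo "usage ${usage}% exceeds the threshold"
else
  echo "usage ${usage}% is within the threshold"
fi
```

On a Kafka node, pass the mount point reported in PartitionName instead of `/`.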

Attribute

Alarm ID	Alarm Severity	Auto Clear
38001	Major	Yes

Parameters

Parameter	Description
ServiceName	Specifies the service for which the alarm is generated.
RoleName	Specifies the role for which the alarm is generated.
HostName	Specifies the host for which the alarm is generated.
PartitionName	Specifies the disk partition where the alarm is generated.
Trigger Condition	Generates an alarm when the actual indicator value exceeds the specified threshold.

Impact on the System

Once a disk is full, Kafka fails to write data to it.

Possible Causes

The Kafka disk configurations (such as disk count and disk size) are insufficient for the data volume.
The data retention period is long, so historical data occupies a large amount of disk space.
Services are improperly planned. As a result, data is unevenly distributed and some disks are full.

Procedure

Go to the MRS cluster details page and choose Alarms.
In the alarm list, click the alarm and view the HostName and PartitionName of the alarm in Location of Alarm Details.
On the Hosts page, click the host name obtained in Step 2.
Check whether the Disk area contains the PartitionName of the alarm.
- If yes, go to Step 5.
- If no, manually clear the alarm. No further action is required.
In the Disk area, check whether the usage of the alarmed partition has reached 100%.
- If yes, go to Step 6.
- If no, go to Step 8.
In Instance, choose Broker > Instance Configuration. On the Instance Configuration page that is displayed, set Type to All and query the data directory parameter log.dirs.
Choose Components > Kafka > Instances. On the Kafka Instance page that is displayed, stop the Broker instance on the host obtained in Step 2. Then log in to the alarmed node and manually delete the data directory identified in Step 6. After all subsequent operations are complete, start the Broker instance.
Choose Components > Kafka > Service Configuration. The Kafka Configuration page is displayed.
Check whether disk.adapter.enable is true.
- If yes, go to Step 11.
- If no, change the value to true and go to Step 10.
Check whether the adapter.topic.min.retention.hours parameter, indicating the minimum data retention period, is properly configured.
- If yes, go to Step 12.
- If no, set it to a proper value and go to Step 12.
If the retention period cannot be adjusted for certain topics, add those topics to disk.adapter.topic.blacklist.
Wait 10 minutes and check whether the disk usage is reduced.
- If yes, wait until the alarm is cleared.
- If no, go to Step 12.
Go to the Kafka Topic Monitor page and query the data retention period configured for Kafka. Determine whether the retention period needs to be shortened based on service requirements and data volume.
- If yes, go to Step 13.
- If no, go to Step 14.
Find the topics with large data volumes based on the disk partition obtained in Step 2. Log in to the Kafka client and manually shorten the data retention period for these topics using the following command (the retention period is specified in milliseconds):

kafka-topics.sh --zookeeper <ZooKeeper address>:24002/kafka --alter --topic <Topic name> --config retention.ms=<Retention period>
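Because retention.ms is expressed in milliseconds, it is easy to get the value wrong by a factor of 1000. The following sketch converts a retention period in hours into the value to pass to the command; the helper name is hypothetical and not part of the Kafka client.

```shell
#!/bin/sh
# Hypothetical helper: convert a retention period in hours to milliseconds,
# the unit expected by the retention.ms topic configuration.
retention_ms_from_hours() {
  echo $(( $1 * 60 * 60 * 1000 ))
}

# Example: 72 hours (3 days)
retention_ms_from_hours 72   # prints 259200000
```

The printed number is what replaces the retention period placeholder in the command above.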
Check whether partitions are properly configured for topics. For example, if the number of partitions for a topic with a large data volume is smaller than the number of disks, data may be unevenly distributed to the disks and the usage of some disks will reach the upper limit.

To identify topics with large data volumes, log in to the nodes obtained in Step 2, go to the data directory (the directory configured in log.dirs as queried in Step 6, before any modification), and check the disk space used by each topic's partitions.
- If the partitions are improperly configured, go to Step 15.
- If the partitions are properly configured, go to Step 16.
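The per-topic disk check can be scripted. The following is a sketch that assumes the standard Kafka layout, in which each partition is stored in a directory named after the topic followed by a hyphen and the partition number. The function name is hypothetical, and the data directory argument must be the log.dirs value from your own cluster.

```shell
#!/bin/sh
# Sum disk usage (in KB) per topic under a Kafka data directory,
# largest topic first. Partition directories are named <topic>-<partition number>.
topic_usage_kb() {
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue
    name="$(basename "$dir")"
    # Skip directories that contain no hyphen at all.
    case "$name" in
      *-*) ;;
      *) continue ;;
    esac
    # Skip directories whose last hyphen-separated field is not a number.
    suffix="${name##*-}"
    case "$suffix" in
      ''|*[!0-9]*) continue ;;
    esac
    # Emit "<size in KB> <topic>" for this partition directory.
    printf '%s %s\n' "$(du -sk "$dir" | cut -f1)" "${name%-*}"
  done | awk '{ sum[$2] += $1 } END { for (t in sum) print sum[t], t }' | sort -rn
}

# Usage (the argument is a placeholder for the log.dirs value):
#   topic_usage_kb <data directory>
```

Topics at the top of the output that have fewer partitions than the node has disks are the likely candidates for uneven distribution.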
On the Kafka client, add partitions to the topics.

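kafka-topics.sh --zookeeper <ZooKeeper address>:24002/kafka --alter --topic <Topic name> --partitions=<Number of new partitions>

It is advised to set the number of new partitions to a multiple of the number of Kafka disks. Note that --partitions specifies the total number of partitions for the topic after the change, and the partition count of a topic can only be increased.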

This operation may not quickly clear the alarm. Data will be gradually balanced among the disks.
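As a worked example of the sizing advice, the following sketch computes the smallest multiple of the disk count that is strictly greater than the current partition count. The function name is hypothetical, and this is one possible way to pick a value, not a rule prescribed by MRS.

```shell
#!/bin/sh
# Hypothetical helper: $1 = current partition count, $2 = number of Kafka disks.
# Prints the smallest multiple of $2 that is strictly greater than $1.
next_partition_count() {
  echo $(( ($1 / $2 + 1) * $2 ))
}

# Example: a topic with 5 partitions on a node with 4 Kafka disks
next_partition_count 5 4   # prints 8
```

The result is the total partition count to pass to --partitions, not the number of partitions to add.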
Check whether the cluster capacity needs to be expanded.
- If yes, add nodes to the cluster and go to Step 17.
- If no, go to Step 17.
Wait a moment and then check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to Step 18.
Collect fault information.
1. On MRS Manager, choose System > Export Log.
2. Contact O&M engineers and send the collected logs.

Related Information

None
