ALM-38001 Insufficient Kafka Disk Capacity (For MRS 2.x or Earlier)
Description
The system checks the Kafka disk usage every 60 seconds and compares it with the threshold. This alarm is generated if the disk usage exceeds the threshold.
The threshold can be modified on MRS Manager. This alarm is cleared if the Kafka disk usage is lower than or equal to the threshold.
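The comparison described above can be approximated by hand from a shell on a Broker node. The following is a minimal sketch under stated assumptions, not the MRS implementation: the function names `disk_usage_pct` and `exceeds_threshold`, the path `/`, and the threshold value 80 are all hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the usage percentage of the file system
# that holds the given path, based on POSIX-format df output.
disk_usage_pct() {
    # df -P prints a header line and one data line; column 5 is the capacity (e.g. 42%).
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed only if usage is strictly greater than the threshold,
# mirroring "generated if the usage exceeds the threshold, cleared if lower or equal".
exceeds_threshold() {
    [ "$1" -gt "$2" ]
}

usage=$(disk_usage_pct /)      # "/" is an illustrative path, not a Kafka data disk
if exceeds_threshold "$usage" 80; then
    echo "usage ${usage}% exceeds the threshold"
else
    echo "usage ${usage}% is within the threshold"
fi
```

To check a Kafka data disk instead, pass the mount point of the partition reported in PartitionName.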
Attribute
Alarm ID | Alarm Severity | Auto Clear |
---|---|---|
38001 | Major | Yes |
Parameters
Parameter | Description |
---|---|
ServiceName | Specifies the service for which the alarm is generated. |
RoleName | Specifies the role for which the alarm is generated. |
HostName | Specifies the host for which the alarm is generated. |
PartitionName | Specifies the disk partition where the alarm is generated. |
Trigger Condition | Generates an alarm when the actual indicator value exceeds the specified threshold. |
Impact on the System
Kafka fails to write data to the disks.
Possible Causes
- The Kafka disk configurations (such as disk count and disk size) are insufficient for the data volume.
- The data retention period is too long, so historical data occupies a large amount of disk space.
- Services are improperly planned. As a result, data is unevenly distributed and some disks are full.
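To see whether data is unevenly distributed, the usage of each directory configured in log.dirs can be compared side by side. The sketch below assumes that the log.dirs value is a comma-separated list of directories. The function name `report_data_dirs` and the `LOG_DIRS` environment variable are hypothetical, and `/tmp` is only a fallback so that the example runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper: for each directory in a comma-separated list,
# print "<capacity> <directory>", or "missing <directory>" if it does not exist.
report_data_dirs() (
    # Runs in a subshell, so the IFS and globbing changes do not leak to the caller.
    IFS=,
    set -f   # disable globbing so the unquoted $1 is only split, not expanded
    for dir in $1; do
        if [ -d "$dir" ]; then
            # Column 5 of the df -P data line is the capacity, e.g. 42%.
            pct=$(df -P "$dir" | awk 'NR == 2 { print $5 }')
            echo "$pct $dir"
        else
            echo "missing $dir"
        fi
    done
)

# LOG_DIRS is an assumed variable holding the log.dirs value copied from the configuration page.
report_data_dirs "${LOG_DIRS:-/tmp}"
```

Directories whose capacity figures differ widely suggest that the partitions of large topics are concentrated on a few disks.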
Procedure
1. Go to the MRS cluster details page and choose Alarms.
2. In the alarm list, click the alarm and view the HostName and PartitionName of the alarm in Location of Alarm Details.
3. On the Hosts page, click the host name obtained in 2.
4. Check whether the Disk area contains the PartitionName of the alarm.
   - If yes, go to 5.
   - If no, manually clear the alarm. No further action is required.
5. In the Disk area, check whether the usage of the alarmed partition has reached 100%.
6. In Instance, open the Instance Configuration page, set Type to All, and query the data directory parameter log.dirs.
7. On the Kafka Instance page, stop the Broker instance on the host obtained in 2. Then log in to the alarmed node and manually delete the data directory obtained in 6. After all subsequent operations are complete, start the Broker instance.
8. Open the configuration page for the Kafka service parameters.
9. Check whether disk.adapter.enable is true.
10. Check whether the adapter.topic.min.retention.hours parameter, which specifies the minimum data retention period, is properly configured.
    If the retention period cannot be adjusted for certain topics, add those topics to disk.adapter.topic.blacklist.
11. Wait 10 minutes and check whether the disk usage has decreased.
    - If yes, wait until the alarm is cleared.
    - If no, go to 12.
12. Go to the Kafka Topic Monitor page and query the data retention period configured for Kafka. Determine whether the retention period needs to be shortened based on service requirements and data volume.
13. Find the topics with large data volumes based on the disk partition obtained in 2. Log in to the Kafka client and manually shorten the data retention period for these topics using the following command, where the retention period is specified in milliseconds:
    kafka-topics.sh --zookeeper <ZooKeeper address>:24002/kafka --alter --topic <Topic name> --config retention.ms=<Retention period>
14. Check whether partitions are properly configured for topics. For example, if a topic with a large data volume has fewer partitions than there are disks, data may be unevenly distributed across the disks and the usage of some disks will reach the upper limit.
15. On the Kafka client, add partitions to the topics.
    kafka-topics.sh --zookeeper <ZooKeeper address>:24002/kafka --alter --topic <Topic name> --partitions=<Number of new partitions>
    It is advised to set the number of new partitions to a multiple of the number of Kafka disks.
    This operation may not clear the alarm quickly. Data will be gradually balanced among the disks.
16. Check whether the cluster capacity needs to be expanded.
17. Wait a moment and then check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to 18.
18. Collect fault information.
19. On MRS Manager, collect the logs.
20. Contact the O&M engineers and send the collected logs.
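The two values passed to kafka-topics.sh in the procedure above, the retention period in milliseconds and a partition count that is a multiple of the number of disks, can be computed before running the commands. The following is a minimal sketch. The function names `hours_to_ms` and `round_up_to_multiple` are hypothetical, and the inputs (72 hours, 10 desired partitions, 4 disks) are illustrative values, not recommendations.

```shell
#!/bin/sh
# Hypothetical helper: convert a retention period in hours to milliseconds,
# the unit expected by the retention.ms topic configuration.
hours_to_ms() {
    echo $(( $1 * 60 * 60 * 1000 ))
}

# Hypothetical helper: round a desired partition count up to the next
# multiple of the number of Kafka disks.
round_up_to_multiple() {
    desired=$1
    disks=$2
    echo $(( (desired + disks - 1) / disks * disks ))
}

echo "retention.ms=$(hours_to_ms 72)"             # prints retention.ms=259200000
echo "partitions=$(round_up_to_multiple 10 4)"    # prints partitions=12
```

The printed values can then be substituted for the retention period and partition count placeholders in the corresponding commands.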
Related Information
N/A