ALM-38017 Partition Reassignment Duration Exceeds the Threshold
Alarm Description
The system checks the partition reassignment time every 10 minutes. The check interval can be modified by the Kafka configuration auto.reassign.check.interval.ms. This alarm is generated when the partition reassignment triggered by a Broker scale-out takes more time than the threshold (1440 minutes by default, which can be modified by the Kafka configuration reassignment.total.time.threshold).
This alarm is cleared when the partition workload balancing is complete.
This alarm applies only to MRS 3.5.0 or later.
Alarm Attributes
Alarm ID |
Alarm Severity |
Auto Cleared |
---|---|---|
38017 |
Major |
Yes |
Alarm Parameters
Type |
Parameter |
Description |
---|---|---|
Location Information |
Source |
Specifies the cluster for which the alarm was generated. |
ServiceName |
Specifies the cluster service for which the alarm was generated. |
|
RoleName |
Specifies the role for which the alarm was generated. |
|
HostName |
Specifies the host for which the alarm was generated. |
|
Additional Information |
Trigger Condition |
Specifies the alarm triggering condition. |
Impact on the System
The Kafka service has unbalanced load for a long time, which may deteriorate the read and write performance.
Possible Causes
The amount of data in a partition to be migrated is too large, and the allowed triffic is low.
Handling Procedure
- Log in to the Kafka web UI.
- Log in to FusionInsight Manager as a user who has the permission to access the Kafka web UI.
- Choose Cluster > Services > Kafka.
- On the right of KafkaManager Web UI, click the URL to access the Kafka web UI.
- Click Current Reassign Status to view the partition reassignment tasks.
- Check the status of the partitions that are being migrated. If the progress remains unchanged for a long time, click Modify Reassignment Throttle to check whether the value of the Throttle parameter is too small.
- Wait for 10 minutes and check whether the partition migration progress changes significantly.
- If yes, no further action is required.
- If no, go to 5.
Collect fault information.
- On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
- Expand the Service drop-down list, and select Kafka for the target cluster.
- Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes ahead of and after the alarm generation time, respectively. Then, click Download.
- Contact O&M engineers and provide the collected logs.
Alarm Clearance
This alarm is automatically cleared after the fault is rectified.
Related Information
None.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot