ALM-38010 Topics with Single Replica
Alarm Description
The system checks the number of replicas of each topic every 60 seconds on the node where the Kafka Controller resides. This alarm is generated when a topic has only one replica.
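As a rough way to reproduce this check by hand from the client, the sketch below filters the output of kafka-topics.sh --describe and prints the topics whose replication factor is 1. The function name single_replica_topics is hypothetical, and the parsing assumes that each topic summary line contains Topic:<name> and ReplicationFactor:<n> fields (with or without a space after the colon); the exact output format may differ between Kafka versions.

```shell
# Hypothetical helper: reads `kafka-topics.sh --describe` output on stdin
# and prints the name of every topic whose ReplicationFactor is 1.
# Assumption: summary lines carry "Topic:<name>" and "ReplicationFactor:<n>".
single_replica_topics() {
    awk '
        /ReplicationFactor:/ {
            line = $0
            gsub(/:[ \t]+/, ":", line)        # turn "Key: value" into "Key:value"
            n = split(line, f, /[ \t]+/)      # split into Key:value fields
            topic = ""; rf = ""
            for (i = 1; i <= n; i++) {
                if (f[i] ~ /^Topic:/)             topic = substr(f[i], 7)
                if (f[i] ~ /^ReplicationFactor:/) rf = substr(f[i], 19)
            }
            if (rf == "1" && topic != "") print topic
        }
    '
}
```

It could be used, for example, as kafka-topics.sh --describe --zookeeper {zk_host}:{port}/kafka | single_replica_topics, assuming the client version accepts the --zookeeper option as the commands in this procedure do.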
Alarm Attributes
Alarm ID | Alarm Severity | Alarm Type | Service Type | Auto Cleared |
---|---|---|---|---|
38010 | Major | Quality of service | Kafka | No |
Alarm Parameters
Type | Parameter | Description |
---|---|---|
Location Information | Source | Specifies the cluster for which the alarm is generated. |
Location Information | ServiceName | Specifies the service for which the alarm is generated. |
Location Information | RoleName | Specifies the role for which the alarm is generated. |
Location Information | TopicName | Specifies the list of topics for which the alarm is generated. |
Impact on the System
A topic with only one replica is a single point of failure (SPOF). If the node where the replica resides becomes abnormal, the affected partitions have no leader, and services that use the topic are affected.
Possible Causes
- The number of replicas for the topic is incorrectly configured.
Handling Procedure
Check the number of replicas for the topic.
- On FusionInsight Manager, choose O&M > Alarm > Alarms, expand the details of this alarm, and view the TopicName list in Location.
- Check whether replicas need to be added for the topic for which the alarm is generated.
- On the FusionInsight client, re-plan the topic replicas and describe the new partition distribution of the topic in the add-replicas-reassignment.json file in the following format: {"partitions":[{"topic": "topic name","partition": 1,"replicas": [1,2] }],"version":1}. Here, partition is the partition number and replicas lists the IDs of the brokers that will hold the replicas of that partition. Then, run the following command to add replicas:
kafka-reassign-partitions.sh --zookeeper {zk_host}:{port}/kafka --reassignment-json-file {manual assignment json file path} --execute
For example:
/opt/client/Kafka/kafka/bin/kafka-reassign-partitions.sh --zookeeper 192.168.0.90:2181,192.168.0.91:2181,192.168.0.92:2181/kafka --reassignment-json-file add-replicas-reassignment.json --execute
- Run the following command to check the task execution progress:
kafka-reassign-partitions.sh --zookeeper {zk_host}:{port}/kafka --reassignment-json-file {manual assignment json file path} --verify
For example:
/opt/client/Kafka/kafka/bin/kafka-reassign-partitions.sh --zookeeper 192.168.0.90:2181,192.168.0.91:2181,192.168.0.92:2181/kafka --reassignment-json-file add-replicas-reassignment.json --verify
- After completing the handling operations or confirming that the alarm has no impact, manually clear the alarm on FusionInsight Manager.
- After a period of time, check whether the alarm is cleared.
- If it is, no further action is required.
- If it is not, go to Collect fault information.
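For topics with many partitions, writing add-replicas-reassignment.json by hand is error-prone. The following is a minimal sketch of a generator for that file. The function name make_reassignment_json and its argument order are hypothetical, and the round-robin placement is only an illustration: it does not look at the current assignment, so to avoid moving existing data you may prefer to keep each partition's current broker in its replica list.

```shell
# Hypothetical helper (not part of the Kafka client).
# Usage: make_reassignment_json TOPIC PARTITION_COUNT REPLICA_COUNT BROKER_ID...
# Prints a reassignment plan in which partition p is placed on REPLICA_COUNT
# consecutive brokers from the list, starting at position p (round-robin).
make_reassignment_json() (
    topic=$1; partitions=$2; rf=$3
    shift 3
    n=$#                                   # number of broker IDs supplied
    if [ "$rf" -gt "$n" ]; then
        echo "replica count $rf exceeds the $n broker IDs given" >&2
        exit 1                             # leaves only this subshell
    fi
    printf '{"partitions":['
    p=0
    while [ "$p" -lt "$partitions" ]; do
        if [ "$p" -gt 0 ]; then printf ','; fi
        printf '{"topic":"%s","partition":%d,"replicas":[' "$topic" "$p"
        i=0
        while [ "$i" -lt "$rf" ]; do
            idx=$(( (p + i) % n + 1 ))     # 1-based position in the broker list
            eval "broker=\${$idx}"         # look up the broker ID at that position
            if [ "$i" -gt 0 ]; then printf ','; fi
            printf '%s' "$broker"
            i=$((i + 1))
        done
        printf ']}'
        p=$((p + 1))
    done
    printf '],"version":1}\n'
)
```

For example, make_reassignment_json my-topic 3 2 1 2 3 > add-replicas-reassignment.json would write a plan for a three-partition topic with two replicas per partition on brokers 1, 2, and 3; the file can then be passed to kafka-reassign-partitions.sh with --execute and --verify as in the steps above. The topic name and broker IDs here are placeholders.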
Collect fault information.
- On FusionInsight Manager, choose O&M > Log > Download.
- In the Service area, select Kafka in the required cluster.
- Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
- Contact the O&M engineers and send the collected logs.
Alarm Clearance
If the alarm has no impact, manually clear the alarm.
Related Information
None.