Updated on 2024-11-29 GMT+08:00

ALM-38010 Topics with Single Replica

Alarm Description

Every 60 seconds, the system checks the number of replicas of each topic on the node where the Kafka Controller resides. This alarm is generated when a topic has only one replica.
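The check described above runs on the Controller node. As a rough client-side approximation, the hypothetical shell function below lists topics whose ReplicationFactor is 1. It assumes that your client's `kafka-topics.sh --describe` prints tab-separated `Key: value` (or `Key:value`) fields. It is a sketch under that assumption, not the alarm's actual implementation.

```shell
#!/bin/sh
# Hypothetical helper (not part of FusionInsight): reads `kafka-topics.sh --describe`
# output on stdin and prints each topic whose summary line reports ReplicationFactor 1.
single_replica_topics() {
  awk 'BEGIN { FS = "\t" }   # describe output separates fields with tabs
  {
    topic = ""; rf = ""
    for (i = 1; i <= NF; i++) {
      k = index($i, ":")     # split "Key: value" at the first colon
      if (k == 0) continue
      key = substr($i, 1, k - 1); gsub(/ /, "", key)
      val = substr($i, k + 1);    gsub(/ /, "", val)
      if (key == "Topic") topic = val
      else if (key == "ReplicationFactor") rf = val
    }
    # Only topic summary lines carry ReplicationFactor; per-partition lines are skipped.
    if (topic != "" && rf == "1") print topic
  }'
}
```

For example, if your client version still accepts `--zookeeper` for kafka-topics.sh (an assumption to verify), you could pipe `/opt/client/Kafka/kafka/bin/kafka-topics.sh --describe --zookeeper {zk_host}:{port}/kafka` into `single_replica_topics` and compare the result with the TopicName list of the alarm.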

Alarm Attributes

  • Alarm ID: 38010
  • Alarm Severity: Major
  • Alarm Type: Quality of service
  • Service Type: Kafka
  • Auto Cleared: No

Alarm Parameters

Location Information

  • Source: Specifies the cluster for which the alarm is generated.
  • ServiceName: Specifies the service for which the alarm is generated.
  • RoleName: Specifies the role for which the alarm is generated.
  • TopicName: Specifies the list of topics for which the alarm is generated.

Impact on the System

Topics with only one replica are exposed to a single point of failure (SPOF). If the node hosting that replica becomes abnormal, the partition has no leader, and services on the topic are affected.

Possible Causes

  • The number of replicas for the topic is incorrectly configured.

Handling Procedure

Check the number of replicas for the topic.

  1. On FusionInsight Manager, choose O&M > Alarm > Alarms, click the expand icon of this alarm, and view the TopicName list in Location.
  2. Check whether replicas need to be added for the topic for which the alarm is generated.

    • If yes, go to 3.
    • If no, go to 5.

  3. On the FusionInsight client, re-plan the topic replicas and describe the partition distribution of the topic in the add-replicas-reassignment.json file, where replicas lists the IDs of the brokers that should hold each partition. Use the following format:

    {"partitions":[{"topic": "topic name","partition": 1,"replicas": [1,2] }],"version":1}

    Then, run the following command to add replicas:

    kafka-reassign-partitions.sh --zookeeper {zk_host}:{port}/kafka --reassignment-json-file {manual assignment json file path} --execute

    For example:

    /opt/client/Kafka/kafka/bin/kafka-reassign-partitions.sh --zookeeper 192.168.0.90:2181,192.168.0.91:2181,192.168.0.92:2181/kafka --reassignment-json-file add-replicas-reassignment.json --execute

  4. Run the following command to check the task execution progress:

    kafka-reassign-partitions.sh --zookeeper {zk_host}:{port}/kafka --reassignment-json-file {manual assignment json file path} --verify

    For example:

    /opt/client/Kafka/kafka/bin/kafka-reassign-partitions.sh --zookeeper 192.168.0.90:2181,192.168.0.91:2181,192.168.0.92:2181/kafka --reassignment-json-file add-replicas-reassignment.json --verify

  5. After completing the handling operations or confirming that the alarm has no impact, manually clear the alarm on FusionInsight Manager.
  6. After a period of time, check whether the alarm is cleared.

    • If it is, no further action is required.
    • If it is not, go to 7.
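For step 3, the partition plan can also be drafted from the topic's current layout instead of by hand. The hypothetical shell function below assumes tab-separated `Key: value` fields in `kafka-topics.sh --describe` output and a space-separated list of broker IDs that you supply. It gives every single-replica partition one extra replica on the next broker in that list and keeps the existing replica first. Because Kafka treats the first entry in replicas as the preferred leader, leadership should not move. The output uses the add-replicas-reassignment.json format. Treat the result as a draft and review it, for example for disk capacity and rack placement, before running the command with --execute.

```shell
#!/bin/sh
# Hypothetical helper (not part of FusionInsight).
# Usage: plan_add_replicas "BROKER_IDS" < describe_output > add-replicas-reassignment.json
#   BROKER_IDS: space-separated broker IDs in the cluster, e.g. "1 2 3" (supply your own).
# Each partition whose Replicas field holds a single broker gets the next broker in
# BROKER_IDS (wrapping around) as a second replica; the existing replica stays first.
plan_add_replicas() {
  awk -v brokers="$1" '
  BEGIN { FS = "\t"; n = split(brokers, b, " "); out = "" }
  {
    topic = ""; part = ""; reps = ""
    for (i = 1; i <= NF; i++) {
      k = index($i, ":")
      if (k == 0) continue
      key = substr($i, 1, k - 1); gsub(/ /, "", key)
      val = substr($i, k + 1);    gsub(/ /, "", val)
      if (key == "Topic") topic = val
      else if (key == "Partition") part = val
      else if (key == "Replicas") reps = val
    }
    # Skip summary lines and partitions that already have two or more replicas.
    if (topic == "" || part == "" || reps == "" || index(reps, ",") > 0) next
    extra = ""
    for (j = 1; j <= n; j++) if (b[j] == reps) extra = b[j % n + 1]
    if (extra == "" || extra == reps) next   # broker unknown, or only one broker listed
    entry = sprintf("{\"topic\":\"%s\",\"partition\":%s,\"replicas\":[%s,%s]}", topic, part, reps, extra)
    if (out != "") out = out ","
    out = out entry
  }
  END { printf "{\"partitions\":[%s],\"version\":1}\n", out }'
}
```

For example, if your client version still accepts `--zookeeper` for kafka-topics.sh (an assumption to verify), the output of `kafka-topics.sh --describe --zookeeper {zk_host}:{port}/kafka --topic {topic name}` can be piped into `plan_add_replicas "1 2 3"` and redirected to add-replicas-reassignment.json.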

Collect fault information.

  7. On FusionInsight Manager, choose O&M > Log > Download.
  8. In the Service area, select Kafka in the required cluster.
  9. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
  10. Contact O&M engineers and send them the collected logs.

Alarm Clearance

This alarm is not cleared automatically. If the alarm has no impact, clear it manually.

Related Information

None.