ALM-38000 Kafka Service Unavailable

The system checks the Kafka service status every 30 seconds. This alarm is generated when the Kafka service is unavailable.

This alarm is cleared when the Kafka service recovers.
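
The generate-and-clear behavior described above can be pictured as a small state machine. The following is a minimal sketch, not the actual FusionInsight Manager implementation. The class name `Alarm38000` and the helper `replay` are hypothetical, and the sketch assumes that a single failed check is enough to generate the alarm, which the description does not state explicitly.

```python
from typing import List, Optional

CHECK_INTERVAL_SECONDS = 30  # check period stated in the alarm description


class Alarm38000:
    """Hypothetical model of the alarm state; not FusionInsight code."""

    def __init__(self) -> None:
        self.active = False

    def observe(self, service_available: bool) -> Optional[str]:
        """Feed one periodic check result; return 'raise', 'clear', or None."""
        if not service_available and not self.active:
            # Assumption: one failed check generates the alarm.
            self.active = True
            return "raise"
        if service_available and self.active:
            # The alarm clears automatically once the service recovers.
            self.active = False
            return "clear"
        return None


def replay(results: List[bool]) -> List[Optional[str]]:
    """Run a sequence of check results through a fresh alarm state."""
    alarm = Alarm38000()
    return [alarm.observe(result) for result in results]


if __name__ == "__main__":
    # One entry per 30-second check: up, down, down, up, up.
    print(replay([True, False, False, True, True]))
```

Repeated failed checks do not generate the alarm again while it is already active, which matches the single alarm entry an operator sees until the service recovers.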

Alarm ID	Alarm Severity	Auto Cleared
38000	Critical	Yes

Parameter	Description
Source	Specifies the cluster for which the alarm is generated.
ServiceName	Specifies the service for which the alarm is generated.
RoleName	Specifies the role for which the alarm is generated.
HostName	Specifies the host for which the alarm is generated.

The cluster cannot provide the Kafka service, and users cannot perform new Kafka tasks.

Check the status of the KrbServer service. (Skip this procedure if the cluster runs in normal mode.)

1. Log in to FusionInsight Manager and choose Cluster > Services > KrbServer.
2. Check whether the running status of the KrbServer service is Normal.
   - If yes, go to Step 5.
   - If no, go to Step 3.
3. Rectify the fault by following the steps provided in ALM-25500 KrbServer Service Unavailable.
4. Perform Step 2 again.

Check the status of the ZooKeeper cluster.

5. Check whether the running status of the ZooKeeper service is Normal.
   - If yes, go to Step 8.
   - If no, go to Step 6.
6. If the ZooKeeper service is stopped, start it. Otherwise, rectify the fault by following the steps provided in ALM-13000 ZooKeeper Service Unavailable.
7. Perform Step 5 again.
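
As a supplement to the status shown on FusionInsight Manager, a ZooKeeper server can also be probed directly with the standard `ruok` four-letter command, which a healthy server answers with `imok`. The sketch below is an illustration only. The function name `zk_ruok` is hypothetical, the default port 2181 is ZooKeeper's upstream default and may differ in your cluster, and the command may be disabled by the server's four-letter-command whitelist, in which case the probe returns False even though the server is healthy.

```python
import socket


def zk_ruok(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Return True if the server at host:port answers 'imok' to 'ruok'.

    The port default is an assumption; use the client port configured
    for the ZooKeeper service in your cluster.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # The reply is exactly four bytes; stop early if the peer closes.
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Connection refused, timeout, DNS failure, and similar errors.
        return False
    return reply == b"imok"
```

A False result only indicates that the probe did not succeed. The running status on FusionInsight Manager remains the reference for Step 5.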

Check the Broker status.

8. Choose Cluster > Services > Kafka and click Instances. The Kafka instance page is displayed.
9. Check whether all instances in Roles are running properly.
   - If yes, go to Step 11.
   - If no, go to Step 10.
10. Select all Broker instances, choose More > Restart Instance, and check whether the instances restart successfully.
    - If yes, go to Step 11.
    - If no, go to Step 13.
    NOTE: While a Broker instance is restarting, if a topic has only one replica and that replica is on the Broker node being restarted, the Kafka service will be interrupted. Otherwise, the Kafka service will not be affected.
11. Choose Cluster > Services > Kafka and check whether the service status is Normal.
    - If yes, go to Step 12.
    - If no, go to Step 13.
12. Wait for 30 seconds and check whether the alarm is cleared.
    - If yes, no further action is required.
    - If no, go to Step 13.
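
Before restarting Broker instances in Step 10, it can help to know which topics the restart would interrupt. The sketch below illustrates the single-replica condition from that step. The function name `topics_interrupted_by_restart` and the sample assignment are hypothetical. In practice, the replica assignment would have to be collected separately, for example from the Replicas field of a topic description.

```python
from typing import Dict, List

# topic name -> partition ID -> IDs of the Brokers holding a replica
Assignment = Dict[str, Dict[int, List[int]]]


def topics_interrupted_by_restart(assignment: Assignment, broker_id: int) -> List[str]:
    """Return topics with at least one partition whose only replica is on broker_id."""
    affected = []
    for topic, partitions in assignment.items():
        if any(replicas == [broker_id] for replicas in partitions.values()):
            affected.append(topic)
    return sorted(affected)


if __name__ == "__main__":
    # Hypothetical assignment for illustration only.
    sample = {
        "orders": {0: [1], 1: [2]},
        "clicks": {0: [1, 2], 1: [2, 3]},
        "audit": {0: [3]},
    }
    print(topics_interrupted_by_restart(sample, 1))
```

Topics whose partitions all have two or more replicas are not listed, because another Broker can continue to serve them while one instance restarts.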

Collect fault information.

13. On the FusionInsight Manager portal, choose O&M > Log > Download.
14. Select Kafka in the required cluster from the Service drop-down list.
15. Click the icon in the upper right corner, and select a time span that starts 10 minutes before and ends 10 minutes after the time the alarm was generated. Then, click Download to collect the logs.
16. Send the collected fault logs to O&M personnel for assistance.
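
The log collection time span in Step 15 is simple arithmetic on the alarm generation time. A minimal sketch follows. The function name `log_window` and the sample timestamp are hypothetical and are not part of FusionInsight Manager.

```python
from datetime import datetime, timedelta
from typing import Tuple


def log_window(alarm_time: datetime, margin_minutes: int = 10) -> Tuple[datetime, datetime]:
    """Return the (start, end) of the log collection span around alarm_time."""
    margin = timedelta(minutes=margin_minutes)
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    # Hypothetical alarm generation time for illustration only.
    start, end = log_window(datetime(2024, 1, 1, 0, 5))
    print(start.isoformat(), end.isoformat())
```

The example uses an alarm time just after midnight to show that the start of the span can fall on the previous day.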