ALM-26053 Storm Slot Usage Exceeds the Threshold

Description

The system checks the slot usage every 60 seconds and compares the actual slot usage with the threshold. This alarm is generated when the slot usage is greater than the threshold.

You can change the threshold in O&M > Alarm > Thresholds.

This alarm is cleared when the slot usage is less than or equal to the threshold.
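
The generate and clear conditions above can be sketched as a simple comparison. This is a minimal illustration, not the Manager's actual implementation: it assumes slot usage is expressed as used slots divided by total slots, in percent. The function names and the threshold value of 80 are hypothetical.

```python
def slot_usage_percent(slots_used: int, slots_total: int) -> float:
    """Return slot usage as a percentage of the total slots (assumed definition)."""
    if slots_total <= 0:
        raise ValueError("slots_total must be positive")
    return 100.0 * slots_used / slots_total


def alarm_active(usage_percent: float, threshold_percent: float) -> bool:
    """The alarm is generated above the threshold and cleared at or below it."""
    return usage_percent > threshold_percent


if __name__ == "__main__":
    threshold = 80.0  # hypothetical threshold; the real value is set in O&M > Alarm > Thresholds
    for used, total in [(16, 20), (17, 20)]:
        usage = slot_usage_percent(used, total)
        state = "generated" if alarm_active(usage, threshold) else "cleared"
        print(f"{used}/{total} slots -> {usage:.1f}% -> alarm {state}")
```

Note that a usage exactly equal to the threshold does not trigger the alarm, which matches the "less than or equal to" clearing condition.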

Attribute

Alarm ID	Alarm Severity	Automatically Cleared
26053	Major	Yes

Parameters

Name	Meaning
Source	Specifies the cluster for which the alarm is generated.
ServiceName	Specifies the service for which the alarm is generated.
RoleName	Specifies the role for which the alarm is generated.
HostName	Specifies the host for which the alarm is generated.
Trigger condition	Specifies the threshold triggering the alarm. If the current indicator value exceeds this threshold, the alarm is generated.

Impact on the System

New Storm tasks cannot be executed.

Possible Causes

The status of some Supervisors in the cluster is abnormal.
The status of all Supervisors is normal, but the processing capability is insufficient.

Procedure

Check the Supervisor status.

1. Choose Cluster > Name of the desired cluster > Services > Storm > Instance to go to the Storm instance management page.
2. Check whether any instance is in the Faulty or Restoring state.
   - If yes, go to 3.
   - If no, go to 5.
3. Select the Supervisor role instances whose status is Faulty or Restoring, choose More > Restart Instance, and check whether the instances restart successfully.
   - If yes, go to 4.
   - If no, go to 10.
4. Wait several minutes, and check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 5.

Increase the number of slots in each Supervisor.

5. Log in to FusionInsight Manager and choose Cluster > Name of the desired cluster > Services > Storm > Configurations > All Configurations.
6. Increase the number of ports in the supervisor.slots.ports parameter of each Supervisor role, and restart the instance.
7. Wait several minutes, and check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 8.
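
Each port listed in supervisor.slots.ports defines one worker slot on a Supervisor, so adding ports adds slots. The sketch below shows one way to compute an extended port list before entering it in the configuration page. It assumes the parameter value is a comma-separated list of port numbers; the function name and the example ports are hypothetical, and the new ports must be free on the Supervisor hosts.

```python
def extend_slot_ports(ports_value: str, extra: int) -> str:
    """Append `extra` consecutive ports after the highest port in the list.

    `ports_value` is assumed to be a comma-separated string such as
    "6700,6701,6702,6703" (example values, not taken from a real cluster).
    """
    if extra < 0:
        raise ValueError("extra must not be negative")
    ports = [int(p) for p in ports_value.split(",") if p.strip()]
    if not ports:
        raise ValueError("ports_value contains no ports")
    next_port = max(ports) + 1
    if next_port + extra - 1 > 65535:
        raise ValueError("not enough port numbers left below 65536")
    # Every new port is above the current maximum, so no duplicates can occur.
    ports.extend(range(next_port, next_port + extra))
    return ",".join(str(p) for p in ports)


if __name__ == "__main__":
    # Four slots per Supervisor become six.
    print(extend_slot_ports("6700,6701,6702,6703", 2))
```

The helper only builds the new value; it does not check whether the ports are already in use on the host.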

8. Perform capacity expansion for Supervisor.
9. Wait several minutes, and check whether the alarm is cleared.
   - If yes, no further action is required.
   - If no, go to 10.

   Note: Services are interrupted while a Supervisor is restarting and are restored after the restart completes.

Collect fault information.

10. On FusionInsight Manager, choose O&M > Log > Download.
11. Select Storm and ZooKeeper in the required cluster from the Service drop-down list.
12. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click Download.
13. Contact the O&M personnel and send the collected logs.
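
The log collection window described above (1 hour before to 1 hour after the alarm generation time) can be worked out as follows. This is a small sketch for convenience only; the function name and the sample alarm time are hypothetical.

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time: datetime,
                          margin: timedelta = timedelta(hours=1)):
    """Return (start, end) covering `margin` before and after the alarm time."""
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    # Hypothetical alarm generation time, as shown in the alarm details.
    alarm_time = datetime(2024, 1, 15, 10, 30)
    start, end = log_collection_window(alarm_time)
    print("Start Date:", start.isoformat(sep=" "))
    print("End Date:  ", end.isoformat(sep=" "))
```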