ALM-19022 HBase Hotspot Detection Is Unavailable
Alarm Description
When the MetricController instance is installed for HBase, the alarm module checks the health status of the active HBase MetricController instance every 120 seconds. This alarm is generated when the active HBase MetricController instance does not exist or is unavailable, which makes the hotspot detection function unavailable.
This alarm is cleared when the active HBase MetricController instance recovers.
This alarm applies only to MRS 3.3.0 or later.
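The generation and clearance rules above can be sketched as a small state function. This is only an illustration of the described behavior, not the alarm module's actual implementation; `InstanceStatus`, `alarm_should_be_active`, and `next_alarm_state` are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Check period stated in the alarm description.
CHECK_INTERVAL_SECONDS = 120


@dataclass
class InstanceStatus:
    """Hypothetical snapshot of the active MetricController instance."""
    exists: bool      # an active MetricController instance was found
    available: bool   # that instance passed its health check


def alarm_should_be_active(status: InstanceStatus) -> bool:
    """ALM-19022 applies when the active instance is missing or unavailable."""
    return not (status.exists and status.available)


def next_alarm_state(currently_raised: bool, status: InstanceStatus) -> str:
    """Return the action one periodic check would take: raise, clear, or no_change."""
    should_raise = alarm_should_be_active(status)
    if should_raise and not currently_raised:
        return "raise"
    if not should_raise and currently_raised:
        return "clear"
    return "no_change"
```

In this sketch a caller would evaluate `next_alarm_state` once per `CHECK_INTERVAL_SECONDS`; how the real module obtains the instance status is not described in this document.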
Alarm Attributes
Alarm ID | Alarm Severity | Auto Cleared |
---|---|---|
19022 | Major | Yes |
Alarm Parameters
Parameter | Description |
---|---|
Source | Specifies the cluster for which the alarm is generated. |
ServiceName | Specifies the service for which the alarm is generated. |
RoleName | Specifies the role for which the alarm is generated. |
HostName | Specifies the host for which the alarm is generated. |
Impact on the System
The HBase hotspot detection function is unavailable. Services are not affected. However, if request or data skew occurs, the system can neither report alarms nor automatically recover from hotspotting, and service requests may cause node overload, slow responses, and request timeouts.
Possible Causes
- The ZooKeeper service is abnormal.
- The HBase service is abnormal.
- In the current HBase service, the MetricController instance on the same node as the active HMaster instance is not started.
- The network is abnormal.
Handling Procedure
Check the ZooKeeper service status.
1. In the service list on FusionInsight Manager, check whether Running Status of ZooKeeper is Normal.
2. In the alarm list, check whether ALM-13000 ZooKeeper Service Unavailable exists.
3. Rectify the fault by performing the operations provided for ALM-13000 ZooKeeper Service Unavailable.
4. Wait for several minutes and check whether the alarm HBase Hotspot Detection Is Unavailable is cleared.
   - If yes, no further action is required.
   - If no, go to 5.
Check the HBase service status.
5. In the service list on FusionInsight Manager, check whether Running Status of HBase is Normal.
6. In the alarm list, check whether ALM-19000 HBase Service Unavailable exists.
7. Rectify the fault by performing the operations provided for ALM-19000 HBase Service Unavailable.
8. Wait for several minutes and check whether the alarm HBase Hotspot Detection Is Unavailable is cleared.
   - If yes, no further action is required.
   - If no, go to 9.
Check whether the MetricController instance deployed on the same node as the active HMaster instance is started.
9. On FusionInsight Manager, choose Cluster > Service > HBase, and click Instances to check whether the MetricController(Active) instance exists.
10. Select the MetricController instance whose management IP address is the same as that of the active HMaster instance, and click Start Instance.
11. After the MetricController instance is started, check whether the alarm HBase Hotspot Detection Is Unavailable is cleared.
   - If yes, no further action is required.
   - If no, go to 12.
Check the network connectivity between the started MetricController instances and the active HMaster node.
12. Log in to the node where the active HMaster instance is deployed and ping the IP address of the node where the standby MetricController instance is deployed, to check whether the network connection between the started MetricController instances and the host of the active HMaster instance is normal.
13. Contact the network administrator to restore the network.
14. After the network recovers, check whether the alarm HBase Hotspot Detection Is Unavailable is cleared.
   - If yes, no further action is required.
   - If no, go to 15.
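The connectivity check in this part of the procedure can also be scripted. The following is a minimal sketch that wraps the operating system's `ping` command, assuming it is run on the node of the active HMaster instance; `build_ping_command` and `is_reachable` are hypothetical helper names, and `192.0.2.10` in the usage note is a placeholder documentation address, not a real cluster IP.

```python
import platform
import subprocess


def build_ping_command(ip_address: str, count: int = 4) -> list:
    """Build a ping command line that sends a fixed number of packets."""
    # Windows ping uses -n for the packet count; Linux and macOS use -c.
    count_flag = "-n" if platform.system().lower() == "windows" else "-c"
    return ["ping", count_flag, str(count), ip_address]


def is_reachable(ip_address: str, count: int = 4, timeout_seconds: int = 30) -> bool:
    """Return True if ping exits with status 0 for the given address."""
    try:
        result = subprocess.run(
            build_ping_command(ip_address, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (subprocess.TimeoutExpired, OSError):
        # ping is missing, could not be started, or did not finish in time.
        return False
    return result.returncode == 0
```

For example, `is_reachable("192.0.2.10")` returning `False` would indicate that the network path should be reported to the network administrator, as the procedure describes.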
Collect fault information.
15. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
16. Expand the Service drop-down list, and select HBase for the target cluster.
17. In the Host area, select the host where the HMaster instance is deployed.
18. Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
19. Contact O&M personnel and provide the collected logs.
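The log collection window used when setting Start Date and End Date is simple arithmetic on the alarm generation time. As a small sketch (the function name `log_collection_window` and the sample timestamp are made up for illustration):

```python
from datetime import datetime, timedelta

# Margin on each side of the alarm generation time, per the procedure.
LOG_WINDOW_MARGIN = timedelta(minutes=10)


def log_collection_window(alarm_time: datetime) -> tuple:
    """Return (start, end) covering 10 minutes before and after the alarm."""
    return alarm_time - LOG_WINDOW_MARGIN, alarm_time + LOG_WINDOW_MARGIN


# Example with a made-up alarm generation time.
start, end = log_collection_window(datetime(2024, 1, 15, 9, 30))
print(start.isoformat(sep=" "), end.isoformat(sep=" "))
# prints: 2024-01-15 09:20:00 2024-01-15 09:40:00
```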
Alarm Clearance
This alarm is automatically cleared after the fault is rectified.
Related Information
None.