ALM-19011 RegionServer Region Number Exceeds the Threshold
Description
The system checks the number of regions on each RegionServer in each HBase service instance every 30 seconds. The region count is displayed on the HBase service monitoring page and the RegionServer role monitoring page. This alarm is generated when the number of regions on a RegionServer exceeds the threshold (default value: 2000) in 20 consecutive checks. To change the threshold, choose O&M > Alarm > Thresholds > Name of the desired cluster > HBase. This alarm is cleared when the number of regions is less than or equal to the threshold.
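The trigger condition can be read as a streak counter: with one check every 30 seconds, 20 consecutive over-threshold checks correspond to roughly 10 minutes of sustained excess. The following is a minimal sketch of that logic, not the system's actual implementation. The function name first_alarm_check and its arguments are hypothetical, and the sketch assumes that any check at or below the threshold resets the count.

```shell
# Hypothetical helper, for illustration only.
# Prints the 1-based index of the check at which the alarm would be raised,
# or "none" if the required streak is never reached.
# Usage: first_alarm_check THRESHOLD REQUIRED_CONSECUTIVE SAMPLE...
first_alarm_check() {
  local threshold=$1 required=$2
  shift 2
  local streak=0 index=0 count
  for count in "$@"; do
    index=$((index + 1))
    if [ "$count" -gt "$threshold" ]; then
      streak=$((streak + 1))
    else
      # A reading at or below the threshold resets the streak (assumption).
      streak=0
    fi
    if [ "$streak" -ge "$required" ]; then
      echo "$index"
      return 0
    fi
  done
  echo "none"
}
```

For example, first_alarm_check 2000 3 2100 2100 1900 2100 2100 2100 prints 6: the reading of 1900 resets the streak, so the third consecutive excess occurs at the sixth check. With the documented defaults, the call would use 2000 and 20.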
Attribute
Alarm ID | Alarm Severity | Auto Clear |
---|---|---|
19011 | Major | Yes |
Parameters
Name | Meaning |
---|---|
Source | Specifies the cluster for which the alarm is generated. |
ServiceName | Specifies the service for which the alarm is generated. |
RoleName | Specifies the role for which the alarm is generated. |
HostName | Specifies the host for which the alarm is generated. |
Impact on the System
If the number of regions on a RegionServer exceeds the threshold, the excess regions increase the load on that RegionServer and can cause memory, disk I/O, and CPU bottlenecks. As a result, requests respond slowly or time out.
Possible Causes
- Regions are unevenly distributed across the RegionServers.
- The HBase cluster is too small for the number of regions it hosts.
Procedure
View alarm location information.
1. On the FusionInsight Manager home page, choose O&M > Alarm > Alarms, select this alarm, and view the service instance and host name in Location.
2. On the FusionInsight Manager home page, choose Cluster > Name of the desired cluster > Services, click the HBase service instance for which the alarm is generated, and click HMaster(Active). On the displayed WebUI of the HBase instance, check whether regions are evenly distributed across the RegionServers.
By default, the admin user does not have permission to manage other components. If insufficient permissions prevent a component's native UI from opening or cause it to display incomplete content, manually create a user with permission to manage that component.
Figure 1 WebUI of HBase instance
Enable load balancing.
3. Log in as user root to the node where the HBase client is installed. Go to the client installation directory and set the environment variables.
cd client installation directory
source bigdata_env
If the cluster is in security mode, perform security authentication: run the kinit hbase command and enter the password as prompted (obtain the password from the administrator).
4. Run the following commands to open the HBase shell and check whether load balancing is enabled.
hbase shell
balancer_enabled
5. In the HBase shell, run the following commands to enable load balancing and confirm that it is enabled.
balance_switch true
balancer_enabled
6. In the HBase shell, run the balancer command to manually trigger load balancing.
You are advised to enable and manually trigger load balancing during off-peak hours.
7. On the FusionInsight Manager home page, choose Cluster > Name of the desired cluster > Services > HBase, and click HMaster(Active). On the displayed WebUI of the HBase instance, refresh the page and check whether the region distribution is balanced.
8. Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to step 9 (Delete unwanted HBase tables).
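The balancer commands above can also be run without an interactive session. The sketch below assumes that the HBase client on the node supports the shell's non-interactive mode (hbase shell -n) and that the environment variables and, in security mode, the kinit authentication are already in place. The function name balancer_commands and the RUN_HBASE switch are hypothetical; by default the script only prints the commands for review.

```shell
# Hypothetical helper: prints the HBase shell commands used in the
# load balancing steps, one per line, in the order they are run.
balancer_commands() {
  printf '%s\n' \
    "balance_switch true" \
    "balancer_enabled" \
    "balancer"
}

# RUN_HBASE is an assumed opt-in switch for this sketch, not an HBase setting.
if [ "${RUN_HBASE:-0}" = "1" ]; then
  # Feed the commands to the HBase shell in non-interactive mode.
  balancer_commands | hbase shell -n
else
  # Default: only show what would be run.
  balancer_commands
fi
```

Printing by default keeps the script safe to run during peak hours; setting RUN_HBASE=1 is then a deliberate step that can be reserved for an off-peak window.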
Delete unwanted HBase tables.
Exercise caution when deleting data to ensure that only the intended data is deleted.
9. On the FusionInsight Manager home page, choose Cluster > Name of the desired cluster > Services > HBase, and click HMaster(Active). On the displayed WebUI of the HBase instance, view the tables stored in the HBase service instance and record the unwanted tables that can be deleted.
10. In the HBase shell, run the disable and drop commands to delete each unwanted table and reduce the number of regions.
disable 'name of the table to be deleted'
drop 'name of the table to be deleted'
11. In the HBase shell, run the following command to check whether load balancing is enabled.
balancer_enabled
12. In the HBase shell, run the following commands to enable load balancing and confirm that it is enabled.
balance_switch true
balancer_enabled
13. In the HBase shell, run the balancer command to manually trigger load balancing.
14. On the FusionInsight Manager home page, choose Cluster > Name of the desired cluster > Services > HBase, and click HMaster(Active). On the displayed WebUI of the HBase instance, refresh the page and check whether the region distribution is balanced.
15. Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to step 16 (Adjust the threshold).
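When several tables are to be removed, the disable and drop pairs can be generated from a reviewed list instead of being typed one by one. A minimal sketch follows; the function name drop_table_commands is hypothetical. It only prints the commands so that they can be checked before being pasted into the HBase shell.

```shell
# Hypothetical helper: for each table name given as an argument, print the
# disable and drop commands in the order the HBase shell requires.
# Nothing is executed; the output is meant to be reviewed first.
drop_table_commands() {
  local table
  for table in "$@"; do
    printf "disable '%s'\n" "$table"
    printf "drop '%s'\n" "$table"
  done
}
```

Called with two table names, the function prints four lines: a disable line followed by a drop line for each table, in the order given.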
Adjust the threshold.
16. On the FusionInsight Manager home page, choose O&M > Alarm > Thresholds > Name of the desired cluster > HBase > Regions(RegionServer), select the applied rule, and click Modify to check whether the threshold is appropriate.
- If it is too low, increase the threshold as required and go to step 17.
- If it is appropriate, go to step 18 (Perform system capacity expansion).
Figure 2 Regions(RegionServer_1)
17. Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to step 18 (Perform system capacity expansion).
Perform system capacity expansion.
18. Add nodes to the HBase cluster and add RegionServer instances to the nodes. Then enable and manually trigger load balancing.
19. On the FusionInsight Manager home page, choose Cluster > Name of the desired cluster > Services, click the HBase service instance for which the alarm is generated, and click HMaster(Active). On the displayed WebUI of the HBase instance, refresh the page and check whether the region distribution is balanced.
20. Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to step 21 (Collect fault information).
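A rough way to size the expansion is to divide the total region count by the per-RegionServer threshold and round up. The sketch below assumes that the balancer spreads regions evenly, which real clusters only approximate, so the result is a lower bound rather than a recommendation. The function name min_regionservers is hypothetical.

```shell
# Hypothetical helper: smallest number of RegionServers for which an even
# spread keeps every server at or below the threshold.
# Usage: min_regionservers TOTAL_REGIONS THRESHOLD
min_regionservers() {
  local total=$1 threshold=$2
  # Integer ceiling division: ceil(total / threshold)
  echo $(( (total + threshold - 1) / threshold ))
}
```

For example, min_regionservers 9000 2000 prints 5: four servers would have to hold 2250 regions each, which is above the default threshold of 2000.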
Collect fault information.
21. On the FusionInsight Manager home page of the active and standby clusters, choose O&M > Log > Download.
22. In Service, select HBase for the required cluster.
23. Click in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then click Download.
24. Contact the O&M personnel and send the collected logs.
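The log collection window is simple arithmetic on the alarm generation time: 600 seconds on either side. As a small sketch, assuming the alarm time is available as a Unix timestamp in seconds (the function name log_window is hypothetical):

```shell
# Hypothetical helper: given the alarm generation time as a Unix timestamp,
# print the start and end of the log collection window (10 minutes before
# and 10 minutes after), separated by a space.
log_window() {
  local alarm_epoch=$1
  local margin=600   # 10 minutes in seconds
  echo "$((alarm_epoch - margin)) $((alarm_epoch + margin))"
}
```

The two values can then be converted to the local date format expected by the Start Date and End Date fields.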
Alarm Clearing
After the fault is rectified, the system automatically clears this alarm.
Related Information
None