ALM-19019 Number of HBase HFiles to Be Synchronized Exceeds the Threshold

Updated at: Sep 02, 2021 GMT+08:00

Description

The system checks the number of HFiles to be synchronized by the RegionServer of each HBase service instance every 30 seconds. This indicator can be viewed on the RegionServer role monitoring page. This alarm is generated when the number of HFiles to be synchronized on a RegionServer exceeds the threshold (by default, more than 128 for 20 consecutive checks). To change the threshold, choose O&M > Alarm > Threshold Configuration > Name of the desired cluster > HBase. This alarm is cleared when the number of HFiles to be synchronized is less than or equal to the threshold.
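
The trigger rule can be illustrated with a small shell sketch. This is not the Manager implementation: the function name alarm_would_fire and the way samples are passed in are hypothetical, and only the default values (threshold 128, 20 consecutive checks, one check every 30 seconds) come from the description above. With those defaults, the backlog has to stay above the threshold for about 20 × 30 s = 10 minutes before the alarm is generated.

```shell
#!/bin/sh
# Sketch of an "N consecutive samples above the threshold" rule.
# Defaults mirror the documented values; the function name is hypothetical.
THRESHOLD=${THRESHOLD:-128}
CONSECUTIVE=${CONSECUTIVE:-20}

# Usage: alarm_would_fire SAMPLE...
# Prints "alarm" if CONSECUTIVE samples in a row exceed THRESHOLD,
# otherwise prints "ok".
alarm_would_fire() {
  streak=0
  for sample in "$@"; do
    if [ "$sample" -gt "$THRESHOLD" ]; then
      streak=$((streak + 1))
    else
      streak=0   # a sample at or below the threshold resets the count
    fi
    if [ "$streak" -ge "$CONSECUTIVE" ]; then
      echo "alarm"
      return 0
    fi
  done
  echo "ok"
}
```

A single sample at or below the threshold resets the streak, so a short dip in the backlog delays the alarm by a full run of consecutive checks.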

Attribute

Alarm ID    Alarm Severity    Auto Clear
19019       Major             Yes

Parameters

Name                 Meaning
Source               Specifies the cluster for which the alarm is generated.
ServiceName          Specifies the name of the service for which the alarm is generated.
RoleName             Specifies the name of the role for which the alarm is generated.
HostName             Specifies the name of the host for which the alarm is generated.
Trigger Condition    Specifies the threshold for triggering the alarm.

Impact on the System

If the number of HFiles to be synchronized by a RegionServer exceeds the threshold, the number of ZNodes used by HBase exceeds the threshold, affecting the HBase service status.

Possible Causes

  • The network is abnormal.
  • The RegionServer region distribution is unbalanced.
  • The HBase service scale of the standby cluster is too small.

Procedure

View alarm location information.

  1. Log in to FusionInsight Manager, choose O&M > Alarm, select this alarm, and view the service instance and host name in Location.

Check the network connection between RegionServers on active and standby clusters.

  2. Run the ping command to check whether the network connection between the faulty RegionServer node and the host where the RegionServer of the standby cluster resides is normal.

    • If yes, go to 5.
    • If no, go to 3.

  3. Contact the network administrator to restore the network.
  4. After the network recovers, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 5.
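
When the standby cluster has several RegionServer hosts, the ping check above can be repeated in a loop. The following is a minimal sketch meant to be run on the faulty RegionServer node; the function name check_peers is hypothetical, and the host names are supplied by the caller. It assumes a Linux ping that accepts the -c option.

```shell
#!/bin/sh
# Sketch: probe each standby RegionServer host from the faulty RegionServer node.
# Host names are passed as arguments; none are assumed here.

# Usage: check_peers HOST...
# Prints one line per host and returns non-zero if any host is unreachable.
check_peers() {
  failed=0
  for host in "$@"; do
    # -c 3: send three echo requests, then stop.
    if ping -c 3 "$host" >/dev/null 2>&1; then
      echo "$host reachable"
    else
      echo "$host UNREACHABLE"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, check_peers standby-rs-1 standby-rs-2 (placeholder host names) prints the state of each host. A host that does not answer ping may simply block ICMP, so treat an UNREACHABLE line as a prompt to involve the network administrator rather than as proof of a fault.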

Check the RegionServer region distribution in the active cluster.

  5. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > HBase. Click HMaster(Active) to go to the web UI of the HBase instance and check whether regions are evenly distributed across the RegionServers.

  6. Log in to the faulty RegionServer node as user omm.
  7. Run the following commands to go to the client installation directory and set the environment variables:

    cd Client installation directory

    source bigdata_env

    If the cluster uses the security mode, perform security authentication. Run the kinit hbase command and enter the password as prompted (obtain the password from the administrator).

  8. Run the following commands to check whether the load balancing function is enabled:

    hbase shell

    balancer_enabled

    • If yes, go to 10.
    • If no, go to 9.

  9. Run the following commands in HBase Shell to enable the load balancing function and verify that the function is enabled:

    balance_switch true

    balancer_enabled

  10. Run the balancer command to manually trigger the load balancing function.

    You are advised to enable and manually trigger the load balancing function during off-peak hours.

  11. Check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 12.
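
The balancer check, switch, and trigger described above can also be scripted. The sketch below is an illustration only. It assumes that the client environment has already been loaded with source bigdata_env, that kinit has succeeded in security mode, that the hbase shell on the node supports the -n (non-interactive) option, and that balancer_enabled prints the word true when the function is enabled. The output layout can differ between HBase versions, so check it on your cluster first. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes the client environment is loaded (source bigdata_env)
# and, in security mode, that kinit has already succeeded.

# Run one or more HBase shell commands non-interactively.
hbase_cmd() {
  printf '%s\n' "$@" | hbase shell -n 2>/dev/null
}

# Succeeds if the output of balancer_enabled contains the word "true".
# Assumption: the exact output layout may differ between HBase versions.
balancer_is_enabled() {
  hbase_cmd "balancer_enabled" | grep -qw "true"
}

# Enable the balancer if it is off, then trigger one balancing run.
ensure_balanced() {
  if ! balancer_is_enabled; then
    echo "balancer disabled, enabling"
    hbase_cmd "balance_switch true" >/dev/null
  fi
  echo "triggering balancer"
  hbase_cmd "balancer" >/dev/null
}
```

As with the manual steps, run ensure_balanced during off-peak hours, because moving regions adds load to the cluster.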

Check the HBase service scale of the standby cluster.

  12. Expand the HBase cluster, add a node, and add a RegionServer instance on the node. Then, perform 6 to 10 to enable the load balancing function and manually trigger it.
  13. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > HBase. Click HMaster(Active) to go to the web UI of the HBase instance, refresh the page, and check whether regions are evenly distributed.

    • If yes, go to 14.
    • If no, go to 15.

  14. Check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 15.
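
The procedure does not define "evenly distributed" numerically. As a rough aid when reading the region counts off the HBase web UI, the sketch below flags any RegionServer whose region count deviates from the mean by more than a chosen percentage. The function name regions_even, the "host count" input format, and the tolerance value are all assumptions made for this illustration; they are not part of the product.

```shell
#!/bin/sh
# Sketch: judge whether region counts are roughly even.
# Input on stdin: one "host region_count" pair per line (copied by hand
# from the web UI). The tolerance is a hypothetical percentage.

# Usage: regions_even TOLERANCE_PERCENT < counts.txt
# Prints "even" (exit 0) or "uneven" (exit 1); exits 2 if there is no input.
regions_even() {
  awk -v tol="$1" '
    { count[$1] = $2; total += $2; n++ }
    END {
      if (n == 0) { print "no data"; exit 2 }
      mean = total / n
      bad = 0
      for (h in count) {
        diff = count[h] - mean
        if (diff < 0) diff = -diff
        # Flag a host whose deviation exceeds tol percent of the mean.
        if (diff * 100 > tol * mean) bad = 1
      }
      print (bad ? "uneven" : "even")
      exit bad
    }'
}
```

For example, with a 20 percent tolerance, counts of 100, 110, and 90 are reported as even, while counts of 300 and 10 are reported as uneven.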

Collect the fault information.

  15. On FusionInsight Manager of the active and standby clusters, choose O&M > Log > Download.
  16. Expand the Service drop-down list, and select HBase for the target cluster.
  17. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
  18. Contact O&M personnel and send the collected logs.
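
The Start Date and End Date for log collection can be worked out from the alarm generation time with a short calculation. The sketch below assumes GNU date; the function name log_window is hypothetical. The timestamp is handled as UTC purely to keep the arithmetic simple, so the result is on the same clock as the value you pass in.

```shell
#!/bin/sh
# Sketch: compute the log collection window (alarm time -/+ 10 minutes).
# Assumes GNU date. The input is parsed as UTC only for the arithmetic;
# the output is on the same clock as the input.

# Usage: log_window "YYYY-MM-DD HH:MM:SS"
log_window() {
  alarm_epoch=$(date -u -d "$1" +%s) || return 1
  start=$(date -u -d "@$((alarm_epoch - 600))" '+%Y-%m-%d %H:%M:%S')
  end=$(date -u -d "@$((alarm_epoch + 600))" '+%Y-%m-%d %H:%M:%S')
  echo "Start Date: $start"
  echo "End Date: $end"
}
```

For an alarm generated at 2021-09-02 10:00:00, log_window "2021-09-02 10:00:00" gives a window from 09:50:00 to 10:10:00 on the same day.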

Alarm Clearing

After the fault that triggers the alarm is rectified, the alarm is automatically cleared.

Related Information

None
