ALM-19013 Duration of Regions in transaction State Exceeds the Threshold
Alarm Description
The system checks the number of regions in transaction state on HBase every 300 seconds. This alarm is generated when the duration of regions in transaction state exceeds the threshold in two consecutive checks. This alarm is cleared when all timed-out regions are restored.
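The rule above can be illustrated with a short shell sketch. It is not the FusionInsight Manager implementation: the function name alarm_state, the 60-second threshold in the usage line, and the idea of feeding it one "longest time in transaction state" sample per 300-second check are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# alarm_state THRESHOLD DURATION...
# Hypothetical helper: each DURATION is the longest time (in seconds) that any
# region has spent in transaction state at one periodic check.
# Prints the alarm state after each check.
alarm_state() {
  local threshold=$1; shift
  local consecutive=0 state=cleared duration
  for duration in "$@"; do
    if [ "$duration" -gt "$threshold" ]; then
      consecutive=$((consecutive + 1))
      # Raise only when two checks in a row exceed the threshold.
      if [ "$consecutive" -ge 2 ]; then state=raised; fi
    else
      # No region exceeds the threshold any more: reset and clear.
      consecutive=0
      state=cleared
    fi
    echo "$state"
  done
}

# Usage (assumed 60-second threshold, four consecutive checks):
#   alarm_state 60 10 90 120 0
```

A single check above the threshold does not raise the alarm in this sketch; the state changes only on the second consecutive one, and it returns to cleared as soon as a check finds no region over the threshold.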
Alarm Attributes
Alarm ID | Alarm Severity | Alarm Type | Service Type | Auto Cleared |
---|---|---|---|---|
19013 | Major | Quality of service | HBase | Yes |
Alarm Parameters
Type | Parameter | Description |
---|---|---|
Location Information | Source | Specifies the cluster for which the alarm is generated. |
Location Information | ServiceName | Specifies the service for which the alarm is generated. |
Location Information | RoleName | Specifies the role for which the alarm is generated. |
Location Information | HostName | Specifies the host for which the alarm is generated. |
Impact on the System
Some data in the service table gets lost or becomes unavailable.
Possible Causes
- Compaction is permanently blocked.
- The HDFS files are abnormal.
Handling Procedure
Locate the alarm cause.
- On FusionInsight Manager, choose O&M > Alarm > Alarms, select this alarm, and view the HostName and RoleName in Location.
- Choose Cluster > Name of the desired cluster > Services > HBase. Click the drop-down menu in the chart area and choose Customize > Service >
- Choose Cluster > Name of the desired cluster > Services > HBase > HMaster (Active) > Tables to check whether the timed-out regions in transaction state belong to only one table.
- Run the hbase hbck command on the client and check whether the error message "No table descriptor file under hdfs://hacluster/hbase/data/default/table" is displayed.
- Log in to the client as user root and run the following commands:
cd client installation directory
source bigdata_env
If the cluster is in security mode, also run the kinit hbase command.
Log in to the HMaster WebUI, choose Procedure & Locks in the navigation tree, and check whether any procedure in Procedures is in the Waiting state. If yes, run the following command to release the procedure lock, where pid is the ID of the waiting procedure:
hbase hbck -j client installation directory/HBase/hbase/tools/hbase-hbck2-*.jar bypass -o pid
Check whether the procedure is in the Bypass state. If the procedure on the UI remains in the RUNNABLE(Bypass) state, perform an active/standby switchover. Then run the assigns command to bring the region online again:
hbase hbck -j client installation directory/HBase/hbase/tools/hbase-hbck2-*.jar assigns -o regionName
- Repeat step 4: run the hbase hbck command on the client and check whether the error message "No table descriptor file under hdfs://hacluster/hbase/data/default/table" is displayed.
- If yes, go to step 7 (collect fault information).
- If no, no further action is required.
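The client-side checks in the steps above can be sketched as two small shell helpers. This is a sketch under stated assumptions, not a supported tool: the function names has_missing_descriptor and hbck2_cmd are hypothetical, /opt/client stands in for the client installation directory, and the hbck2 command is only printed (not executed) so it can be reviewed before use. The command layout and the bypass and assigns subcommands are taken from the steps above.

```shell
#!/usr/bin/env bash
# has_missing_descriptor: reads "hbase hbck" output on stdin and succeeds
# (exit 0) if it contains the error message named in the procedure.
has_missing_descriptor() {
  grep -q "No table descriptor file under"
}

# hbck2_cmd CLIENT_DIR SUBCOMMAND TARGET
# Prints (does not run) the hbck2 command from the procedure.
# SUBCOMMAND is "bypass" (TARGET = procedure ID) or
# "assigns" (TARGET = region name).
hbck2_cmd() {
  local client_dir=$1 subcommand=$2 target=$3
  # The jar glob is kept quoted here so it is printed literally.
  echo "hbase hbck -j $client_dir/HBase/hbase/tools/hbase-hbck2-*.jar $subcommand -o $target"
}

# Usage, after "source bigdata_env" (and "kinit hbase" in security mode):
#   hbase hbck 2>&1 | has_missing_descriptor && echo "descriptor file missing"
#   hbck2_cmd /opt/client bypass 123   # /opt/client and 123 are placeholders
```

Printing the command first keeps the destructive bypass step a deliberate, manual action: the operator copies the line only after confirming the procedure ID on the HMaster WebUI.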
Collect fault information.
- On FusionInsight Manager of the active and standby clusters, choose O&M > Log > Download.
- In the Service area, select faulty HBase services in the required cluster.
- Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
- Contact the O&M engineers and send the collected logs.
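The Start Date and End Date for log collection can be worked out from the alarm generation time with a small shell helper. This is a minimal sketch that assumes GNU date is available; the function name log_window and the timestamp format are illustrative assumptions, and the values still have to be entered on the Download page by hand.

```shell
#!/usr/bin/env bash
# log_window ALARM_TIME
# Prints the start and end of the log collection window:
# 10 minutes (600 seconds) before and after ALARM_TIME.
# ALARM_TIME is treated as UTC for simplicity (assumption); adjust for the
# time zone shown on the alarm page. Requires GNU date (-d option).
log_window() {
  local alarm_time=$1 epoch
  epoch=$(date -u -d "$alarm_time" +%s) || return 1
  date -u -d "@$((epoch - 600))" '+%Y-%m-%d %H:%M:%S'
  date -u -d "@$((epoch + 600))" '+%Y-%m-%d %H:%M:%S'
}

# Usage:
#   log_window '2024-05-01 12:00:00'
# prints the start time on the first line and the end time on the second.
```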
Alarm Clearance
After the fault is rectified, the system automatically clears this alarm.
Related Information
None.