HBase Failed to Start After the Cluster Is Powered Off and On
Symptom
After the ECS in the cluster is stopped and restarted, HBase fails to start.
Cause Analysis
Check the HMaster run logs. A large number of errors are reported, as shown below:
2018-03-26 11:10:54,185 | INFO | hadoopc1h3,21300,1522031630949_splitLogManager__ChoreService_1 | total tasks = 1 unassigned = 0 tasks={/hbase/splitWAL/WALs%2Fhadoopc1h1%2C213 02%2C1520214023667-splitting%2Fhadoopc1h1%252C21302%252C1520214023667.default.1520584926990=last_update = 1522033841041 last_version = 34255 cur_worker_name = hadoopc1h3,21302, 1520943011826 status = in_progress incarnation = 3 resubmits = 3 batch = installed = 1 done = 0 error = 0} | org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor.chore (SplitLogManager.java:745) 2018-03-26 11:11:00,185 | INFO | hadoopc1h3,21300,1522031630949_splitLogManager__ChoreService_1 | total tasks = 1 unassigned = 0 tasks={/hbase/splitWAL/WALs%2Fhadoopc1h1%2C213 02%2C1520214023667-splitting%2Fhadoopc1h1%252C21302%252C1520214023667.default.1520584926990=last_update = 1522033841041 last_version = 34255 cur_worker_name = hadoopc1h3,21302, 1520943011826 status = in_progress incarnation = 3 resubmits = 3 batch = installed = 1 done = 0 error = 0} | org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor.chore (SplitLogManager.java:745) 2018-03-26 11:11:06,185 | INFO | hadoopc1h3,21300,1522031630949_splitLogManager__ChoreService_1 | total tasks = 1 unassigned = 0 tasks={/hbase/splitWAL/WALs%2Fhadoopc1h1%2C213 02%2C1520214023667-splitting%2Fhadoopc1h1%252C21302%252C1520214023667.default.1520584926990=last_update = 1522033841041 last_version = 34255 cur_worker_name = hadoopc1h3,21302, 1520943011826 status = in_progress incarnation = 3 resubmits = 3 batch = installed = 1 done = 0 error = 0} | org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor.chore (SplitLogManager.java:745) 2018-03-26 11:11:10,787 | INFO | RpcServer.reader=9,bindAddress=hadoopc1h3,port=21300 | Kerberos principal name is hbase/hadoop.hadoop.com@HADOOP.COM | org.apache.hadoop.hbase .ipc.RpcServer$Connection.readPreamble(RpcServer.java:1532) 2018-03-26 11:11:12,185 | INFO | hadoopc1h3,21300,1522031630949_splitLogManager__ChoreService_1 | total tasks = 1 unassigned = 0 tasks={/hbase/splitWAL/WALs%2Fhadoopc1h1%2C213 02%2C1520214023667-splitting%2Fhadoopc1h1%252C21302%252C1520214023667.default.1520584926990=last_update = 1522033841041 last_version = 34255 cur_worker_name = hadoopc1h3,21302, 1520943011826 status = in_progress incarnation = 3 resubmits = 3 batch = installed = 1 done = 0 error = 0} | org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor.chore (SplitLogManager.java:745) 2018-03-26 11:11:18,185 | INFO | hadoopc1h3,21300,1522031630949_splitLogManager__ChoreService_1 | total tasks = 1 unassigned = 0 tasks={/hbase/splitWAL/WALs%2Fhadoopc1h1%2C213 02%2C1520214023667-splitting%2Fhadoopc1h1%252C21302%252C1520214023667.default.1520584926990=last_update = 1522033841041 last_version = 34255 cur_worker_name = hadoopc1h3,21302, 1520943011826 status = in_progress incarnation = 3 resubmits = 3 batch = installed = 1 done = 0 error = 0} | org.apache.hadoop.hbase.master.SplitLogManager$TimeoutMonitor.chore (SplitLogManager.java:745)
The WAL splitting of RegionServer fails when the node is powered on and off.
Solution
- Stop HBase.
- Run the hdfs fsck command to check the health status of the /hbase/WALs file.
hdfs fsck /hbase/WALs
If the following command output is displayed, all files are normal. If any file is abnormal, rectify the fault, and then perform the subsequent operations.
The filesystem under path '/hbase/WALs' is HEALTHY
- Back up the /hbase/WALs file.
hdfs dfs -mv /hbase/WALs /hbase/WALs_old
- Run the following command to create the /hbase/WALs directory.
hdfs dfs -mkdir /hbase/WALs
Make sure that the permission on the directory is hbase:hadoop.
- Start HBase.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.