HMaster Fails to Start After the OfflineMetaRepair Tool Is Used to Rebuild Metadata
Question
After the OfflineMetaRepair tool is used to rebuild metadata, assignment of the namespace table times out during HMaster startup and the startup fails. Why?
HMaster logs the following FATAL message and then shuts down:
2017-06-15 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: Timedout 120000ms waiting for namespace table to be assigned
    at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
    at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
    at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
    at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
    at java.lang.Thread.run(Thread.java:745)
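To check whether an HMaster log contains this specific failure, the message can be matched with a regular expression. The following is a minimal Python sketch, not an official tool; the function name find_namespace_timeouts is hypothetical, and the sample text is the message shown above.

```python
import re

# Matches the timeout message from the FATAL log shown above and
# captures the timeout value in milliseconds.
NAMESPACE_TIMEOUT = re.compile(
    r"Timedout (\d+)ms waiting for namespace table to be assigned"
)


def find_namespace_timeouts(log_text):
    """Return the timeout values (ms) of every matching message in log_text."""
    return [int(ms) for ms in NAMESPACE_TIMEOUT.findall(log_text)]


# Sample input: the log message quoted in this FAQ.
sample = (
    "2017-06-15 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] "
    "master.HMaster: Unhandled exception. Starting shutdown. "
    "java.io.IOException: Timedout 120000ms waiting for namespace table "
    "to be assigned"
)

print(find_namespace_timeouts(sample))
```

In practice, the text of the HMaster log file would be passed in place of the sample string; the location of that file depends on the installation.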
Answer
After metadata is rebuilt with the OfflineMetaRepair tool, HMaster waits during startup for WAL splitting to finish on all region servers, to avoid data inconsistency. Only when WAL splitting is complete does HMaster assign user regions. If the cluster is abnormal, WAL splitting can take a long time because of factors such as a large number of WALs, slow I/O, or unstable region servers, and the namespace table is then not assigned before the timeout expires.
To ensure that HMaster successfully splits the WALs of all region servers, perform the following steps:
- Ensure that the cluster is stable and no other problems exist. If any problem occurs, rectify it first.
- Set the hbase.master.initializationmonitor.timeout parameter to a value larger than the default, which is 3600000 milliseconds.
- Restart the HBase service.
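As an illustration of the second step, the following Python sketch builds the property element that would carry the new value, assuming the parameter is set directly in hbase-site.xml (clusters managed through a management console typically set it there instead). The helper name make_property and the doubled timeout value are assumptions chosen for illustration; pick a value that suits how long WAL splitting takes on your cluster.

```python
import xml.etree.ElementTree as ET


def make_property(name, value):
    """Build a Hadoop-style <property> element with <name> and <value>."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = str(value)
    return prop


# Default stated in this FAQ, in milliseconds.
DEFAULT_TIMEOUT_MS = 3600000

# Hypothetical larger value (twice the default); not a recommendation.
new_timeout_ms = 2 * DEFAULT_TIMEOUT_MS

prop = make_property(
    "hbase.master.initializationmonitor.timeout", new_timeout_ms
)
fragment = ET.tostring(prop, encoding="unicode")
print(fragment)
```

The printed element would be placed inside the configuration element of hbase-site.xml before the HBase service is restarted.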