Updated on 2024-12-11 GMT+08:00

HMaster Failed to Be Started After the OfflineMetaRepair Tool Is Used to Rebuild Metadata

Question

After the OfflineMetaRepair tool is used to rebuild metadata, assignment of the namespace table times out during HMaster startup. Why does the startup fail?

HMaster outputs the following FATAL message to indicate that the operation is terminated:

2017-06-15 15:11:07,582 FATAL [Hostname:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.io.IOException: Timedout 120000ms waiting for namespace table to be assigned
        at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:98)
        at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:1054)
        at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:848)
        at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:199)
        at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1871)
        at java.lang.Thread.run(Thread.java:745)

Answer

After metadata has been rebuilt with OfflineMetaRepair, HMaster waits during startup for the WALs of all region servers to be split, in order to avoid data inconsistency. Only when WAL splitting is complete does HMaster assign user regions. If the cluster is in an abnormal state, WAL splitting can take a long time because of factors such as a large number of WALs, slow I/O, or unstable region servers, and the namespace table is then not assigned before the timeout expires.

To ensure that HMaster can finish splitting the WALs of all region servers, perform the following steps:

  1. Ensure that the cluster is stable and no other problems exist. If any problem occurs, rectify it first.
  2. Set the hbase.master.initializationmonitor.timeout parameter to a value larger than the default. The default value is 3600000 milliseconds (60 minutes).
  3. Restart the HBase service.
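
As a rough illustration of step 2, the setting could look like the following fragment if it is edited directly in hbase-site.xml. This is a sketch under assumptions: the value 7200000 is only an example, and the way the parameter is actually changed (for example, through the cluster management console rather than by editing the file) depends on the deployment.

```xml
<!-- Illustrative hbase-site.xml fragment; the value below is an assumed example, not a recommendation -->
<property>
  <name>hbase.master.initializationmonitor.timeout</name>
  <!-- 7200000 ms = 2 hours, twice the 3600000 ms default stated in step 2 -->
  <value>7200000</value>
</property>
```

Choose a value that comfortably exceeds the time WAL splitting is expected to take in the affected cluster. The change takes effect only after the HBase service is restarted, as described in step 3.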