Help Center/ MapReduce Service/ Component Operation Guide (Normal)/ Using HBase/ Common Issues About HBase/ Why Does HMaster Exits Due to Timeout When Waiting for the Namespace Table to Go Online?
Updated on 2022-09-15 GMT+08:00

Why Does HMaster Exits Due to Timeout When Waiting for the Namespace Table to Go Online?

Question

Why does HMaster exit due to timeout when waiting for the namespace table to go online?

Answer

During the HMaster active/standby switchover or startup, HMaster performs WAL splitting and region recovery for the RegionServer that failed or was stopped previously.

Multiple threads are running in the background to monitor the HMaster startup process.

  • TableNamespaceManager

    This is a help class, which is used to manage the allocation of namespace tables and monitoring table regions during HMaster active/standby switchover or startup. If the namespace table is not online within the specified time (hbase.master.namespace.init.timeout, which is 3,600,000 ms by default), the thread terminates HMaster abnormally.

  • InitializationMonitor

    This is an initialization thread monitoring class of the primary HMaster, which is used to monitor the initialization of the primary HMaster. If a thread fails to be initialized within the specified time (hbase.master.initializationmonitor.timeout, which is 3,600,000 ms by default), the thread terminates HMaster abnormally. If hbase.master.initializationmonitor.haltontimeout is started, the default value is false.

During the HMaster active/standby switchover or startup, if the WAL hlog file exists, the WAL splitting task is initialized. If the WAL hlog splitting task is complete, it initializes the table region allocation task.

HMaster uses ZooKeeper to coordinate log splitting tasks and valid RegionServers and track task development. If the primary HMaster exits during the log splitting task, the new primary HMaster attempts to resend the unfinished task, and RegionServer starts the log splitting task from the beginning.

The initialization of the HMaster is delayed due to the following reasons:

  • Network faults occur intermittently.
  • Disks run into bottlenecks.
  • The log splitting task is overloaded, and RegionServer runs slowly.
  • RegionServer (region opening) responds slowly.

In the preceding scenarios, you are advised to add the following configuration parameters to enable HMaster to complete the restoration task earlier. Otherwise, the Master will exit, causing a longer delay of the entire restoration process.

  • Increase the online waiting timeout period of the namespace table to ensure that the Master has enough time to coordinate the splitting tasks of the RegionServer worker and avoid repeated tasks.

    hbase.master.namespace.init.timeout (default value: 3,600,000 ms)

  • Increase the number of concurrent splitting tasks through RegionServer worker to ensure that RegionServer worker can process splitting tasks in parallel (RegionServers need more cores). Add the following parameters to Client installation path /HBase/hbase/conf/hbase-site.xml:

    hbase.regionserver.wal.max.splitters (default value: 2)

  • If all restoration processes require time, increase the timeout period for initializing the monitoring thread.

    hbase.master.initializationmonitor.timeout (default value: 3,600,000 ms)