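Why Does Setting a Quota on the HDFS Directory Used by HBase Cause an HBase Fault?
Question
Why does setting a quota on the HDFS directory used by HBase cause HBase to fail?
Cause Analysis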
A table flush writes MemStore data to HDFS.
If the HDFS directory does not have enough disk space quota, the flush fails and the RegionServer terminates.
Caused by: org.apache.hadoop.hdfs.protocol.DSQuotaExceededException: The DiskSpace quota of /hbase/data/<namespace>/<tableName> is exceeded: quota = 1024 B = 1 KB but diskspace consumed = 402655638 B = 384.00 MB
    at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyStoragespaceQuota(DirectoryWithQuotaFeature.java:211)
    at org.apache.hadoop.hdfs.server.namenode.DirectoryWithQuotaFeature.verifyQuota(DirectoryWithQuotaFeature.java:239)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyQuota(FSDirectory.java:882)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:711)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.updateCount(FSDirectory.java:670)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addBlock(FSDirectory.java:495)
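In the preceding exception, the disk space quota of the table directory "/hbase/data/<namespace>/<tableName>" is 1 KB, but writing the MemStore data would bring the consumed disk space to 384.00 MB. The flush therefore fails and the RegionServer terminates.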
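The 384.00 MB figure in the exception can be reproduced with simple arithmetic. HDFS charges a space quota for a full block on every replica at the moment a block is allocated. The sketch below assumes a 128 MB block size and a replication factor of 3; these are common defaults, but neither value is stated in the log, and the function name is illustrative only.

```shell
#!/bin/sh
# Space charged against a directory's space quota when one new block
# is allocated: block size (bytes) multiplied by the replication factor.
space_charged_per_block() {
  echo $(( $1 * $2 ))
}

# Assumed values (not taken from the log): 128 MB block, 3 replicas.
space_charged_per_block $((128 * 1024 * 1024)) 3   # prints 402653184
```

Under these assumptions a single block allocation is charged 402653184 B (384 MB), so any space quota smaller than that rejects even a tiny flush.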
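When the RegionServer terminates, HMaster replays the WAL files of the terminated RegionServer to recover data. Because the disk space quota is limited, the WAL replay also fails, which in turn causes the HMaster process to exit abnormally.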
2016-07-28 19:11:40,352 | FATAL | MASTER_SERVER_OPERATIONS-10-91-9-131:16000-0 | Caught throwable while processing event M_SERVER_SHUTDOWN | org.apache.hadoop.hbase.master.HMaster.abort(HMaster.java:2474)
java.io.IOException: failed log splitting for 10-91-9-131,16020,1469689987884, will retry
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.resubmit(ServerShutdownHandler.java:365)
    at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:220)
    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error or interrupted while splitting logs in [hdfs://hacluster/hbase/WALs/<RS-Hostname>,<RS-Port>,<startcode>-splitting] Task = installed = 6 done = 3 error = 3
    at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:402)
    at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:375)
Therefore, setting quotas on the HBase directories in HDFS is not supported.
Procedure
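- Run the following command at the client command prompt to authenticate the HBase user:
kinit <username>
- Run the following command to check the allocated disk space quota:
hdfs dfs -count -q /hbase/data/<namespace>/<tableName>
- Run the following command to clear the quota limit and restore HBase: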
hdfs dfsadmin -clrSpaceQuota /hbase/data/<namespace>/<tableName>
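The check in the steps above can also be scripted. The output of `hdfs dfs -count -q` lists QUOTA, REM_QUOTA, SPACE_QUOTA, REM_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME, with `none` in the SPACE_QUOTA column when no space quota is set. The following is a minimal sketch that reads that output from standard input; the helper names `has_space_quota` and `clear_space_quota_if_set` are hypothetical and not part of HDFS or HBase.

```shell
#!/bin/sh
# Succeeds (exit 0) if the `hdfs dfs -count -q` output on stdin shows a
# space quota, that is, the third column is anything other than "none".
has_space_quota() {
  awk 'NF >= 8 { found = ($3 != "none") } END { exit (found ? 0 : 1) }'
}

# Hypothetical wrapper: clears the space quota on a directory only if one is set.
# Requires a Kerberos-authenticated session with HDFS administrator rights.
clear_space_quota_if_set() {
  dir=$1
  if hdfs dfs -count -q "$dir" | has_space_quota; then
    hdfs dfsadmin -clrSpaceQuota "$dir"
  fi
}
```

For example, `clear_space_quota_if_set /hbase/data/default/t1` (an illustrative table path) would leave directories without a space quota untouched.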