Updated on 2024-10-09 GMT+08:00

How Can I Quickly Recover the Service When HBase Files Are Damaged Due to a Cluster Power-Off?

Symptom

The StoreFile or WAL files are damaged due to an unexpected cluster power-off. How can I quickly restore the service?

This operation is supported only for MRS 3.3.0 or later.

Cause Analysis

If a StoreFile is damaged, the affected regions fail to be brought online and the system keeps retrying the operation. As a result, the HBase service is abnormal. If a WAL file is damaged, log splitting fails and the system keeps retrying it. The affected regions therefore cannot be brought online or provide services for external systems.
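
The retry behavior described above can be pictured with a toy model. This is a minimal sketch and not HBase code: the function name open_region, the dictionary of file states, and the max_attempts cap are all hypothetical (the real system keeps retrying rather than stopping after a fixed number of attempts). It only shows why retrying cannot succeed while a damaged file stays in place, and why skipping the file lets the region come online at the cost of that file's data.

```python
def open_region(store_files, skip_errors=False, max_attempts=3):
    """Toy model of bringing a region online (hypothetical, not HBase code).

    store_files maps a file name to True (readable) or False (damaged).
    Returns (online, attempts_used, quarantined_files).
    """
    quarantined = []
    for attempt in range(1, max_attempts + 1):
        damaged = [name for name, readable in store_files.items() if not readable]
        if not damaged:
            return True, attempt, quarantined
        if skip_errors:
            # Damaged files are set aside, so the region opens without them.
            quarantined = damaged
            return True, attempt, quarantined
        # Without skipping, nothing changes between attempts,
        # so every retry fails for the same reason.
    return False, max_attempts, quarantined
```

With one damaged file, open_region({"hfile-a": True, "hfile-b": False}) never comes online, while the same call with skip_errors=True succeeds on the first attempt and reports hfile-b as set aside.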

Procedure

The HBase server provides two configuration items that determine whether to skip damaged StoreFile and WAL files. Log in to FusionInsight Manager, choose Cluster > Services > HBase, click Configuration, and search for and set the parameters listed in Table 1. The parameters support dynamic update: after saving the configuration, log in to the HBase shell and run the update_all_config command for them to take effect.
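
If the reload step needs to be scripted, the update_all_config command can be piped into a non-interactive HBase shell. The following is a minimal sketch under several assumptions: the hbase client command is on the PATH, the client environment has already been loaded, and the user has authenticated if the cluster runs in security mode. The function name reload_hbase_config and its runner parameter are hypothetical helpers introduced only for illustration.

```python
import subprocess


def reload_hbase_config(hbase_bin="hbase", runner=subprocess.run):
    """Send update_all_config to a non-interactive HBase shell.

    hbase_bin and runner are illustrative parameters; runner exists so the
    call can be replaced with a stub when no cluster is available.
    Returns (exit_code, captured_stdout).
    """
    result = runner(
        [hbase_bin, "shell", "-n"],   # -n: non-interactive mode
        input="update_all_config\n",  # command named in the procedure
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode, result.stdout
```

Calling reload_hbase_config() on a client node returns the shell's exit code and output, which can be checked before regions are expected to come back online.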

Skipping damaged files may cause data loss. If the following parameters are set to true and a damaged StoreFile or WAL file is skipped, ALM-19025 Damaged StoreFile in HBase or ALM-19026 Damaged WAL Files in HBase is reported. Rectify the fault by referring to the alarm help.

Table 1 Parameters for skipping damaged files on the HBase server

| Parameter | Description | Default Value |
|---|---|---|
| hregion.hfile.skip.errors | Whether to skip damaged HBase files and move them to the /hbase/autocorrupt or /hbase/MasterData/autocorrupt directory when a region is brought online. You are not advised to enable this parameter in DR scenarios. | false |
| hbase.hlog.split.skip.errors | Whether to skip damaged WAL files and move them to the /hbase/corrupt directory during log splitting. | false |
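
Because skipped files may mean lost data, it can help to check which files were moved aside after the parameters are enabled. The sketch below lists the directories named in Table 1 with hdfs dfs -ls -R and collects the file entries. It assumes the hdfs client command is available and that the user can read these directories; the names QUARANTINE_DIRS, parse_ls_paths, and list_skipped_files are hypothetical and exist only for this example.

```python
import subprocess

# Directories taken from Table 1, grouped by the parameter that fills them.
QUARANTINE_DIRS = {
    "hregion.hfile.skip.errors": [
        "/hbase/autocorrupt",
        "/hbase/MasterData/autocorrupt",
    ],
    "hbase.hlog.split.skip.errors": ["/hbase/corrupt"],
}


def parse_ls_paths(ls_output):
    """Return file paths from `hdfs dfs -ls -R` output, ignoring directories."""
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        # File entries have 8 columns and a permission string starting with "-".
        if len(fields) >= 8 and fields[0].startswith("-"):
            paths.append(" ".join(fields[7:]))
    return paths


def list_skipped_files(runner=subprocess.run):
    """Map each parameter to the files found in its quarantine directories."""
    found = {}
    for param, dirs in QUARANTINE_DIRS.items():
        files = []
        for directory in dirs:
            result = runner(
                ["hdfs", "dfs", "-ls", "-R", directory],
                capture_output=True,
                text=True,
                check=False,
            )
            # A non-zero exit code typically means the directory does not exist.
            if result.returncode == 0:
                files.extend(parse_ls_paths(result.stdout))
        found[param] = files
    return found
```

An empty list for a parameter suggests that nothing has been skipped under it so far. Any listed file is a candidate for the follow-up described in the corresponding alarm help.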