Updated on 2024-11-29 GMT+08:00

Optimization for HBase Overload

Scenario

When HBase traffic spikes and a large number of requests reach a RegionServer or HMaster within a short period, that instance becomes overloaded. Under overload, the read and write performance of applications deteriorates, garbage collection (GC) occurs frequently in the HBase service, and service instances may even restart.

HBase provides overload protection. It can reject oversized requests, protect internal requests, and record improper requests. This reduces the impact of overload on HBase services and helps keep them stable.

Sharp Traffic Increase

When service traffic peaks, for example, when the number of requests increases tenfold, perform the following operations to manage the traffic:

  1. Log in to FusionInsight Manager, choose Cluster > Services > HBase > Chart, select Handler in the chart category on the left, and check whether the handlers shown in "Number of Active RegionServer Handlers for Processing User Table Requests-All Instances" are used up for a long time. If they are, click Configure and adjust the RegionServer parameters listed in the following table.

    Table 1 Optimizing parameters when RegionServer handlers are used up

    | Parameter | Description | Optimization |
    | --- | --- | --- |
    | hbase.regionserver.handler.count | Number of RPC server instances started on RegionServer | Increase the value of this parameter. However, the value should be less than or equal to 1000. |
    | hbase.ipc.server.max.default.callqueue.size.ratio | Maximum percentage of common requests in the RegionServer queue. When the total size of common requests in the queue exceeds the threshold, the requests are discarded. | Adjust the value to about 0.8 to limit the proportion of queues occupied by external requests and protect internal requests. |

  2. Check whether "XXX is too large for table XXX" or "Client scan caching XXX is too large for table XXX" appears in the run logs on the application side. If it does, the application is sending improper requests. Check the requests and reduce the data volume of each one: reduce the data volume of Put/Delete batch requests and decrease the Caching value for Scan requests. If the application cannot be optimized for the time being, you can add or modify the following parameters in the Client installation directory/HBase/hbase/conf/hbase-site.xml file on the application side. (This only reduces the recorded alarm logs. It does not relieve the overload.)

    Table 2 Parameters for reducing recorded alarm logs

    | Parameter | Description | Optimization |
    | --- | --- | --- |
    | hbase.rpc.rows.warning.threshold | Threshold of the number of data records written, updated, or deleted by the HBase client at a time. If the threshold is exceeded, a log is recorded. | Increase the value of this parameter. |
    | hbase.client.scanner.warning.threshold.scanning.ratio | Threshold for the caching of a single scan on the HBase client, expressed as a ratio of the maximum value (40% of the maximum value by default). If the threshold is exceeded, a log is recorded. | Change the value of this parameter to 1.0. |

  3. If the application sends too many oversized requests, the server processes them slowly. As a result, requests pile up and the server becomes overloaded. If oversized requests can be treated as abnormal requests, adjust the parameters in the HBase configuration on FusionInsight Manager to reject them. The following table lists the RegionServer parameters to be configured.

    Table 3 Parameters for rejecting requests

    | Parameter | Description | Optimization |
    | --- | --- | --- |
    | hbase.ipc.max.request.size | Maximum size of a RegionServer request. If a request is bigger than the specified size, the request is discarded. The default value is 256 MB. | If the application retries multiple times and "RPC data length XXX of received from XXX is greater than max allowed" is displayed in RegionServer logs, reduce the amount of data sent at a time on the application side. If the amount cannot be reduced, you can increase the value of this parameter. It is recommended that the value be less than or equal to 1 GB. |
    | hbase.server.keyvalue.maxsize | Maximum size of a single cell for RegionServer write/update operations. If the value of this parameter is exceeded, RegionServer write/update operations are not allowed. The default value is 10 MB. | If a single cell is too large, the read and write performance is degraded and abnormal data may exist. You can evaluate the data range based on the written data and set the upper limit. If the evaluation cannot be performed, you are advised to retain the default value. |
    | hbase.rpc.rows.size.threshold.reject | Whether to reject a RegionServer request when the number of data operations in the request exceeds the specified limit. | If a request contains a large number of write, update, and delete operations on a node, the number of operations may exceed the value of hbase.rpc.rows.warning.threshold. In this case, overloading occurs and the performance deteriorates. If this parameter is set to true, large requests will be rejected. If the pre-partitioning is improper, too many requests may be rejected. Set this parameter to true only when the service is stable. |
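
Steps 2 and 3 both come down to sending less data per request from the application. The following Python sketch shows one possible way to do that: it splits a large batch into sub-batches bounded by row count and by approximate payload size. The function name split_batch, the (row key, value) tuple layout, and the limit values are assumptions made for illustration only. They are not part of HBase or its client API, and the byte count ignores RPC framing overhead.

```python
def split_batch(rows, max_rows, max_bytes):
    """Yield sub-batches of (row_key, value) byte pairs.

    A sub-batch is closed before it would exceed max_rows entries or
    max_bytes of payload. A single row larger than max_bytes is still
    emitted, alone in its own sub-batch.
    """
    batch, size = [], 0
    for key, value in rows:
        row_size = len(key) + len(value)
        # Close the current sub-batch if adding this row would break a limit.
        if batch and (len(batch) >= max_rows or size + row_size > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append((key, value))
        size += row_size
    if batch:
        yield batch


# Hypothetical data: 10 rows of 12 bytes each (2-byte key, 10-byte value).
rows = [(b"r%d" % i, b"x" * 10) for i in range(10)]
by_count = [len(b) for b in split_batch(rows, max_rows=4, max_bytes=1000)]
by_bytes = [len(b) for b in split_batch(rows, max_rows=4, max_bytes=30)]
print(by_count, by_bytes)
```

Each sub-batch would then be sent as its own Put/Delete request by whichever HBase client the application uses.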

RegionServer Restart with a Large Number of Regions

When multiple RegionServers in a large-scale cluster with a large number of regions (more than 100,000) are restarted at the same time, HMaster may be overloaded.

You can configure the parameters listed in Table 4 in the HBase configuration on FusionInsight Manager to accelerate HMaster processing of high-priority requests and reduce HMaster overload.

Table 4 Parameters for handling overloading caused by online/offline switches in a large number of regions

| Instance Name | Parameter | Description | Optimization |
| --- | --- | --- | --- |
| HMaster | hbase.regionserver.metahandler.count | Number of handlers used by HMaster to process high-priority requests | Increase the value of this parameter. However, the value should be less than or equal to 1000. |
| HMaster | hbase.ipc.server.metacallqueue.read.ratio | Ratio of read queues in a high-priority request queue, which affects the number of meta read/write handlers | Retain the default value 0.5. |
| RegionServer | hbase.regionserver.msginterval | Interval for transmitting messages between RegionServer and HMaster | Increasing the value of this parameter can relieve the pressure on HMaster. The recommended value is 15s. |
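
The upper bounds given in the Optimization columns of this topic can be checked before a change is applied. The following Python sketch is one hypothetical way to do so. The LIMITS table and the check_limits function are illustrative names, and the assumption that hbase.ipc.max.request.size is expressed in bytes should be confirmed against your cluster.

```python
GIB = 1024 ** 3

# Upper bounds taken from the Optimization columns in this topic.
LIMITS = {
    "hbase.regionserver.handler.count": 1000,
    "hbase.regionserver.metahandler.count": 1000,
    "hbase.ipc.max.request.size": GIB,  # assumption: value given in bytes
}


def check_limits(settings):
    """Return one message per setting that exceeds its documented bound."""
    problems = []
    for name, value in settings.items():
        limit = LIMITS.get(name)
        # Parameters without a documented bound are not checked.
        if limit is not None and value > limit:
            problems.append(f"{name}={value} exceeds {limit}")
    return problems


# Hypothetical planned values, for illustration only.
planned = {
    "hbase.regionserver.handler.count": 200,
    "hbase.regionserver.metahandler.count": 1200,
}
for message in check_limits(planned):
    print(message)
```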