Optimization for HBase Overload
Scenario
When HBase service traffic peaks suddenly and a large number of requests are sent to a RegionServer or HMaster in a short period of time, that instance is overloaded. When HBase is overloaded, the read and write performance of the application deteriorates, GC occurs frequently in the HBase service, and service instances may even restart.
Currently, HBase can prevent overloading. It can reject oversized requests, protect internal requests, and record improper requests, reducing the impact on HBase services in overload scenarios and ensuring service stability.
This topic is available for MRS 3.3.0 and later versions only.
Sharp Traffic Increase
When service traffic peaks, for example, the number of requests increases by 10 times, you can perform the following operations to manage the traffic:
- Log in to FusionInsight Manager, choose Cluster > Services > HBase > Chart, select Handler in the chart category on the left, and check whether the handlers shown in "Number of Active RegionServer Handlers for Processing User Table Requests-All Instances" stay used up for a long time. If they do, click Configure. The following table lists the RegionServer parameters to be configured.
Table 1 Optimizing parameters when RegionServer handlers are used up

| Parameter | Description | Optimization |
| --- | --- | --- |
| hbase.regionserver.handler.count | Number of RPC server instances started on RegionServer | Increase the value of this parameter. However, the value should be less than or equal to 1000. |
| hbase.ipc.server.max.default.callqueue.size.ratio | Maximum percentage of common requests in the RegionServer queue. When the total size of common requests in the queue exceeds the threshold, the requests are discarded. | Adjust the value to about 0.8 to limit the proportion of queues occupied by external requests and protect internal requests. |
- Check whether "XXX is too large for table XXX" or "Client scan caching XXX is too large for table XXX" exists in the service run logs on the application side. If yes, improper requests exist. Check the requests and reduce the data volume of each request: send fewer rows in each Put/Delete batch request and decrease the Caching value for Scan. If the requests on the service side cannot be optimized for the time being, you can add or modify the following parameters in the Client installation directory/HBase/hbase/conf/hbase-site.xml file on the application side. (This only reduces the number of recorded alarm logs. It does not relieve the overload.)
Table 2 Parameters for reducing recorded alarm logs

| Parameter | Description | Optimization |
| --- | --- | --- |
| hbase.rpc.rows.warning.threshold | Threshold of the number of data records written, updated, or deleted by the HBase client at a time. If the threshold is exceeded, a log is recorded. | Increase the value of this parameter. |
| hbase.client.scanner.warning.threshold.scanning.ratio | Threshold for the caching value of a single scan on the HBase client, expressed as a ratio of the maximum value (40% by default). If the threshold is exceeded, a log is recorded. | Change the value of this parameter to 1.0. |
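Reducing the data volume of each Put/Delete batch request usually means splitting one large batch into several smaller ones on the application side. The following is a minimal Python sketch of that idea and is not tied to any HBase client library. `send_batch` is a hypothetical callback that stands in for whatever batch call your client uses, and the chunk size of 1000 is an illustrative value, not a recommendation from this topic.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists that each hold at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []  # start a fresh list so yielded batches stay independent
    if batch:
        yield batch  # final, possibly shorter, batch


def send_in_chunks(
    mutations: Iterable[T],
    send_batch: Callable[[List[T]], None],  # hypothetical stand-in for the client batch call
    max_rows_per_request: int = 1000,       # illustrative value, tune for your workload
) -> int:
    """Send mutations in bounded batches and return the number of requests issued."""
    requests = 0
    for batch in chunked(mutations, max_rows_per_request):
        send_batch(batch)
        requests += 1
    return requests
```

Keeping each batch below the value of hbase.rpc.rows.warning.threshold should also stop the "too large" log from being recorded, without raising the threshold.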
- If the service side sends too many oversized requests, the server processes the requests slowly. As a result, requests pile up and the server is overloaded. If oversized requests can be treated as abnormal requests, adjust the parameters in the HBase configuration on FusionInsight Manager to reject them. The following table lists the RegionServer parameters to be configured.
Table 3 Parameters for rejecting requests

| Parameter | Description | Optimization |
| --- | --- | --- |
| hbase.ipc.max.request.size | Maximum size of a RegionServer request. If a request is bigger than the specified size, the request is discarded. The default value is 256 MB. | If the application retries multiple times and "RPC data length XXX of received from XXX is greater than max allowed" is displayed in RegionServer logs, reduce the amount of data sent at a time on the application side. If the amount cannot be reduced, you can increase the value of this parameter. It is recommended that the value be less than or equal to 1 GB. |
| hbase.server.keyvalue.maxsize | Maximum size of a single cell for RegionServer write/update operations. If the value of this parameter is exceeded, RegionServer write/update operations are not allowed. The default value is 10 MB. | If a single cell is too large, the read and write performance is degraded and abnormal data may exist. You can evaluate the data range based on the written data and set the upper limit. If the evaluation cannot be performed, you are advised to retain the default value. |
| hbase.rpc.rows.size.threshold.reject | Whether to reject a RegionServer request when the number of data operations in the request exceeds the specified limit. | If a request on a node contains a large number of write, update, and delete operations, the number of operations may exceed the value of hbase.rpc.rows.warning.threshold. In this case, overloading occurs and the performance deteriorates. If this parameter is set to true, large requests are rejected. If the pre-partitioning is improper, too many requests may be rejected. Set this parameter to true only when the service is running stably. |
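Before sending a request, the application side can check it against the size limits described in Table 3. The following Python sketch is a client-side heuristic only. It assumes that the request size can be approximated by the sum of the cell sizes (RPC overhead is ignored) and that a value strictly greater than a limit is rejected. The `Limits` and `find_violations` names are hypothetical, and the defaults are the values stated in this topic.

```python
from dataclasses import dataclass
from typing import List, Sequence

MB = 1024 * 1024


@dataclass(frozen=True)
class Limits:
    # Defaults as stated in this topic; adjust them to match your cluster configuration.
    max_cell_bytes: int = 10 * MB      # hbase.server.keyvalue.maxsize
    max_request_bytes: int = 256 * MB  # hbase.ipc.max.request.size


def find_violations(cell_sizes: Sequence[int], limits: Limits = Limits()) -> List[str]:
    """Return one message per limit that the planned request would exceed."""
    problems: List[str] = []
    for index, size in enumerate(cell_sizes):
        if size > limits.max_cell_bytes:
            problems.append(
                f"cell {index}: {size} bytes exceeds hbase.server.keyvalue.maxsize"
            )
    # Assumption: request size is approximated by the sum of cell sizes.
    total = sum(cell_sizes)
    if total > limits.max_request_bytes:
        problems.append(
            f"request: {total} bytes exceeds hbase.ipc.max.request.size"
        )
    return problems
```

An empty result does not guarantee that the RegionServer accepts the request. It only means that this approximation found no cell or request above the configured limits.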
RegionServer Restart in Clusters with a Large Number of Regions
In a large-scale cluster with a large number of regions (more than 100,000), HMaster may be overloaded when multiple RegionServers are restarted at the same time. The following table lists the parameters to be configured.
| Instance Name | Parameter | Description | Optimization |
| --- | --- | --- | --- |
| HMaster | hbase.regionserver.metahandler.count | Number of handlers used by HMaster to process high-priority requests | Increase the value of this parameter. However, the value should be less than or equal to 1000. |
| HMaster | hbase.ipc.server.metacallqueue.read.ratio | Ratio of read queues in a high-priority request queue, which affects the number of meta read/write handlers | Retain the default value 0.5. |
| RegionServer | hbase.regionserver.msginterval | Interval for transmitting messages between RegionServer and HMaster | Increasing the value of this parameter relieves the pressure on HMaster. The recommended value is 15s. |