Improving Real-time Data Read Performance
Scenario
HBase data needs to be read.
Prerequisites
The HBase get or scan API has been invoked, and data is read from HBase in real time.
Procedure
- Data reading server tuning
Parameter portal:
Go to the All Configurations page of the HBase service. For details, see Modifying Cluster Service Configuration Parameters.
If read and write operations are performed at the same time, they affect each other's performance. If data writes trigger frequent flush and compaction operations, a large amount of disk I/O is consumed and read performance suffers. If heavy writes cause compaction operations to be blocked, many HFiles accumulate in a region, which also degrades read performance. Therefore, if read performance is unsatisfactory, check whether the write-related configurations are proper; a sketch for inspecting the relevant settings follows.
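As a minimal sketch of inspecting the write-path settings mentioned above, the following Java snippet prints the flush and compaction parameters loaded from the hbase-site.xml on the client classpath. The property names are standard HBase settings; checking them from a client program is illustrative only, and the authoritative values are those configured for the RegionServers on the All Configurations page.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CheckWriteConfig {
    public static void main(String[] args) {
        // Loads hbase-default.xml and the hbase-site.xml found on the classpath.
        Configuration conf = HBaseConfiguration.create();

        // MemStore flush size: larger values mean fewer, larger flushes.
        System.out.println("hbase.hregion.memstore.flush.size = "
                + conf.get("hbase.hregion.memstore.flush.size"));

        // Number of HFiles in a store that triggers a minor compaction.
        System.out.println("hbase.hstore.compactionThreshold = "
                + conf.get("hbase.hstore.compactionThreshold"));

        // Number of HFiles in a store that blocks further writes until compaction catches up.
        System.out.println("hbase.hstore.blockingStoreFiles = "
                + conf.get("hbase.hstore.blockingStoreFiles"));
    }
}
```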
- Data reading client tuning
When scanning data, set caching, which is the number of records fetched from the server in one RPC (the default value is 1). If the default value is used, read performance is extremely low because every row requires a separate round trip to the server.
If you do not need all columns of a row, specify the columns to be read to reduce network I/O.
If you only need the row key, add a filter (FirstKeyOnlyFilter or KeyOnlyFilter) that returns only the row key.
These client-side settings are combined in the sketch after this list.
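A minimal sketch of the scan settings above, assuming an existing table named user_table with a column family cf and a qualifier name (all hypothetical names); caching, column projection, and a row-key-only filter are shown in one Scan.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class TunedScanExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("user_table"))) {

            Scan scan = new Scan();
            // Fetch 1000 rows per RPC instead of the default 1.
            scan.setCaching(1000);
            // Read only the columns you need to reduce network I/O.
            scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"));
            // If only the row key is needed, drop the column projection above
            // and return just the first key-value of each row instead:
            // scan.setFilter(new FirstKeyOnlyFilter());

            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}
```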
- Data table reading design optimization
Table 2 Parameters affecting real-time data reading

| Parameter | Description | Default Value |
| --- | --- | --- |
| COMPRESSION | The compression algorithm compresses blocks in HFiles. For compressible data, configure a compression algorithm to efficiently reduce disk I/O and improve performance. NOTE: Some data cannot be compressed efficiently. For example, an already compressed image can hardly be compressed further. SNAPPY is the common choice because it has a high encoding/decoding speed and an acceptable compression ratio. | NONE |
| BLOCKSIZE | The block size affects HBase read and write performance. You can configure the size of blocks in an HFile. Larger blocks achieve a higher compression ratio but perform worse on random reads, because HBase reads data in units of blocks. Set the parameter to 128 KB or 256 KB to improve data write efficiency without greatly affecting random read performance. The unit is byte. | 65536 |
| DATA_BLOCK_ENCODING | Encoding method for blocks in an HFile. If a row contains multiple columns, set this parameter to FAST_DIFF to save storage space and improve performance. | NONE |
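As a hedged sketch of applying the table-design attributes above, the following Java snippet creates a table whose column family uses SNAPPY compression, a 128 KB block size, and FAST_DIFF data block encoding. The table and column family names are hypothetical; the same attributes can also be set when creating or altering a table in the HBase shell.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateReadOptimizedTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {

            ColumnFamilyDescriptor family = ColumnFamilyDescriptorBuilder
                    .newBuilder(Bytes.toBytes("cf"))
                    // COMPRESSION: SNAPPY offers fast encoding/decoding and an acceptable ratio.
                    .setCompressionType(Compression.Algorithm.SNAPPY)
                    // BLOCKSIZE: 128 KB (131072 bytes) instead of the default 65536.
                    .setBlocksize(128 * 1024)
                    // DATA_BLOCK_ENCODING: FAST_DIFF helps when a row has many columns.
                    .setDataBlockEncoding(DataBlockEncoding.FAST_DIFF)
                    .build();

            TableDescriptor table = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("user_table"))
                    .setColumnFamily(family)
                    .build();

            admin.createTable(table);
        }
    }
}
```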