Application Scenarios of HBase BulkLoad and Put

Both BulkLoad and Put can be used to load data into HBase. BulkLoad loads data faster than Put, but it has drawbacks. The following describes when to use each of the two data loading methods.

BulkLoad starts MapReduce tasks to generate HFiles and then registers the HFiles with HBase. Used incorrectly, BulkLoad consumes additional cluster memory and CPU resources because of the MapReduce tasks it starts. If it generates a large number of small HFiles, compaction may be triggered frequently, which sharply reduces query speed.
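The small-HFile risk described above can be illustrated with some rough arithmetic. The following is a minimal sketch, not an HBase API: the class and method names are hypothetical, and it assumes that the job writes one HFile per Region per column family, that rows are spread evenly across Regions, and that the HDFS block size is 128 MiB. Compression and encoding are ignored.

```java
// Hypothetical helper for illustration only; not part of the HBase API.
class HFileSizeEstimate {
    /**
     * Rough average HFile size in bytes, assuming one HFile per Region per
     * column family and an even spread of rows (both are assumptions).
     */
    static long averageHFileBytes(long totalBytes, int regions, int columnFamilies) {
        if (regions <= 0 || columnFamilies <= 0) {
            throw new IllegalArgumentException("regions and columnFamilies must be positive");
        }
        return totalBytes / ((long) regions * columnFamilies);
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;    // assumed HDFS block size
        long totalBytes = 1024L * 1024 * 1024;  // 1 GiB of input, illustrative value
        long avg = averageHFileBytes(totalBytes, 200, 1);
        System.out.println("Average HFile size (bytes): " + avg);
        System.out.println("Smaller than half a block: " + (avg < blockSize / 2));
    }
}
```

Under these assumptions, 1 GiB spread over 200 Regions yields HFiles of roughly 5 MiB each, far below the block size, which is the situation that tends to trigger frequent compaction.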

Used incorrectly, Put can make data loading slow. If the memory allocated to the RegionServer is insufficient, the RegionServer process may exit because of memory overflow.

The application scenarios of the BulkLoad and Put methods are as follows:

BulkLoad:
- A large amount of data needs to be loaded to HBase at one time.
- Reliability requirements for the load are not high, and WAL files do not need to be generated.
- Loading the same large amount of data with Put would make data loading and queries slow.
- The size of each HFile generated by the load is close to the HDFS block size.

Put:
- The amount of data loaded to one Region at a time is less than half the HDFS block size.
- Data needs to be loaded to HBase in real time.
- The query speed must not decrease dramatically during data loading.
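The scenarios above can be condensed into a simple decision rule. The sketch below is one possible reading of them, not an official algorithm: the class, enum, and parameter names are hypothetical, and loads between half a block and a full block per Region, which the list does not cover, are treated here as BulkLoad candidates. The query-speed criterion is left out because it depends on measurements from the running cluster.

```java
// Hypothetical helper that encodes the scenarios listed above; not an HBase API.
class LoadMethodChooser {
    enum Method { BULKLOAD, PUT }

    static Method choose(long bytesPerRegion, long hdfsBlockBytes,
                         boolean realTime, boolean walRequired) {
        // Real-time loads and loads that need WAL files go through Put.
        if (realTime || walRequired) {
            return Method.PUT;
        }
        // Less than half an HDFS block per Region: Put avoids creating small HFiles.
        if (bytesPerRegion < hdfsBlockBytes / 2) {
            return Method.PUT;
        }
        // Assumption: every remaining case is treated as a BulkLoad candidate.
        return Method.BULKLOAD;
    }

    public static void main(String[] args) {
        long block = 128L * 1024 * 1024; // assumed HDFS block size
        System.out.println(choose(10L * 1024 * 1024, block, false, false));  // prints PUT
        System.out.println(choose(120L * 1024 * 1024, block, false, false)); // prints BULKLOAD
    }
}
```

The thresholds are only a starting point. Adjust them to the actual HDFS block size and Region layout of the cluster.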
