Reading Data Using Scan
Function Description
Before reading data from a table, obtain a Table instance for that table, then create a Scan object and set its parameters according to the search criteria. To improve query efficiency, specify StartRow and StopRow so that the scan covers only the required row-key range. Query results are returned through a ResultScanner object, where each row of data is stored as a Result object that contains multiple Cells.
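The effect of StartRow and StopRow can be illustrated with a small stand-alone model. The sketch below does not use the HBase client. It assumes a sorted in-memory map stands in for the table, and the class ScanRangeModel and method scanRange are hypothetical names introduced only for this illustration. It relies on two properties of an HBase scan: rows are returned in sorted row-key order, and by default StartRow is inclusive while StopRow is exclusive. HBase compares row keys as raw bytes; for plain ASCII keys such as the ones below, Java String ordering gives the same result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: a TreeMap stands in for an HBase table sorted by row key.
class ScanRangeModel {

    // Returns the row keys in [startRow, stopRow): start inclusive, stop exclusive.
    static List<String> scanRange(NavigableMap<String, String> table,
                                  String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("012005000201", "A");
        table.put("012005000202", "B");
        table.put("012005000205", "C");
        table.put("012005000210", "D");

        // Prints [012005000202, 012005000205]: the stop row itself is not returned.
        System.out.println(scanRange(table, "012005000202", "012005000210"));
    }
}
```

In this model, a row with key 012005000210 is excluded because it equals the stop row. To include it in a real scan, the stop row would need to be a key that sorts just after it.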
Sample Code
The following code snippet is in the testScanData method in the HBaseSample class of the com.huawei.bigdata.hbase.examples package.
public void testScanData() {
    LOG.info("Entering testScanData.");

    Table table = null;
    // Declare the ResultScanner so that it can be closed in the finally block.
    ResultScanner rScanner = null;
    try {
        // Get the Table instance from the connection.
        table = conn.getTable(tableName);

        // Instantiate a Scan object.
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));

        // Set the StartRow.
        scan.setStartRow(Bytes.toBytes("012005000202")); // Note [1]

        // Set the StopRow.
        scan.setStopRow(Bytes.toBytes("012005000210")); // Note [1]

        // Set the Caching size.
        scan.setCaching(1000); // Note [2]

        // Set the Batch size.
        scan.setBatch(100); // Note [2]

        // Submit a scan request.
        rScanner = table.getScanner(scan);

        // Print query results.
        for (Result r = rScanner.next(); r != null; r = rScanner.next()) {
            for (Cell cell : r.rawCells()) {
                LOG.info(Bytes.toString(CellUtil.cloneRow(cell)) + ":"
                    + Bytes.toString(CellUtil.cloneFamily(cell)) + ","
                    + Bytes.toString(CellUtil.cloneQualifier(cell)) + ","
                    + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
        LOG.info("Scan data successfully.");
    } catch (IOException e) {
        LOG.error("Scan data failed ", e);
    } finally {
        if (rScanner != null) {
            // Close the scanner object.
            rScanner.close();
        }
        if (table != null) {
            try {
                // Close the Table object.
                table.close();
            } catch (IOException e) {
                LOG.error("Close table failed ", e);
            }
        }
    }
    LOG.info("Exiting testScanData.");
}
Precautions
- You are advised to specify StartRow and StopRow so that the Scan covers only a defined row-key range, which ensures good performance.
- You can set Batch and Caching.
  - Batch
    Batch indicates the maximum number of cells returned in each Result when the next API of the scanner is invoked. This parameter controls the number of columns read each time.
  - Caching
    Caching indicates the maximum number of rows returned for a single remote procedure call (RPC) request. This parameter controls the number of rows read by an RPC.
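The difference between Batch and Caching can be sketched with simple arithmetic. The following stand-alone model does not call HBase. The class ScanTuningModel and its methods are hypothetical names used only for illustration, and the model deliberately ignores any other limits that the client or server may apply (for example, size-based limits on a response) as well as the calls needed to open and close a scanner. It assumes that Batch caps the number of cells in each Result and that Caching caps the number of rows fetched per RPC when rows are not split.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: illustrates how Batch and Caching limit what one call returns.
class ScanTuningModel {

    // Splits the cells of one row into chunks of at most `batch` cells,
    // mirroring how a wide row is returned as several Results. Requires batch > 0.
    static List<List<String>> splitRowIntoResults(List<String> cells, int batch) {
        List<List<String>> results = new ArrayList<>();
        for (int from = 0; from < cells.size(); from += batch) {
            int to = Math.min(from + batch, cells.size());
            results.add(new ArrayList<>(cells.subList(from, to)));
        }
        return results;
    }

    // Rough number of fetch round trips for `totalRows` rows when each RPC
    // returns at most `caching` rows (ceiling division). Requires caching > 0.
    static int rpcRoundTrips(int totalRows, int caching) {
        return (totalRows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        List<String> cells = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            cells.add("col" + i);
        }
        // A 250-column row with Batch = 100 yields Results of 100, 100, and 50 cells.
        for (List<String> result : splitRowIntoResults(cells, 100)) {
            System.out.println(result.size());
        }
        // 2500 rows with Caching = 1000 need about 3 fetch round trips.
        System.out.println(rpcRoundTrips(2500, 1000));
    }
}
```

Under these assumptions, a larger Caching value reduces the number of round trips at the cost of more memory per call, while Batch keeps very wide rows from being returned in a single large Result.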