Updated on 2024-04-02 GMT+08:00

Reading Data Using Scan

Function

Before reading data from a table, obtain a Table instance for the table, create a Scan object, and set parameters for the Scan object based on the search criteria. To improve query efficiency, specify StartRow and StopRow so that the scan covers only the required row key range. Query results are returned through a ResultScanner object, in which each row of data is a Result object that holds multiple Cells.
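The effect of StartRow and StopRow can be illustrated without a cluster. The following is a minimal sketch, not HBase client code: it assumes that row keys are sorted lexicographically and that the range is half-open (StartRow inclusive, StopRow exclusive), which is the default scan behavior. A TreeMap stands in for the table, and the class name RowRangeSketch, the method scanRange, and the row keys are hypothetical. ASCII strings are used as row keys so that String ordering matches byte ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class RowRangeSketch {
    /** Returns the row keys that a scan bounded by [startRow, stopRow) would visit. */
    public static List<String> scanRange(TreeMap<String, String> table, String startRow, String stopRow) {
        // subMap with (true, false) is half-open: startRow is included, stopRow is excluded.
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        // Hypothetical table content; the TreeMap keeps keys in lexicographic order.
        TreeMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {"row-3", "row-10", "row-1", "row-2"}) {
            table.put(key, "value");
        }
        // "row-10" sorts before "row-2" because comparison is character by character.
        System.out.println(scanRange(table, "row-1", "row-2")); // prints [row-1, row-10]
    }
}
```

Only keys inside the range are touched, which is why a bounded scan is cheaper than a full-table scan. In the HBase client the bounds are set on the Scan object itself; the exact setter name depends on the client version.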

Example Code

The following code snippet belongs to the testScanData method in the HBaseSample class of the com.huawei.bigdata.hbase.examples package.

  public void testScanData() {
    LOG.info("Entering testScanData.");
    Table table = null;
    // Declare the ResultScanner here so that it can be closed in the finally block.
    ResultScanner rScanner = null;
    try {
      // Obtain the Table instance from the existing connection.
      table = conn.getTable(tableName);
      // Instantiate a Scan object and restrict it to the info:name column.
      Scan scan = new Scan();
      scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
      // Set Caching, that is, the number of rows fetched per RPC.
      scan.setCaching(1000);

      // Submit a scan request.
      rScanner = table.getScanner(scan);
      // Print query results.
      for (Result r = rScanner.next(); r != null; r = rScanner.next()) {
        for (Cell cell : r.rawCells()) {
          LOG.info("{}:{},{},{}", Bytes.toString(CellUtil.cloneRow(cell)),
              Bytes.toString(CellUtil.cloneFamily(cell)), Bytes.toString(CellUtil.cloneQualifier(cell)),
              Bytes.toString(CellUtil.cloneValue(cell)));
        }
      }
      LOG.info("Scan data successfully.");
    } catch (IOException e) {
      LOG.error("Scan data failed ", e);
    } finally {
      if (rScanner != null) {
        // Close the scanner object.
        rScanner.close();
      }
      if (table != null) {
        try {
          // Close the Table object.
          table.close();
        } catch (IOException e) {
          LOG.error("Close table failed ", e);
        }
      }
    }
    LOG.info("Exiting testScanData.");
  }

Precautions

  1. Specify StartRow and StopRow to limit the scan to a defined row key range. A scan without these bounds reads the entire table, which degrades performance.
  2. You can set Batch and Caching to control how results are returned.
    • Batch

      Indicates the maximum number of Cells (columns) returned in each Result when the next interface of the scanner is invoked. This parameter limits the number of columns read each time; a row with more columns than the Batch value is returned as several Results.

    • Caching

      Indicates the maximum number of records returned to the client for one remote procedure call (RPC) request, that is, the number of rows read by an RPC. Subsequent next calls are served from these cached records until they are used up.
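The interaction between Batch and Caching can be estimated with simple arithmetic. The following is a rough sketch under stated assumptions, not HBase client code: every row is assumed to have the same number of columns, each RPC is assumed to carry up to Caching Results, and the extra calls a real client makes to open and close the scanner, as well as any result size limits, are ignored. The class name ScanTuningSketch and its methods are hypothetical.

```java
public class ScanTuningSketch {
    /** Number of Results returned by next(): a row wider than batch is split into several Results. */
    public static long resultCount(long rows, int columnsPerRow, int batch) {
        if (batch <= 0) {
            return rows; // No Batch limit: one Result per row.
        }
        long resultsPerRow = (columnsPerRow + batch - 1) / batch; // Ceiling division.
        return rows * resultsPerRow;
    }

    /** Approximate number of data-fetching RPCs when each RPC carries up to caching Results. */
    public static long approxRpcCount(long results, int caching) {
        return (results + caching - 1) / caching; // Ceiling division.
    }

    public static void main(String[] args) {
        long rows = 10_000;     // Hypothetical table size.
        int columnsPerRow = 20; // Hypothetical row width.

        long unbatched = resultCount(rows, columnsPerRow, -1);
        long batched = resultCount(rows, columnsPerRow, 5);

        System.out.println("Results without Batch: " + unbatched);
        System.out.println("Results with Batch=5: " + batched);
        System.out.println("RPCs, Caching=1: " + approxRpcCount(unbatched, 1));
        System.out.println("RPCs, Caching=1000: " + approxRpcCount(unbatched, 1000));
        System.out.println("RPCs, Batch=5 and Caching=1000: " + approxRpcCount(batched, 1000));
    }
}
```

Under these assumptions, a larger Caching value reduces the number of round trips at the cost of more client memory per request, while a smaller Batch value keeps each Result small for wide rows but increases the number of Results to transfer.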