Updated: 2024-04-29 GMT+08:00

Reading Data Using Scan

Function Description

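To read data from a table, first obtain the Table instance for that table. Then create a Scan object and set its parameters according to the query conditions. To improve query efficiency, specify StartRow and StopRow so that the scan covers only the row range you need. The query returns multiple rows through a ResultScanner object. Each row is returned as a Result object, and each Result contains one or more Cells.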
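StartRow and StopRow help because HBase stores rows sorted by row key, compared as unsigned bytes. A bounded scan therefore only has to read one contiguous slice of the table. StartRow is inclusive and StopRow is exclusive.

The sketch below is a hypothetical in-memory model, not the HBase client API. The class `RowRangeScanDemo` and its methods are illustrative names only, and a `TreeMap` stands in for the sorted table. In the real HBase 2.x client, the range would be set on the Scan with `scan.withStartRow(...)` and `scan.withStopRow(...)`. Older 1.x clients use `setStartRow`/`setStopRow` instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a table sorted by row key (NOT the HBase API).
class RowRangeScanDemo {
    // Compare row keys as unsigned bytes, the same ordering HBase uses for rows.
    private static final Comparator<byte[]> ROW_ORDER = Arrays::compareUnsigned;

    private final NavigableMap<byte[], String> rows = new TreeMap<>(ROW_ORDER);

    void put(String rowKey, String value) {
        rows.put(rowKey.getBytes(StandardCharsets.UTF_8), value);
    }

    // Returns the row keys in [startRow, stopRow): start inclusive, stop exclusive.
    List<String> scan(String startRow, String stopRow) {
        NavigableMap<byte[], String> range = rows.subMap(
                startRow.getBytes(StandardCharsets.UTF_8), true,
                stopRow.getBytes(StandardCharsets.UTF_8), false);
        List<String> keys = new ArrayList<>();
        for (byte[] key : range.keySet()) {
            keys.add(new String(key, StandardCharsets.UTF_8));
        }
        return keys;
    }

    public static void main(String[] args) {
        RowRangeScanDemo table = new RowRangeScanDemo();
        for (String key : new String[] {"user001", "user002", "user010", "user100", "admin01"}) {
            table.put(key, "v-" + key);
        }
        // Only the slice between the two keys is visited; the stop row is excluded.
        System.out.println(table.scan("user001", "user010"));
    }
}
```

Running `main` prints `[user001, user002]`. `admin01` and `user100` fall outside the range, and the stop row `user010` is excluded.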

Sample Code

public void testScanData() {
  LOG.info("Entering testScanData.");
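  // conn (Connection), tableName (TableName) and LOG are assumed to be fields of the enclosing sample class.
  Table table = null;
  // Declare the ResultScanner here so that it can be closed in the finally block.
  ResultScanner rScanner = null;
  try {
    // Get the Table instance for the target table from the shared Connection.
    table = conn.getTable(tableName);
    // Instantiate a Scan object.
    Scan scan = new Scan();
    // Read only the info:name column.
    scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
    // Scanner caching: fetch up to 1000 rows from the RegionServer per RPC.
    scan.setCaching(1000);
    // Submit the scan request and open a scanner.
    rScanner = table.getScanner(scan);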
    // Print query results.
    for (Result r = rScanner.next(); r != null; r = rScanner.next()) {
      for (Cell cell : r.rawCells()) {
        LOG.info(Bytes.toString(CellUtil.cloneRow(cell)) + ":"
            + Bytes.toString(CellUtil.cloneFamily(cell)) + ","
            + Bytes.toString(CellUtil.cloneQualifier(cell)) + ","
            + Bytes.toString(CellUtil.cloneValue(cell)));
      }
    }
    LOG.info("Scan data successfully.");
  } catch (IOException e) {
    LOG.error("Scan data failed.", e);
  } finally {
    if (rScanner != null) {
      // Close the scanner to release its server-side resources.
      rScanner.close();
    }
    if (table != null) {
      try {
        // Close the Table object.
        table.close();
      } catch (IOException e) {
        LOG.error("Close table failed.", e);
      }
    }
  }
  LOG.info("Exiting testScanData.");
}

Precautions

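  1. Specify StartRow and StopRow for a Scan whenever possible. A scan bounded to an exact row range reads less data and performs better than an unbounded scan.
  2. You can tune two key parameters, Batch and Caching.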
    • Batch

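      The maximum number of cells (column values) returned in each Result when next() is called on the scanner. This parameter relates to the number of columns read at a time. A row with more cells than the batch size is returned as several Results.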

    • Caching

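      The maximum number of results that next() can serve from a single RPC request, that is, the number of rows fetched per RPC. Larger values reduce the number of RPCs but increase memory usage on the client and the RegionServer.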
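Batch and Caching combine roughly as follows. Batch decides how many Results a wide row is split into, and Caching decides how many Results each RPC carries.

The sketch below is a simplified, hypothetical cost model, not HBase code. `ScanCostModel` and its methods are illustrative names. It assumes that every row has the same number of cells and that Caching counts Result objects. It also ignores the extra RPCs for opening and closing the scanner and any max-result-size limit.

```java
// Simplified, hypothetical model of how Scan Batch and Caching shape a scan.
// Assumptions: uniform cells per row; Caching counts Result objects;
// scanner open/close RPCs and result-size limits are ignored.
class ScanCostModel {
    // Results one row is split into: ceil(cells / batch).
    // batch <= 0 models "batch not set": the whole row is one Result.
    static int resultsPerRow(int cellsPerRow, int batch) {
        if (batch <= 0 || batch >= cellsPerRow) {
            return 1;
        }
        return (cellsPerRow + batch - 1) / batch;
    }

    static int totalResults(int rows, int cellsPerRow, int batch) {
        return rows * resultsPerRow(cellsPerRow, batch);
    }

    // Data-carrying RPCs when each RPC returns up to `caching` Results: ceil(results / caching).
    static int dataRpcs(int rows, int cellsPerRow, int batch, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        int results = totalResults(rows, cellsPerRow, batch);
        return (results + caching - 1) / caching;
    }

    public static void main(String[] args) {
        // 10 rows x 20 cells with batch 5: each row is split into 4 Results.
        System.out.println(totalResults(10, 20, 5));
        // The same scan with caching 10, then with caching 1.
        System.out.println(dataRpcs(10, 20, 5, 10));
        System.out.println(dataRpcs(10, 20, 5, 1));
    }
}
```

Under this model, `main` prints 40, 4 and 40. Raising Caching cuts the RPC count tenfold, at the cost of holding more Results in memory per round trip.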