Updated on 2022-09-14 GMT+08:00

Appendix

Batch and Caching Parameters for Scan

Batch: Specifies the maximum number of KeyValue records contained in the Result that is returned each time the next API is invoked using Scan. This parameter limits the number of columns read each time.

Caching: Specifies the maximum number of Results (the objects returned by next) that are returned for a single remote procedure call (RPC) request. This parameter limits the number of rows, or parts of rows, read by an RPC.

The following examples explain the functions of these two parameters in Scan.

Assume that a Region of table A contains two rows (two rowkeys). Each row has 1000 columns and each column has only one version, so each row holds 1000 KeyValue records.

  • Example 1: If Batch is not specified and Caching is 2,

    2000 KeyValue records will be returned for each RPC request.

  • Example 2: If Batch is set to 500 and Caching is 2,

    only 1000 KeyValue records will be returned for each RPC request.

  • Example 3: If Batch is set to 300 and Caching is 4,

    only 1000 KeyValue records will be returned for each RPC request.

Further explanation of Batch and Caching

  • Each unit of Caching is one read that produces one Result, so a Caching value of N allows up to N reads per RPC request.
  • The value of Batch determines whether a row can be read in a single read. If the value of Batch is smaller than the total number of columns in a row, that row needs at least two reads (each read resumes from where the previous one stopped).
  • A read cannot cross rows. That is, if a row ends before the value of Batch is reached, the read stops there and data of the next row is not read.

    These rules explain the results of the previous examples.

  • Example 1:

    Because Batch is not set, all columns of a row are read at once by default. With Caching set to 2, both rows are returned by one RPC request, so 2000 KeyValue records are returned for each RPC request.

  • Example 2:

    Because Batch is 500, each read (one unit of Caching) returns at most 500 columns. With Caching set to 2, an RPC request performs two reads, so 1000 KeyValue records are returned for each RPC request.

  • Example 3:

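    Because Batch is 300, reading the 1000 columns of one row takes four reads (300 + 300 + 300 + 100). With Caching set to 4, these four reads use up one RPC request, and because a read cannot cross rows, only 1000 KeyValue records are returned for each RPC request.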
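
The three examples can be checked with a small simulation. The sketch below is not HBase code: the class ScanBatchCachingSketch and the method keyValuesPerRpc are hypothetical names introduced only for illustration. It models just the rules described above (Batch limits the KeyValue records per Result, a Result never crosses rows, Caching limits the Results per RPC request) and ignores any other limit a real cluster may apply, such as a maximum result size.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanBatchCachingSketch {

    /**
     * Hypothetical helper, not part of the HBase API.
     * Splits every row into Results of at most `batch` KeyValue records
     * (a Result never crosses a row boundary), then groups up to `caching`
     * Results into each simulated RPC request.
     * A batch value <= 0 models "Batch is not specified": one Result per row.
     * Returns the number of KeyValue records carried by each simulated RPC.
     */
    static List<Integer> keyValuesPerRpc(int[] columnsPerRow, int batch, int caching) {
        // Step 1: cut each row into Results.
        List<Integer> resultSizes = new ArrayList<>();
        for (int columns : columnsPerRow) {
            int chunk = batch <= 0 ? columns : batch;
            for (int remaining = columns; remaining > 0; remaining -= chunk) {
                resultSizes.add(Math.min(chunk, remaining));
            }
        }
        // Step 2: pack at most `caching` Results into each RPC request.
        List<Integer> perRpc = new ArrayList<>();
        for (int i = 0; i < resultSizes.size(); i += caching) {
            int sum = 0;
            int end = Math.min(i + caching, resultSizes.size());
            for (int j = i; j < end; j++) {
                sum += resultSizes.get(j);
            }
            perRpc.add(sum);
        }
        return perRpc;
    }

    public static void main(String[] args) {
        int[] tableA = {1000, 1000}; // two rows, 1000 columns each
        System.out.println(keyValuesPerRpc(tableA, -1, 2));  // Example 1: [2000]
        System.out.println(keyValuesPerRpc(tableA, 500, 2)); // Example 2: [1000, 1000]
        System.out.println(keyValuesPerRpc(tableA, 300, 4)); // Example 3: [1000, 1000]
    }
}
```

Each list entry is one simulated RPC request. In Example 3 the upper bound would be 300 x 4 = 1200 KeyValue records, but the fourth read of each row returns only the remaining 100 columns, which is why each request carries 1000.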

Sample code:

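import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Assumption: tb is an already opened HBase table object created elsewhere.
Scan s = new Scan();
// Set the start and end keys for a data query.
s.setStartRow(Bytes.toBytes("01001686138100001"));
s.setStopRow(Bytes.toBytes("01001686138100002"));
// Return at most 1000 KeyValue records in each Result.
s.setBatch(1000);
// Return at most 100 Results for each RPC request.
s.setCaching(100);
ResultScanner scanner = null;
try {
    scanner = tb.getScanner(s);
    for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
        for (KeyValue kv : rr.raw()) {
            // Display the query results.
            System.out.println("key:" + Bytes.toString(kv.getRow())
                + " qualifier:" + Bytes.toString(kv.getQualifier())
                + " value:" + Bytes.toString(kv.getValue()));
        }
    }
} catch (IOException e) {
    System.out.println("error!" + e.toString());
} finally {
    // scanner is still null if getScanner threw an exception.
    if (scanner != null) {
        scanner.close();
    }
}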