Updated on 2022-08-16 GMT+08:00

Appendix

Parameters Batch and Caching for Scan

Batch: specifies the maximum number of key-value records (columns) returned by each call to the scanner's next interface. It limits how many columns of a row are read at a time.

Caching: specifies the maximum number of next results returned by a single RPC request. It determines how many rows (or partial rows, when Batch is set) are fetched by each RPC.

The following examples explain the functions of these two parameters in Scan:

Suppose a Region holds two rows (rowkeys) of table A. Each row has 1000 columns and each column has only one version, so each row contains 1000 key-value records.

-     ColuA1  ColuA2  ColuA3  ColuA4  ColuN1  ColuN2  ColuN3  ColuN4
Row1  -       -       -       -       -       -       -       -
Row2  -       -       -       -       -       -       -       -

  • Example 1: If Batch is not specified and Caching is 2,

    2000 (Key, Value) records will be returned for each RPC request.

  • Example 2: If Batch is set to 500 and Caching is 2,

    1000 (Key, Value) records will be returned for each RPC request.

  • Example 3: If Batch is set to 300 and Caching is 4,

    1000 (Key, Value) records will be returned for each RPC request.

Further explanation of Batch and Caching

  • Each unit of Caching is one opportunity to request data, that is, one result returned by a next call.
  • The value of Batch determines whether a whole row can be read within one Caching unit. If Batch is smaller than the number of columns in a row, reading that row takes at least two Caching units (each unit resumes where the previous one stopped).
  • A Caching unit cannot cross rows. If the end of a row is reached before Batch is filled, the data of the next row is not read in that unit.

These rules explain the results of the previous examples.

  • Example 1:

    Because Batch is not set, all 1000 columns of a row are read at once by default. Because Caching is 2, each RPC request returns two rows, that is, 2000 (Key, Value) records.

  • Example 2:

    Because Batch is 500, at most 500 columns are read in each Caching unit. With Caching set to 2, each RPC request returns 2 × 500 = 1000 (Key, Value) records.

  • Example 3:

    Because Batch is 300, reading the 1000 columns of one row takes four Caching units (300 + 300 + 300 + 100). Because a Caching unit cannot cross rows and Caching is 4, each RPC request returns only 1000 (Key, Value) records.
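
The arithmetic in the three examples can be checked with a small simulation. The following is only a sketch of the rules described above, not HBase code: the class and method names (ScanBatchCachingSketch, resultSizes, kvPerRpc) are hypothetical, a Batch value of 0 is used here to stand for "Batch not set", and any size-based limits that a real server may also apply are ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class ScanBatchCachingSketch {

    /**
     * Sizes (in key-value records) of the results a scan would produce.
     * batch <= 0 models "Batch not set": each row is returned whole.
     * A result never crosses a row boundary.
     */
    public static List<Integer> resultSizes(int rows, int columnsPerRow, int batch) {
        List<Integer> sizes = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            int remaining = columnsPerRow;
            while (remaining > 0) {
                int take = (batch <= 0) ? remaining : Math.min(batch, remaining);
                sizes.add(take);
                remaining -= take;
            }
        }
        return sizes;
    }

    /** Key-value records carried by each RPC when an RPC returns up to `caching` results. */
    public static List<Integer> kvPerRpc(int rows, int columnsPerRow, int batch, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        List<Integer> sizes = resultSizes(rows, columnsPerRow, batch);
        List<Integer> perRpc = new ArrayList<>();
        for (int i = 0; i < sizes.size(); i += caching) {
            int sum = 0;
            for (int j = i; j < Math.min(i + caching, sizes.size()); j++) {
                sum += sizes.get(j);
            }
            perRpc.add(sum);
        }
        return perRpc;
    }

    public static void main(String[] args) {
        // Two rows of 1000 columns each, as in the table above.
        System.out.println(kvPerRpc(2, 1000, 0, 2));   // Example 1: Batch not set, Caching 2
        System.out.println(kvPerRpc(2, 1000, 500, 2)); // Example 2: Batch 500, Caching 2
        System.out.println(kvPerRpc(2, 1000, 300, 4)); // Example 3: Batch 300, Caching 4
    }
}
```

Each printed list holds the number of (Key, Value) records per RPC request. In Example 3, resultSizes yields 300, 300, 300, 100 for each row, so four results fill one RPC with exactly one row.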

Code example:

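import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// tb is assumed to be an already opened HBase table instance.
Scan s = new Scan();
// Set the start and end keys for data query.
s.setStartRow(Bytes.toBytes("01001686138100001"));
s.setStopRow(Bytes.toBytes("01001686138100002"));
// Return at most 1000 columns per result and at most 100 results per RPC.
s.setBatch(1000);
s.setCaching(100);
ResultScanner scanner = null;
try {
    scanner = tb.getScanner(s);
    for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
        for (KeyValue kv : rr.raw()) {
            // Display the query results.
            System.out.println("key:" + Bytes.toString(kv.getRow())
                    + " getQualifier:" + Bytes.toString(kv.getQualifier())
                    + " value:" + Bytes.toString(kv.getValue()));
        }
    }
} catch (IOException e) {
    System.out.println("error!" + e.toString());
} finally {
    // getScanner may have failed, in which case scanner is still null.
    if (scanner != null) {
        scanner.close();
    }
}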