HBase Result Table

Function

DLI writes job output data to HBase. HBase is a column-oriented distributed cloud storage system that features high reliability, excellent performance, and elastic scalability. It is suitable for storing massive amounts of data and for distributed computing. You can use HBase to build a storage system that holds terabytes or even petabytes of data. HBase lets you filter and analyze data easily, with millisecond-level responses, so you can quickly extract value from your data. It can store structured and semi-structured key-value data, such as messages, reports, recommendation data, risk control data, logs, and orders. With DLI, you can write large volumes of data to HBase at high speed and with low latency.

Prerequisites

An enhanced datasource connection has been created so that DLI can connect to HBase. With this connection, jobs can run on a dedicated DLI queue, and you can configure security group rules as required.

If MRS HBase is used, the IP addresses of all hosts in the MRS cluster have been added to the host information of the enhanced datasource connection.
For details, see Modifying the Host Information in the Data Lake Insight User Guide.
You have set up an enhanced datasource connection. For details, see Enhanced Datasource Connections in the Data Lake Insight User Guide.
For details about how to configure security group rules, see Security Group Overview in the Virtual Private Cloud User Guide.

Syntax

create table hbaseSink (
  attr_name attr_type 
  (',' attr_name attr_type)* 
)
with (
  'connector.type' = 'hbase',
  'connector.version' = '1.4.3',
  'connector.table-name' = '',
  'connector.zookeeper.quorum' = ''
);

Parameters

Table 1 Parameter description
Parameter	Mandatory	Description
connector.type	Yes	Connector type. Set this parameter to hbase.
connector.version	Yes	The value must be 1.4.3.
connector.table-name	Yes	Name of the HBase table
connector.zookeeper.quorum	Yes	ZooKeeper quorum address, for example, xxxx:2181
connector.zookeeper.znode.parent	No	Root directory (znode) of HBase in ZooKeeper. The default value is /hbase.
connector.write.buffer-flush.max-size	No	Maximum size of the data buffered for each write to HBase. The unit is MB. The default value is 2 MB.
connector.write.buffer-flush.max-rows	No	Maximum number of records buffered for each write to HBase
connector.write.buffer-flush.interval	No	Interval at which buffered data is flushed to HBase. The default value is 0s. Example value: 2s.
connector.rowkey	No	Definition of a compound rowkey, which is composed from the specified fields according to this configuration. Format: comma-separated field:length pairs, for example, rowkey1:3,rowkey2:3, ... Each number is the number of leading bytes taken from the field; for example, 3 means the first three bytes. The number must be at least 1 and cannot exceed the byte length of the field.
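The following table definition is a sketch of how the optional parameters above might be combined. The table name sink_with_options, the column family cf, its column col1, and the parameter values are illustrative assumptions, and xxxx:2181 is a placeholder for your ZooKeeper address.

```sql
create table hbaseSinkWithOptions (
  -- Hypothetical rowkey field
  rowkey string,
  -- Hypothetical column family cf with a single column qualifier col1
  cf Row<col1 string>
) with (
  'connector.type' = 'hbase',
  'connector.version' = '1.4.3',
  'connector.table-name' = 'sink_with_options',
  'connector.zookeeper.quorum' = 'xxxx:2181',
  -- Explicitly set to the default root directory
  'connector.zookeeper.znode.parent' = '/hbase',
  -- Assumed values: buffer up to 1000 records per write, flush interval of 2s
  'connector.write.buffer-flush.max-rows' = '1000',
  'connector.write.buffer-flush.interval' = '2s'
);
```

connector.write.buffer-flush.max-size is not set in this sketch, so its default of 2 MB applies.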

Example

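create table hbaseSink(
  -- Atomic fields; connector.rowkey below builds the compound rowkey from them
  rowkey string,
  name string,
  -- Each Row field maps to an HBase column family (i, j),
  -- and its nested fields map to column qualifiers
  i Row<gender string, age int>,
  j Row<address string>
) with (
  'connector.type' = 'hbase',
  'connector.version' = '1.4.3',
  'connector.table-name' = 'sink',
  -- Compound rowkey: first 1 byte of rowkey and first 3 bytes of name
  'connector.rowkey' = 'rowkey:1,name:3',
  -- Buffer at most 5 records for each write to HBase
  'connector.write.buffer-flush.max-rows' = '5',
  'connector.zookeeper.quorum' = 'xxxx:2181'
);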
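Data is written to the result table with an INSERT INTO statement. The following sketch assumes a hypothetical source table named source_table with the columns user_id (string), user_name (string), gender (string), age (int), and address (string); the ROW constructors pack source columns into the i and j column families.

```sql
-- source_table and all of its columns are hypothetical
insert into hbaseSink
select
  user_id,
  user_name,
  -- Fills column family i: one string field and one int field, by position
  ROW(gender, age),
  -- Fills column family j: one string field
  ROW(address)
from source_table;
```

Because no column list is given after hbaseSink, the SELECT expressions are matched by position, so their order must follow the column order of the result table.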
