HBase Source Stream

Overview

Create a source stream that reads data from HBase in CloudTable and uses it as the input of a job. HBase is a column-oriented, distributed cloud storage system that offers high reliability, high performance, and elastic scalability. It is suited to storing massive amounts of data and to distributed computing, and can be used to build storage systems that hold terabytes or even petabytes of data. Data in HBase can be filtered and analyzed with millisecond-level response times. CS can read data from HBase for filtering, analysis, and data dumping.

CloudTable is a distributed, scalable, fully hosted key-value data storage service based on Apache HBase. It provides CS with high-performance random read and write capabilities, which are useful when applications need to store and query massive amounts of structured data, semi-structured data, and time series data. CloudTable is suited to IoT scenarios and to the storage and querying of massive volumes of key-value data. For more information about CloudTable, see the CloudTable Service User Guide.

Syntax

CREATE SOURCE STREAM stream_id (attr_name attr_type (',' attr_name attr_type)* )
  WITH (
    type = "cloudtable",
    region = "",
    cluster_id = "",
    table_name = "",
    table_columns = ""
  )
  (TIMESTAMP BY timeindicator (',' timeindicator)?);

timeindicator:
  PROCTIME '.' PROCTIME
  | ID '.' ROWTIME

Description

Table 1 Syntax description

| Parameter | Mandatory | Description |
| --- | --- | --- |
| type | Yes | Data source type. The value cloudtable indicates that the data source is CloudTable. |
| region | Yes | Region to which CloudTable belongs. |
| cluster_id | Yes | ID of the cluster that contains the table to be read. For details about how to view the ID of the CloudTable cluster, see section "Viewing Basic Cluster Information" in the CloudTable Service User Guide. |
| table_name | Yes | Name of the table from which data is read. If a namespace needs to be specified, use the format namespace_name:table_name. |
| table_columns | Yes | Columns to be read, in the format rowKey,f1:c1,f1:c2,f2:c1. The number of columns must equal the number of attributes specified in the source stream. |
| timeindicator | No | Timestamp added to the source stream. The value can be processing time or event time. See the note below. |

NOTE (timeindicator):
  • To use processing time, set timeindicator to proctime.proctime.

    In this case, an attribute named proctime is added to the stream's original attributes. For example, if the stream originally has three attributes, four attributes are exported once processing time is set.

  • To use event time, select an attribute in the stream as the timestamp, in the format attr_name.rowtime, where attr_name is the name of that attribute.
  • Processing time and event time can be set at the same time.
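As a sketch of the processing-time case, the statement below takes the car_infos stream from the Example section and appends TIMESTAMP BY proctime.proctime, following the syntax given earlier. The stream declares five attributes, so by the rule above it would expose six once proctime is added. The `--` comment is an annotation for the reader and is assumed to be accepted by the SQL editor; remove it if it is not.

```sql
CREATE SOURCE STREAM car_infos (
  car_id STRING,
  car_owner STRING,
  car_age INT,
  average_speed INT,
  total_miles INT
)
WITH (
  type = "cloudtable",
  region = "cn-north-1",
  cluster_id = "209ab1b6-de25-4c48-8e1e-29e09d02de28",
  table_name = "carinfo",
  table_columns = "rowKey,info:owner,info:age,car:speed,car:miles"
)
-- Processing time: adds a sixth attribute named proctime to the stream.
TIMESTAMP BY proctime.proctime;
```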

Precautions

The attribute used as the timestamp must be of type long or timestamp.
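The event-time case can be sketched as follows. The attribute car_timestamp and the column car:timestamp are hypothetical and are added only for illustration; they are not part of the example table in this section. The type name LONG is assumed to be how the long type is written in the stream definition.

```sql
CREATE SOURCE STREAM car_infos (
  car_id STRING,
  car_owner STRING,
  car_age INT,
  average_speed INT,
  total_miles INT,
  car_timestamp LONG  -- hypothetical attribute; must be long or timestamp to serve as event time
)
WITH (
  type = "cloudtable",
  region = "cn-north-1",
  cluster_id = "209ab1b6-de25-4c48-8e1e-29e09d02de28",
  table_name = "carinfo",
  -- car:timestamp is a hypothetical column; six columns match the six attributes above
  table_columns = "rowKey,info:owner,info:age,car:speed,car:miles,car:timestamp"
)
TIMESTAMP BY car_timestamp.rowtime;
```

According to the syntax, both indicators can also be listed together, separated by a comma, for example TIMESTAMP BY proctime.proctime, car_timestamp.rowtime.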

Example

Read the carinfo table from HBase of CloudTable into the car_infos source stream.

CREATE SOURCE STREAM car_infos (
  car_id STRING,
  car_owner STRING,
  car_age INT,
  average_speed INT,
  total_miles INT
)
WITH (
  type = "cloudtable",
  region = "cn-north-1",
  cluster_id = "209ab1b6-de25-4c48-8e1e-29e09d02de28",
  table_name = "carinfo",
  table_columns = "rowKey,info:owner,info:age,car:speed,car:miles"
);
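In this example, the five entries in table_columns line up with the five stream attributes, presumably by position: rowKey with car_id, info:owner with car_owner, info:age with car_age, car:speed with average_speed, and car:miles with total_miles. If the table lived in a namespace, the same statement could be written with the namespace_name:table_name form described in Table 1. The namespace name traffic below is hypothetical.

```sql
CREATE SOURCE STREAM car_infos (
  car_id STRING,
  car_owner STRING,
  car_age INT,
  average_speed INT,
  total_miles INT
)
WITH (
  type = "cloudtable",
  region = "cn-north-1",
  cluster_id = "209ab1b6-de25-4c48-8e1e-29e09d02de28",
  -- "traffic" is a hypothetical namespace used only to show the namespace_name:table_name form
  table_name = "traffic:carinfo",
  table_columns = "rowKey,info:owner,info:age,car:speed,car:miles"
);
```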