Updated on 2024-08-19 GMT+08:00

DIS Source Table

Function

Create a source stream to read data from DIS. DIS accesses user data and Flink job reads data from the DIS stream as input data for jobs. Flink jobs can quickly remove data from producers using DIS source sources for continuous processing. Flink jobs are applicable to scenarios where data outside the cloud service is imported to the cloud service for filtering, real-time analysis, monitoring reports, and dumping.

DIS addresses the challenge of transmitting data outside cloud services to cloud services. DIS builds data intake streams for custom applications capable of processing or analyzing streaming data. DIS continuously captures, transmits, and stores terabytes of data from hundreds of thousands of sources every hour, such as logs, Internet of Things (IoT) data, social media feeds, website clickstreams, and location-tracking events. For more information about DIS, see the .

Syntax

create table disSource (
  attr_name attr_type 
  (',' attr_name attr_type)* 
  (','PRIMARY KEY (attr_name, ...) NOT ENFORCED)
  (',' watermark for rowtime_column_name as watermark-strategy_expression)
)
with (
  'connector.type' = 'dis',
  'connector.region' = '',
  'connector.channel' = '',
  'format-type' = ''
);

Parameters

Table 1 Parameter description

Parameter

Mandatory

Description

connector.type

Yes

Data source type. Set this parameter to dis.

connector.region

Yes

Region where DIS for storing the data locates.

connector.ak

No

Access key ID. This parameter must be set in pair with sk.

connector.sk

No

Secret access key. This parameter must be set in pair with ak.

connector.channel

Yes

Name of the DIS stream where data is located.

connector.partition-count

No

Number of partitions where data will be read. Data in partition 0 to partition-count will be read.

Neither this parameter or partition-range can be configured.

If neither of the two parameters is set, all partition data will be read by default.

connector.partition-range

No

Range of partitions where data will be read. Neither this parameter or partition-count can be configured. If neither of the two parameters is set, all partition data will be read by default.

For example, if you set partition-range to [0:2], data in partitions 1, 2, and 3 will be read. The range must be within the DIS stream.

connector.offset

No

Start position from which data will be read. Either this parameter or start-time can be configured.

connector.start-time

No

Time from which DLI reads data

If this parameter is specified, DLI reads data read from the specified time. The format is yyyy-MM-dd HH:mm:ss.

If neither start-time nor offset is specified, the latest data is read.

connector. enable-checkpoint

No

Whether to enable the checkpoint function. The value can be true (enabled) or false (disabled). The default value is false.

Do not set this parameter when offset or start-time is set. If this parameter is set to true, checkpoint-app-name must be configured.

connector. checkpoint-app-name

No

ID of a DIS consumer. If a DIS stream is consumed by different jobs, you need to configure the consumer ID for each job to avoid checkpoint confusion.

Do not set this parameter when offset or start-time is set. If checkpoint-app-name is set to true, this parameter is mandatory.

connector. checkpoint-interval

No

Interval of checkpoint operations on the DIS source operator. The default value is 60s. Available value units: d, day/h, hour/min, minute/s, sec, second

Do not set this parameter when offset or start-time is configured.

format.type

Yes

Data coding format. The value can be csv or json.

format.field-delimiter

No

Attribute delimiter. You can customize the attribute delimiter only when the encoding format is CSV. The default delimiter is a comma (,).

Precautions

None

Example

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
create table disCsvSource (
  car_id STRING,
  car_owner STRING,
  car_age INT,
  average_speed INT,
  total_miles INT)
with (
  'connector.type' = 'dis',
  'connector.region' = '',
  'connector.channel' = 'disInput',
  'format.type' = 'csv'
);