Updated on 2024-08-30 GMT+08:00

Development Rules

The following table describes the parameter specifications to follow when reading Hudi tables in Flink streaming jobs.

Table 1 Parameter specifications

| Parameter | Mandatory | Description | Example |
|---|---|---|---|
| connector | Yes | Type of the table to be read. | hudi |
| path | Yes | Path where the table is stored. | Set this parameter based on site requirements. |
| table.type | Yes | Hudi table type: COPY_ON_WRITE or MERGE_ON_READ. The default value is COPY_ON_WRITE. | MERGE_ON_READ |
| hoodie.datasource.write.recordkey.field | Yes | Primary key (record key) field of the table. | Set this parameter as needed. |
| write.precombine.field | Yes | Precombine field. When multiple records have the same primary key, the record with the largest value in this field is kept. | Set this parameter as needed. |
| read.tasks | No | Read parallelism of the Hudi table. The default value is 4. | 4 |
| read.streaming.enabled | Yes | true: data is read incrementally as a stream. false: data is read in batches. | Set this parameter based on site requirements. For streaming reads, set it to true. |
| read.streaming.start-commit | No | Start commit (inclusive), in yyyyMMddHHmmss format. By default, reading starts from the latest commit. | - |
| hoodie.datasource.write.keygenerator.type | No | Key generator type used by the upstream table. | COMPLEX |
| read.streaming.check-interval | No | Interval, in seconds, at which the source checks for new commits. The default value is 60 (1 minute). | 5 (For heavy traffic, the default value is recommended.) |
| read.end-commit | No | End commit (inclusive). For incremental streaming consumption, set only read.streaming.start-commit to specify the start position. For batch incremental consumption, set read.streaming.start-commit as the start position and read.end-commit as the end position; both positions are included. If read.end-commit is not set, the latest commit is used as the end position. | - |
| changelog.enabled | No | Whether to write changelog messages. The default value is false. Set this parameter to true for CDC scenarios. | false |
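The parameters in Table 1 are set in the WITH clause of a Flink SQL CREATE TABLE statement. The following Python sketch assembles such a statement for a streaming read. The helper functions, table name, column list, storage path, and the `id`/`ts` record key and precombine columns are hypothetical placeholders, and the check interval value assumes the open-source Hudi unit of seconds. Adjust all of them to your environment.

```python
"""Sketch: build a Flink SQL DDL statement that reads a Hudi table as a stream.

Hypothetical (not from the source): the helper names, table name, columns,
storage path, and the record key / precombine column names.
"""


def sql_string(value):
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_hudi_source_ddl(table_name, columns, options):
    """Return a CREATE TABLE statement whose WITH clause holds the given options.

    columns: list of (column_name, sql_type) pairs.
    options: dict mapping option keys to values.
    """
    column_sql = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    option_sql = ",\n".join(
        f"  {sql_string(key)} = {sql_string(value)}" for key, value in options.items()
    )
    return f"CREATE TABLE {table_name} (\n{column_sql}\n) WITH (\n{option_sql}\n)"


# Options taken from Table 1; values marked "hypothetical" are placeholders.
STREAMING_READ_OPTIONS = {
    "connector": "hudi",
    "path": "hdfs:///tmp/hudi/demo_table",  # hypothetical storage path
    "table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.recordkey.field": "id",  # hypothetical key column
    "write.precombine.field": "ts",  # hypothetical precombine column
    "read.tasks": "4",
    "read.streaming.enabled": "true",
    "read.streaming.check-interval": "60",  # assumed to be in seconds
}

if __name__ == "__main__":
    ddl = build_hudi_source_ddl(
        "hudi_source",  # hypothetical table name
        [("id", "INT"), ("name", "STRING"), ("ts", "TIMESTAMP(3)")],
        STREAMING_READ_OPTIONS,
    )
    print(ddl)
```

The printed statement can be run in the Flink SQL client, or passed to `TableEnvironment.execute_sql` if you use PyFlink. With read.streaming.enabled set to true, a `SELECT * FROM hudi_source` query then keeps consuming new commits.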
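read.streaming.start-commit and read.end-commit take commit times in yyyyMMddHHmmss format, and whether read.end-commit is set decides between streaming and batch incremental consumption. The sketch below derives those options from Python datetime values. The function names are hypothetical, and the sketch assumes the datetimes are already expressed in the same time zone as the table's commit timeline.

```python
from datetime import datetime
from typing import Dict, Optional

# Commit time format required by Table 1 (yyyyMMddHHmmss).
COMMIT_TIME_FORMAT = "%Y%m%d%H%M%S"


def to_commit_time(moment: datetime) -> str:
    """Format a datetime as a yyyyMMddHHmmss commit time string."""
    return moment.strftime(COMMIT_TIME_FORMAT)


def parse_commit_time(commit: str) -> datetime:
    """Parse a yyyyMMddHHmmss string, raising ValueError if it is malformed."""
    # strptime alone accepts some shorter inputs, so enforce exactly 14 digits.
    if len(commit) != 14 or not commit.isdigit():
        raise ValueError(f"not a yyyyMMddHHmmss commit time: {commit!r}")
    return datetime.strptime(commit, COMMIT_TIME_FORMAT)


def incremental_read_options(start: datetime, end: Optional[datetime] = None) -> Dict[str, str]:
    """Return read options for incremental consumption starting at `start`.

    end is None  -> streaming consumption from `start` onward.
    end is given -> batch incremental consumption of [start, end], both inclusive.
    """
    if end is not None and end < start:
        raise ValueError("end must not be earlier than start")
    options = {"read.streaming.start-commit": to_commit_time(start)}
    if end is None:
        options["read.streaming.enabled"] = "true"
    else:
        options["read.streaming.enabled"] = "false"
        options["read.end-commit"] = to_commit_time(end)
    return options


if __name__ == "__main__":
    print(incremental_read_options(datetime(2024, 8, 30, 9, 0, 0)))
    print(incremental_read_options(datetime(2024, 8, 30, 9, 0, 0), datetime(2024, 8, 30, 10, 0, 0)))
```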
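Before submitting a job, it can help to check a set of options against the Mandatory column of Table 1. The following is a minimal, hypothetical client-side check, not part of Flink or Hudi. It assumes option keys are written in lowercase as Flink SQL expects (`connector`, `path`) and that a streaming read requires read.streaming.enabled to be true.

```python
# Hypothetical helper: option keys marked "Yes" in the Mandatory column of Table 1.
MANDATORY_OPTIONS = (
    "connector",
    "path",
    "table.type",
    "hoodie.datasource.write.recordkey.field",
    "write.precombine.field",
    "read.streaming.enabled",
)


def missing_mandatory_options(options):
    """Return the mandatory keys that are absent or blank, in table order."""
    return [key for key in MANDATORY_OPTIONS if not str(options.get(key, "")).strip()]


def check_streaming_read(options):
    """Raise ValueError if a mandatory option is missing or streaming is disabled."""
    missing = missing_mandatory_options(options)
    if missing:
        raise ValueError("missing mandatory options: " + ", ".join(missing))
    if str(options["read.streaming.enabled"]).strip().lower() != "true":
        raise ValueError("read.streaming.enabled must be true for streaming reads")


if __name__ == "__main__":
    print(missing_mandatory_options({"connector": "hudi", "path": "hdfs:///tmp/hudi/demo_table"}))
```

Reporting every missing key at once, in table order, makes it easier to fix a configuration in one pass than stopping at the first gap.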