Updated on 2024-12-10 GMT+08:00

Development Rules

Parameter Specifications

The following table describes the parameter specifications that you must comply with when writing to Hudi tables in Flink streaming jobs.

Table 1 Parameter specifications

| Parameter | Mandatory | Description | Recommended Value |
|---|---|---|---|
| connector | Yes | Connector type of the table | hudi |
| path | Yes | Path for storing the table | Set this parameter based on service requirements. |
| hoodie.datasource.write.recordkey.field | Yes | Primary key of the table | Set this parameter based on service requirements. |
| write.precombine.field | Yes | Pre-combine field, which is used to merge records that have the same primary key | Set this parameter based on service requirements. |
| write.tasks | No | Parallelism of Hudi table write tasks. The default value is 4. | 4 |
| index.bootstrap.enabled | No | Flink uses an in-memory index, which caches the primary keys of the data in memory to keep records in the target table unique. Set this parameter to true. Otherwise, duplicate data may be generated. The default value is false. Do not set this parameter when bucket indexes are used. | true |
| write.index_bootstrap.tasks | No | This parameter takes effect only when index.bootstrap.enabled is set to true. Increase the number of tasks to speed up startup. | 4 |
| index.state.ttl | No | Duration, in days, for storing index data. The default value is 0, indicating that the index data is permanently valid. You can change the value based on service requirements. | 0 |
| compaction.delta_commits | No | Number of delta commits that triggers the generation of a compaction plan for a MOR table | 200 |
| compaction.async.enabled | Yes | Whether to enable online compaction. Set this parameter to false to hand compaction over to SparkSQL, which improves write performance. | false |
| hive_sync.enable | No | Whether to synchronize table information to Hive | true |
| hive_sync.metastore.uris | No | Hive Metastore URI | Set this parameter based on service requirements. |
| hive_sync.jdbc_url | No | Hive JDBC URL | Set this parameter based on service requirements. |
| hive_sync.table | No | Hive table name | Set this parameter based on service requirements. |
| hive_sync.db | No | Name of the Hive database. The default value is default. | Set this parameter based on service requirements. |
| hive_sync.support_timestamp | No | Whether to support the timestamp type during Hive synchronization | true |
| changelog.enabled | No | Whether to write changelog messages. The default value is false. Set this parameter to true for CDC. | false |
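
The parameters in Table 1 are passed in the WITH clause of a Flink SQL CREATE TABLE statement, where option keys are written in lowercase. The following is a minimal sketch, not an official tool, that renders such a statement and rejects an option set that lacks any parameter marked as mandatory in Table 1. The function name render_hudi_ddl, the table name hudi_sink, the columns, and the HDFS path are hypothetical placeholders chosen for illustration.

```python
# Parameters marked "Yes" in the Mandatory column of Table 1.
MANDATORY_KEYS = (
    "connector",
    "path",
    "hoodie.datasource.write.recordkey.field",
    "write.precombine.field",
    "compaction.async.enabled",
)


def render_hudi_ddl(table_name, columns, options):
    """Render a Flink SQL CREATE TABLE statement for a Hudi sink table.

    columns is a list of (name, sql_type) pairs; options maps each
    parameter name to its value. Values are inserted as-is, so they
    must not contain single quotes.
    """
    missing = [key for key in MANDATORY_KEYS if key not in options]
    if missing:
        raise ValueError("missing mandatory options: " + ", ".join(missing))
    column_lines = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    option_lines = ",\n".join(
        f"  '{key}' = '{value}'" for key, value in options.items()
    )
    return (
        f"CREATE TABLE {table_name} (\n{column_lines}\n) "
        f"WITH (\n{option_lines}\n);"
    )


# Hypothetical schema and path; replace them based on service requirements.
columns = [("uuid", "STRING"), ("name", "STRING"), ("ts", "TIMESTAMP(3)")]
options = {
    "connector": "hudi",
    "path": "hdfs://hacluster/tmp/hudi/hudi_sink",
    "hoodie.datasource.write.recordkey.field": "uuid",
    "write.precombine.field": "ts",
    "write.tasks": "4",
    "index.bootstrap.enabled": "true",
    "compaction.delta_commits": "200",
    "compaction.async.enabled": "false",
}

ddl = render_hudi_ddl("hudi_sink", columns, options)
print(ddl)
```

The printed statement can be pasted into a Flink SQL job after the placeholders are replaced. The Hive synchronization parameters are omitted here and can be added to the options dictionary in the same way.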

Table Name Must Meet Hive Format Requirements

  • Must start with a letter or underscore (_) and cannot start with a digit.
  • Can contain only letters, digits, and underscores (_).
  • Can contain a maximum of 128 characters.
  • Cannot contain spaces or special characters, such as colons (:), semicolons (;), and slashes (/).
  • Is case-insensitive. Lowercase letters are recommended.
  • Cannot be a Hive reserved keyword, such as select, from, or where.

[Example]

my_table, customer_info, sales_data
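
The naming rules above can be checked before a table is created. The following is a minimal sketch under two assumptions: "letters" is taken to mean ASCII letters, and the keyword set contains only the three keywords named in the rules, not the full Hive reserved keyword list. The names is_valid_table_name, NAME_PATTERN, and RESERVED_KEYWORDS are hypothetical.

```python
import re

# Only the keywords named in the rules above; the full Hive list is longer.
RESERVED_KEYWORDS = {"select", "from", "where"}

# One leading letter or underscore, then up to 127 letters, digits, or
# underscores, for a maximum of 128 characters in total.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,127}")


def is_valid_table_name(name):
    """Return True if name satisfies the table naming rules listed above."""
    if NAME_PATTERN.fullmatch(name) is None:
        return False
    # Names are case-insensitive, so compare keywords in lowercase.
    return name.lower() not in RESERVED_KEYWORDS


for candidate in ("my_table", "customer_info", "sales_data", "1table", "select"):
    print(candidate, is_valid_table_name(candidate))
```

The pattern alone covers the rules on the first character, the allowed characters, the length limit, and the ban on spaces and special characters. The keyword check is kept separate so that the set can be extended without touching the pattern.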