Development Rules
Parameter Specifications
The following table describes the parameter specifications you need to comply with to write Hudi tables on Flink streams.
| Parameter | Mandatory | Description | Recommended Value |
|---|---|---|---|
| connector | Yes | Connector type of the table. | hudi |
| path | Yes | Path for storing the table. | Set this parameter based on service requirements. |
| hoodie.datasource.write.recordkey.field | Yes | Primary key of the table. | Set this parameter based on service requirements. |
| write.precombine.field | Yes | Precombine field. When several records share the same primary key, this field determines which record is kept. | Set this parameter based on service requirements. |
| write.tasks | No | Write parallelism of the Hudi table. The default value is 4. | 4 |
| index.bootstrap.enabled | No | Flink uses an in-memory index that caches the primary keys of existing data to keep records in the target table unique. Set this parameter to true when that index is used; otherwise, duplicate data may be written. The default value is false. Do not set this parameter when a bucket index is used. | true |
| write.index_bootstrap.tasks | No | Number of index bootstrap tasks. This parameter takes effect only when index.bootstrap.enabled is set to true. Increase the number of tasks to accelerate startup. | 4 |
| index.state.ttl | No | Retention period of index data. The default value is 0, indicating that the index data never expires. Change the value based on service requirements. | 0 |
| compaction.delta_commits | No | Number of delta commits that triggers generation of a compaction plan for a MOR table. | 200 |
| compaction.async.enabled | Yes | Whether to enable online compaction in the Flink job. Set this parameter to false to hand compaction over to Spark SQL, which improves write performance. | false |
| hive_sync.enable | No | Whether to synchronize table information to Hive. | true |
| hive_sync.metastore.uris | No | Hive Metastore URI. | Set this parameter based on service requirements. |
| hive_sync.jdbc_url | No | Hive JDBC URL. | Set this parameter based on service requirements. |
| hive_sync.table | No | Hive table name. | Set this parameter based on service requirements. |
| hive_sync.db | No | Name of the Hive database. The default value is default. | Set this parameter based on service requirements. |
| hive_sync.support_timestamp | No | Whether to support the timestamp type during Hive synchronization. | true |
| changelog.enabled | No | Whether to write changelog messages. The default value is false. Set this parameter to true for CDC. | false |
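The parameters above are normally passed in the WITH clause of a Flink SQL CREATE TABLE statement. The following Python snippet is a minimal sketch of how such a statement could be assembled from the mandatory parameters and a few optional ones. The helper render_hudi_ddl, the column list, and the HDFS path are hypothetical placeholders for illustration, not part of Hudi or Flink.

```python
# Hypothetical helper (not a Hudi or Flink API): renders a Flink SQL
# CREATE TABLE statement whose WITH clause carries the given options.
def render_hudi_ddl(table_name, columns, options):
    """Return the DDL text for a Hudi sink table."""
    column_sql = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    option_sql = ",\n".join(
        # Double any single quote so the value stays a valid SQL string literal.
        "  '{}' = '{}'".format(key, str(value).replace("'", "''"))
        for key, value in options.items()
    )
    return f"CREATE TABLE {table_name} (\n{column_sql}\n) WITH (\n{option_sql}\n);"


# Placeholder schema and path (assumptions); replace them with your own.
columns = [("id", "INT"), ("name", "STRING"), ("ts", "TIMESTAMP(3)")]
options = {
    "connector": "hudi",                              # mandatory
    "path": "hdfs:///tmp/hudi/my_table",              # mandatory, placeholder path
    "hoodie.datasource.write.recordkey.field": "id",  # mandatory, primary key
    "write.precombine.field": "ts",                   # mandatory
    "compaction.async.enabled": "false",              # mandatory, recommended value
    "write.tasks": "4",                               # optional
    "index.bootstrap.enabled": "true",                # optional, omit for bucket indexes
}

ddl = render_hudi_ddl("my_table", columns, options)
print(ddl)
```

The printed statement can be submitted through the Flink SQL client or a table environment. Keeping the options in a dictionary makes it easy to review them against the table above before the job is started.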
Table Name Must Meet Hive Format Requirements
- Must start with a letter or underscore (_) and cannot start with a digit.
- Can contain only letters, digits, and underscores (_).
- Can contain a maximum of 128 characters.
- Cannot contain spaces or special characters, such as colons (:), semicolons (;), and slashes (/).
- Is case insensitive. Lowercase letters are recommended.
- Cannot be a Hive reserved keyword, such as select, from, or where.
[Example]
my_table, customer_info, sales_data
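The naming rules above can be checked before a table is created. The following Python snippet is a minimal sketch that assumes "letters" means ASCII letters and uses only the three keywords named in the rules; the function is_valid_table_name and the RESERVED_KEYWORDS set are hypothetical, and the full reserved-keyword list depends on the Hive version in use.

```python
import re

# One leading letter or underscore, then up to 127 letters, digits, or
# underscores, which gives a maximum length of 128 characters.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,127}")

# Illustrative subset only (assumption); extend it with the reserved
# keywords of your Hive version.
RESERVED_KEYWORDS = {"select", "from", "where"}


def is_valid_table_name(name):
    """Return True if name satisfies the naming rules listed above."""
    if not NAME_PATTERN.fullmatch(name):
        return False
    # Names are case insensitive, so compare keywords in lowercase.
    return name.lower() not in RESERVED_KEYWORDS


for candidate in ["my_table", "customer_info", "1table", "sales/data", "select"]:
    print(candidate, is_valid_table_name(candidate))
```

The spaces and special characters rule needs no separate check, because the pattern accepts only letters, digits, and underscores.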