Parameter Specifications for Creating a Hudi Table with SparkSQL
Rules
- When creating a table, you must specify the primaryKey and preCombineField.
Hudi tables support data updates and idempotent writes, both of which rely on a primary key to identify duplicate records and apply updates. If no primary key is specified, the table loses its update capability. If preCombineField is not specified, duplicate primary keys can occur.
| Parameter | Description | Value | Remarks |
| --- | --- | --- | --- |
| primaryKey | Primary key of the Hudi table | As needed | (Mandatory) It can be a composite primary key but must be globally unique. |
| preCombineField | Pre-merge key. Multiple data records with the same primary key are merged based on this field. | As needed | (Mandatory) Data with the same primary key is merged based on this field. Multiple fields cannot be specified. |
- Do not set hoodie.datasource.hive_sync.enable to false when creating a table.
Setting it to false prevents new partitions from being synchronized to the Hive Metastore. Because the query engine relies on that partition information, data in the unsynchronized partitions will be missing from query results.
- Do not set the Hudi index type to INMEMORY.
This index is for testing purposes only. Using this index in a production environment will lead to data duplication.
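The rules above can be sketched as a single statement. This is a minimal illustration, not a recommended configuration: the table name rules_demo and its columns are hypothetical, BLOOM is chosen only as an example of a non-INMEMORY index type, and passing hoodie.index.type through the options clause is an assumption that should be checked against the Hudi version in use.

```sql
-- Sketch only: rules_demo and its columns are hypothetical names.
create table rules_demo (id int, comb int, col0 int)
using hudi
options (
  type = 'cow',
  primaryKey = 'id',              -- mandatory: identifies duplicate records
  preCombineField = 'comb',       -- mandatory: exactly one field
  'hoodie.index.type' = 'BLOOM'   -- assumed example value; never use INMEMORY in production
  -- hoodie.datasource.hive_sync.enable is deliberately not set to false here
);
```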
Example of Creating a Table
create table data_partition (id int, comb int, col0 int, yy int, mm int, dd int)
using hudi                            -- Specify a Hudi data source.
partitioned by (yy, mm, dd)           -- Specify one or multiple partitions.
location '/opt/log/data_partition'    -- Specify the path. If the path is not specified, the table is created in the Hive warehouse.
options (
  type = 'mor',                       -- Table type: mor or cow.
  primaryKey = 'id',                  -- Primary key, which can be a composite one but must be globally unique.
  preCombineField = 'comb'            -- Pre-merge field. Data with the same primary key is merged based on this field. Currently, multiple fields cannot be specified.
)
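Because primaryKey may be a composite key, a variant of the example can help. The following is a hedged sketch under stated assumptions: the table orders_hudi and its columns are hypothetical, and the composite key is written as a comma-separated list of column names. The pre-merge field remains a single column.

```sql
-- Sketch only: orders_hudi and its columns are hypothetical names.
create table orders_hudi (order_id int, region string, ts bigint, amount double)
using hudi
options (
  type = 'mor',
  primaryKey = 'order_id,region',   -- composite key: the combination must be globally unique
  preCombineField = 'ts'            -- single field used to merge records with the same key
);
```

With this definition, two records that share both order_id and region count as the same record and are merged based on ts.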