Updated on 2025-04-15 GMT+08:00

Parameter Specifications for Creating a Hudi Table with SparkSQL

Rules

  • When creating a table, you must specify the primaryKey and preCombineField.

    Hudi tables support data updates and idempotent writes. Both rely on a primary key to identify duplicate records and apply updates. If no primary key is specified, the table loses its update capability. If preCombineField is not specified, duplicate primary keys can occur.

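    | Parameter       | Description                                                                                   | Value     | Remarks                                                                                                      |
    |-----------------|-----------------------------------------------------------------------------------------------|-----------|--------------------------------------------------------------------------------------------------------------|
    | primaryKey      | Primary key of the Hudi table                                                                 | As needed | (Mandatory) It can be a composite primary key but must be globally unique.                                   |
    | preCombineField | Pre-merge key. Multiple data records with the same primary key are merged based on this field. | As needed | (Mandatory) Data with the same primary key is merged based on this field. Multiple fields cannot be specified. |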

  • Do not set hoodie.datasource.hive_sync.enable to false when creating a table.

    Setting it to false prevents new partitions from being synchronized to the Hive Metastore. Query engines that rely on the Metastore then cannot see those partitions, so data in them is missing from query results.

  • Do not set the Hudi index type to INMEMORY.

    This index is for testing purposes only. Using this index in a production environment will lead to data duplication.
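
The rules above can be sketched in a single statement. This is an illustration only: the table name orders_hudi, its columns, and the choice of BLOOM as the index type are assumptions, not values taken from this specification. Choose the index type that suits your workload, as long as it is not INMEMORY.

```sql
-- Hypothetical table; all names and values here are assumptions for illustration.
create table orders_hudi (
  order_id  int,
  region    string,
  amount    double,
  update_ts bigint,
  dt        string
)
using hudi
partitioned by (dt)
options (
  type = 'cow',
  primaryKey = 'order_id,region',   -- composite primary key, comma-separated; must be globally unique
  preCombineField = 'update_ts',    -- exactly one field
  'hoodie.index.type' = 'BLOOM'     -- example of a non-INMEMORY index type (assumed choice)
  -- hoodie.datasource.hive_sync.enable is deliberately not set to false
);
```

Leaving hoodie.datasource.hive_sync.enable out of the statement keeps whatever default the cluster provides, which is the simplest way to comply with the rule against disabling it.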

Example of Creating a Table

create table data_partition(id int, comb int, col0 int, yy int, mm int, dd int)
using hudi                               -- Specify a Hudi data source.
partitioned by(yy,mm,dd)                 -- Specify one or more partition columns.
location '/opt/log/data_partition'       -- Specify the path. If the path is not specified, the table is created in the Hive warehouse.
options(
type='mor',                              -- Table type: mor or cow.
primaryKey='id',                         -- Primary key. It can be a composite key but must be globally unique.
preCombineField='comb'                   -- Pre-merge field. Data with the same primary key is merged based on this field. Currently, multiple fields cannot be specified.
)
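
To see how primaryKey and preCombineField interact, the following sketch writes two records with the same id into data_partition and then reads them back. It assumes the writes are performed as upserts. Whether insert into behaves as an upsert depends on the Hudi version and the configured write operation, so treat the outcome described below as an expectation to verify, not a guarantee.

```sql
-- Two records with the same primary key (id = 1) but different comb values.
-- Column order follows the table definition: id, comb, col0, yy, mm, dd.
insert into data_partition values (1, 10, 100, 2025, 4, 15);
insert into data_partition values (1, 20, 200, 2025, 4, 15);

-- Under upsert semantics, one row is expected for id = 1:
-- the one with the larger comb value.
select id, comb, col0 from data_partition where id = 1;
```

If the query returns two rows, the writes were probably performed as plain inserts rather than upserts. In that case, check the write operation configured for the session.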