Hudi Load, Truncate+Load, and Insert Overwrite Modes
Overview
DataArts Migration updates and incrementally loads Hudi data by performing write operations. Hudi supports three write modes: Load, Truncate+Load, and Insert Overwrite.
Hive Data Write Modes
- Load mode
This is a basic write mode of Hudi. In this mode, data is appended to a destination Hudi table, and existing data in the table is not cleared.
- Characteristics
- Destination table not to be cleared: The destination table is not cleared before data is written to it. New data is appended to the table.
- Incremental write: This mode is suitable for adding new data to a destination table that already contains data.
- Example
The source table is the MySQL table named data, the destination table is a Hudi table named data, and the partition field is dt. You can configure a job to write incremental data in the MySQL table to the Hudi table. Hudi appends the data to the corresponding partition based on the partition field dt.
Source MySQL table
CREATE TABLE `data` ( `id` varchar(10) DEFAULT NULL, `dt` date DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Destination Hudi table
CREATE TABLE `data` ( `id` varchar(10) DEFAULT NULL, `dt` date DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf8;
- Characteristics
- Truncate+Load mode
In this mode, data in the partitions of a destination Hudi table are cleared before new data is loaded.
- Characteristics
- Clearing data in the partition of a destination table: Data in the selected partitions of a destination table is cleared before new data is written.
- Partition retention: Although data is cleared, the partition structure is retained. New data is written to the corresponding partition based on the partition field.
- Example
The source table is the MySQL table named data, the destination table is a Hudi table named data, and the partition field is dt.
Configure a job to truncate and delete the data in the dt=2025-10-21 partition in the Hudi table, and then write the data in the MySQL table to the Hudi table. Hudi reorganizes the data to the corresponding partition based on the partition field dt.
Source MySQL table
CREATE TABLE `data` ( `id` varchar(10) DEFAULT NULL, `dt` date DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Destination Hive table
CREATE TABLE `data` ( `id` varchar(10) ) PARTITIONED BY (`dt` date) STORED AS PARQUET
- Characteristics
- Insert Overwrite mode
Data is first written to a temporary table, and then the data in the temporary table is written to the destination table using the INSERT OVERWRITE syntax of Hudi and overwrites existing data in the destination table.
- Implementation
This mode is implemented in Hudi through the following steps:
- Creating a temporary table: Create a temporary table to store data.
- Writing data to the temporary table: Write data to the temporary table.
- Overwriting existing data in the destination table: Use the INSERT OVERWRITE syntax to overwrite existing data in the destination table with the data in the temporary table.
- Characteristics
- Temporary table: The table temporarily stores data.
- Overwriting existing data in the destination table: The INSERT OVERWRITE syntax of Hudi is used to overwrite existing data in the destination table with the data in the temporary table.
- Writing data to dynamic and static partitions: The Insert Overwrite mode can write data to dynamic and static partitions.
- Writing data to dynamic partitions: Values of partition fields in the source data are automatically identified and used to determine the destination partitions. Data in multiple partitions can be overwritten at a time. This meets your requirements for updating data in multiple partitions at the same time.
- Writing data to static partitions: You need to specify the destination partition in the write operation. Data in only one partition can be overwritten at a time. This method is suitable for updating data in a specific partition.
- Example
The source table is the MySQL table named data, the destination table is a Hudi table named data, and the partition field is dt.
The structure of the source MySQL table is as follows:
CREATE TABLE `data` ( `id` varchar(10) DEFAULT NULL, `dt` date DEFAULT NULL) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The structure of the destination Hudi table is as follows:
CREATE TABLE `data` ( `id` varchar(10))PARTITIONED BY (`dt` date)STORED AS PARQUET LOCATION 'hdfs://path/to/hudi/table/data';
- Static partition
If you want to write all data in the MySQL table to the dt=2025-10-21 partition of the Hudi table, you can select the INSERT OVERWRITE mode and set the partition filter to dt=2025-10-21 in the write operation.
- Dynamic partition
If the MySQL table contains data from multiple partitions and you want the data to be automatically identified and to overwrite data in the destination partition based on the partition field value in the source data, you can select the Insert Overwrite mode. In this case, you do not need to specify the partition filter. You need to configure mapping of the dt partition field.
- Static partition
- Implementation
Summary
You can select an appropriate Hudi data write mode from Load, Truncate+Load, and Insert Overwrite to efficiently migrate and update data. You are advised to follow the recommendations in this practice to ensure data consistency and high system performance.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot