Updated on 2026-05-20 GMT+08:00

Modes for Writing Data to LakeFormation

Overview

DataArts Migration provides two modes for writing upstream data to LakeFormation: Load and Load Overwrite. Both modes create a temporary path in the OBS path of a table, write upstream data to the temporary path in the file format of the table, and then rename the file and move it to the formal directory of the table. Load moves the file directly to add data, while Load Overwrite clears existing data in partitions before moving the file to update data. The modes are similar to the Load and Load Overwrite syntax of Hive.

Write Modes

  • Load mode

    This is the basic write mode for LakeFormation. In this mode, a temporary path is created in the OBS path of a table, and data is written to the data file in the temporary OBS path. When the task is complete, the temporary file is moved to the formal OBS path of the table. Data is written to the destination table in incremental mode, and the destination table is not cleared.

    • Characteristics
      • Destination table not to be cleared: The destination table is not cleared before data is written to it. New data is added to the table.
      • Incremental write: This mode is suitable for adding new data to a destination table that already contains data.
    • Example

      The source table is a MySQL table named data, the destination table is a LakeFormation table named data, and the partition field is dt.

      You can configure a job to write incremental data in the MySQL table to the LakeFormation table. LakeFormation appends the data to the corresponding partition based on the partition field dt.

      Figure 1 Configuring source and destination tables

      The structure of the source MySQL table is as follows:

      CREATE TABLE `data` (
        `id` varchar(10) DEFAULT NULL,
        `dt` date DEFAULT NULL
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  • Load Overwrite mode

    In this mode, a temporary path is created in the OBS path of a table, and data is written to the data file in the temporary OBS path. When the task is complete, data in the corresponding partition is cleared, and the temporary file is moved to the formal OBS path of the table.

    • Implementation

      This mode is implemented in LakeFormation through the following steps:

      1. Creating a temporary directory: Create a temporary directory to store data.
      2. Clearing data in the corresponding partition: Clear the existing data in the partition when the job is complete.
      3. Adding the data file: Rename the data file in the temporary directory and move it to the corresponding partition path.
    • Writing data to dynamic or static partitions
      The Load Overwrite mode can write data to dynamic or static partitions.
      • Writing data to dynamic partitions: Values of partition fields in the source data are automatically identified and used to determine the destination partitions. Data in multiple partitions can be overwritten at a time. This meets your requirements for updating data in multiple partitions at the same time.
      • Writing data to static partitions: You need to specify the destination partition in the write operation. Data in only one partition can be overwritten at a time. This method is suitable for updating data in a specific partition.
    • Example

      The source table is a MySQL table named data, the destination table is a LakeFormation table named data, and the partition field is dt.

      The structure of the source MySQL table is as follows:

      CREATE TABLE `data` (
        `id` varchar(10) DEFAULT NULL,
        `dt` date DEFAULT NULL
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

      The structure of the destination LakeFormation table is as follows:

      create table if not exists "default"."data"(
      "id" varchar(10))
      STORE AS PARQUET
       PARTITION BY ("dt" date)
      • Writing data to a static partition

        If you want to write all data in the MySQL table to the dt=2025-10-21 partition of the LakeFormation table, you can select the LOAD OVERWRITE mode and set the partition filter to dt=2025-10-21 in the write operation.

      • Writing data to a dynamic partition

        The source MySQL data is from multiple partitions. You want to enable automatic identification of the values of partition fields in the source data, and want to use the values to determine the destination partitions. You hope that data in multiple partitions can be overwritten at a time so that you can update data in multiple partitions at the same time.

        You can set Write Mode to LOAD_OVERWRITE, leave Partition Filter Condition empty, and configure partition field dt for the source and destination field mapping.

Summary

You can select an appropriate data loading mode from Load and Load Overwrite to efficiently migrate and update data. You are advised to follow the recommendations in this practice to ensure data consistency and high system performance.