Updated on 2024-04-30 GMT+08:00

Synchronizing Hudi Tasks

Creating a Hudi Task

  • Migration

    If data has been imported to the GaussDB(DWS) table using CDL, use SQL on Hudi to migrate data. Alternatively, use CDM to perform full initialization and then use SQL on Hudi to synchronize incremental data.

    1. To create the hudi.hudi_sync_state synchronization status table, you must have the administrator permission.

    1
    SELECT pg_catalog.create_hudi_sync_table();
    

    Generally, hudi.hudi_sync_state is created only once in each database.

    2. To set the CDL synchronization progress, you must have the INSERT and UPDATE permissions on the target table and the SELECT permission on the HUDI foreign table. Otherwise, the synchronization progress cannot be set.

    1
    SELECT hudi_set_sync_commit('SCHEMA.TABLE', 'SCHEMA.FOREIGN_TABLE', 'LATEST_COMMIT');
    

    Where:

    SCHEMA.TABLE indicates the name and schema of the target table for data synchronization.

    SCHEMA.FOREIGN_TABLE indicates the name and schema of the OBS foreign table.

    LATEST_COMMIT indicates the end time of the Hudi synchronization.

    Example:

    The target table public.in_rel has synchronized data from hudi by 20220913152131. Use SQL on Hudi to continue to export data from the OBS foreign table hudi_read1.

    1
    SELECT hudi_set_sync_commit('public.in_rel', 'public.hudi_read1', '20220913152131');
    

    3. Submit the Hudi synchronization task.

    1
    SELECT hudi_sync_task_submit('SCHEMA.TABLE', 'SCHEMA.FOREIGN_TABLE');
    

    Example:

    Use SQL on Hudi to continue to export data from the OBS foreign table hudi_read1 to the target table public.in_rel.

    1
    SELECT hudi_sync_task_submit('public.in_rel', 'public.hudi_read1');
    
  • Creation

    If the GaussDB(DWS) table is empty and data is synchronized from Hudi for the first time, run the following command to create a task:

    1
    SELECT hudi_sync_task_submit('SCHEMA.TABLE', 'SCHEMA.FOREIGN_TABLE');
    

Querying Hudi Synchronization Tasks

Query a Hudi synchronization task. In the query result, task_id uniquely identifies a Hudi synchronization task.

1
SELECT * FROM pg_task_show('SQLonHudi');

Suspending Hudi Synchronization Tasks

Query the Hudi task, obtain the task ID, and run the following command:

1
SELECT pg_task_pause('task_id');

Example:

Suspend the synchronization task whose task_id is 64479410-a04c-0700-d150-3037d700fffe.

1
SELECT pg_task_pause('64479410-a04c-0700-d150-3037d700fffe');

Resuming Hudi Synchronization Tasks

Query the Hudi task, obtain the task ID, and run the following command:

1
SELECT pg_task_resume('task_id');

Example:

Resume the synchronization task whose task_id is 64479410-a04c-0700-d150-3037d700fffe.

1
SELECT pg_task_resume('64479410-a04c-0700-d150-3037d700fffe');

Deleting a Hudi Synchronization Task

Query the Hudi task, obtain the task ID, and run the following command:

1
SELECT pg_task_remove('task_id');

Example:

Delete the synchronization task whose task_id is 64479410-a04c-0700-d150-3037d700fffe.

1
SELECT pg_task_remove('64479410-a04c-0700-d150-3037d700fffe');

Querying the Synchronization Status

Use the hudi_show_sync_state() function to query the status of the Hud synchronization task.

1
SELECT * FROM hudi_show_sync_state();