Updated on 2025-02-22 GMT+08:00

Submitting a Spark SQL Job in DLI Using Hudi

Log in to the DLI management console. In the navigation pane on the left, choose SQL Editor. When submitting a SQL job, select a Spark SQL queue that supports Hudi.

  1. Create a Hudi table.

    Paste the following table creation statement into the editing area of the DLI SQL editor and replace the LOCATION value with your actual path. Set Engine to Spark, configure Queues, Catalog, and Databases, and click Execute in the upper-right corner to submit the job.

    Note: Hudi internal tables cannot be created when DLI is used as the metadata service, so you must set LOCATION to an OBS path.

    CREATE TABLE
      hudi_table (id int, comb long, name string, dt date) USING hudi PARTITIONED BY (dt) OPTIONS (
        type = 'cow',
        primaryKey = 'id',
        preCombineField = 'comb'
      ) LOCATION 'obs://bucket/path/hudi_table';

    Wait until the execution history below the editor shows that the job has succeeded. A partitioned Hudi copy-on-write (COW) table has then been created.

    To confirm that the table was created, run SHOW TABLES:

    SHOW TABLES;
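    The table options above determine how Hudi treats incoming rows: primaryKey = 'id' identifies a record, and when several incoming rows share the same key, Hudi keeps the row with the largest preCombineField value (comb here). The query below is only an illustration of that selection rule using a window function over a hypothetical inline batch; it does not read or write hudi_table and is not how Hudi implements precombining internally.

    ```sql
    -- Illustration only: reproduces the precombine rule for preCombineField = 'comb'.
    -- The rows in "incoming" are hypothetical; id 1 appears twice.
    SELECT id, comb, name, dt
    FROM (
      SELECT id, comb, name, dt,
             row_number() OVER (PARTITION BY id ORDER BY comb DESC) AS rn
      FROM VALUES
        (1, 100, 'aaa',    DATE '2021-08-28'),
        (1, 150, 'aaa-v2', DATE '2021-08-28'),
        (2, 200, 'bbb',    DATE '2021-08-28')
        AS incoming(id, comb, name, dt)
    ) ranked
    WHERE rn = 1;  -- keeps (1, 150, 'aaa-v2') and (2, 200, 'bbb')
    ```

    For id 1, the row with comb = 150 wins over the row with comb = 100, which mirrors the row Hudi would keep for that key.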

  2. Run the following SQL statement to write data to the created Hudi table:

    INSERT INTO hudi_table VALUES (1, 100, 'aaa', '2021-08-28'), (2, 200, 'bbb', '2021-08-28');

    To check the execution result, open the Executed Queries (Last Day) tab at the bottom of the editor. Alternatively, choose Job Management > SQL Jobs in the navigation pane on the left to view the status of the SQL job.
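    You can also confirm the write from SQL. This is a minimal sketch, assuming Hudi's metadata columns are enabled (Hudi populates them by default). Each row written through Hudi carries fields such as _hoodie_commit_time, _hoodie_record_key, and _hoodie_partition_path, which reflect the commit, the primaryKey setting, and the PARTITIONED BY column.

    ```sql
    -- Hudi metadata columns added to every written row (enabled by default).
    -- _hoodie_record_key comes from primaryKey = 'id';
    -- _hoodie_partition_path comes from PARTITIONED BY (dt).
    SELECT _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path,
           id, name, dt
    FROM hudi_table;
    ```

    Because both rows were written by a single INSERT statement, they should share the same _hoodie_commit_time.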

  3. Set Hudi parameters when running SQL statements.

    DLI does not support setting parameters by running SET statements.

    Instead, click Settings in the SQL editor. In the Parameter Settings area, enter each parameter as a key and a value. These parameters take effect when the SQL job is submitted.

    To verify them, choose Job Management > SQL Jobs in the navigation pane on the left, locate the running job, and expand its details. The Parameter Settings field lists the parameters applied to the job.
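    As an illustration, the key/value pairs below are standard Apache Hudi write options. The values are arbitrary examples, and whether a given option takes effect depends on the Hudi version of your DLI queue. They appear as SQL comments because they are entered in the Settings dialog rather than executed; only the INSERT statement, which uses a hypothetical extra row, is submitted.

    ```sql
    -- Entered in Settings > Parameter Settings (NOT as SET statements):
    --   key: hoodie.insert.shuffle.parallelism   value: 4
    --   key: hoodie.upsert.shuffle.parallelism   value: 4
    -- Example values only; tune them to your data volume.

    -- Submit the write as usual; the parameters above apply to this job.
    INSERT INTO hudi_table VALUES (3, 300, 'ccc', '2021-08-28');
    ```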

  4. Run the following SQL statement to query the written content:

    SELECT id, comb, name, dt FROM hudi_table WHERE dt = '2021-08-28';

    You can view the query result in the lower pane of the editor.

  5. Delete the Hudi table you created.

    For a foreign table, such as one whose LOCATION is an OBS path, dropping the table deletes only the metadata of the Hudi table. The data remains in the OBS bucket and must be deleted manually.

    DROP TABLE IF EXISTS hudi_table;