Updated on 2024-12-13 GMT+08:00

Operating a Hudi Table Using spark-sql

This section applies only to MRS 3.5.0-LTS and later versions.

Scenario

This section describes how to use spark-sql to create a Hudi table and to insert, update, delete, and query its data.

Prerequisites

You have created a user on Manager and added the user to the hadoop (primary group) and hive user groups.

Procedure

  1. Download and install the Hudi client. For details, see Installing a Client.

    Hudi is integrated into Spark, so you only need to download the Spark client from Manager. In this example, the client is installed in /opt/client.

  2. Log in as user root to the node where the client is installed and go to the client installation directory:

    cd /opt/client

  3. Run the following commands to load environment variables:

    source bigdata_env

    source Hudi/component_env

    kinit Username of the created user

    • If the user has just been created, change its initial password first, and then run the kinit command again to log in.
    • In normal mode (Kerberos authentication disabled), you do not need to run the kinit command.
    • If multiple services are installed, run source Spark/component_env and then source Hudi/component_env after running source bigdata_env.

  4. Start spark-sql by running the spark-sql command, and then run the following statements.

    • Create a Copy-on-Write (type = 'cow') Hudi table that uses id as the primary key and price as the precombine field.

      create table if not exists hudi_table2 (id int, name string, price double) using hudi options (type = 'cow', primaryKey = 'id', preCombineField = 'price');

    • Insert data.

      insert into hudi_table2 select 1, '1', 1.0;

      insert into hudi_table2 select 2, '1', 1.0;

    • Update data.

      update hudi_table2 set name = '3' where id = 1;

    • Delete data.

      delete from hudi_table2 where id=2;

    • Query data.

      select * from hudi_table2;
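The statements in step 4 can also be run non-interactively. The following shell script is a minimal sketch of one way to do this, not an official MRS tool: it writes the statements to a file and passes that file to spark-sql -f. It assumes the client is installed in /opt/client. CLIENT_DIR, HUDI_USER, SQL_FILE, and the path /tmp/hudi_table2_demo.sql are hypothetical placeholders that you can override through environment variables. If multiple services are installed, also add source Spark/component_env before source Hudi/component_env, as described in step 3.

```shell
#!/usr/bin/env bash
# Sketch: run the step-4 statements non-interactively with spark-sql -f.
# Hypothetical placeholders, override them via the environment:
#   CLIENT_DIR - client installation directory (/opt/client in this section)
#   HUDI_USER  - user created in the prerequisites, leave empty in normal mode
#   SQL_FILE   - file that receives the generated statements
CLIENT_DIR="${CLIENT_DIR:-/opt/client}"
HUDI_USER="${HUDI_USER:-}"
SQL_FILE="${SQL_FILE:-/tmp/hudi_table2_demo.sql}"

# Write the step-4 statements with typed literals. The final query lists the
# user columns because select * also returns the Hudi _hoodie_* metadata columns.
cat > "$SQL_FILE" <<'EOF'
create table if not exists hudi_table2 (id int, name string, price double)
  using hudi options (type = 'cow', primaryKey = 'id', preCombineField = 'price');
insert into hudi_table2 select 1, '1', 1.0;
insert into hudi_table2 select 2, '1', 1.0;
update hudi_table2 set name = '3' where id = 1;
delete from hudi_table2 where id = 2;
select id, name, price from hudi_table2;
EOF

# Load the client environment (steps 2 and 3) only if the client is installed.
if [ -f "$CLIENT_DIR/bigdata_env" ]; then
  cd "$CLIENT_DIR" || exit 1
  source bigdata_env
  source Hudi/component_env
  # kinit is needed only when Kerberos authentication is enabled.
  if [ -n "$HUDI_USER" ]; then
    kinit "$HUDI_USER"
  fi
fi

# Run the statements if spark-sql is available, otherwise just report the file.
if command -v spark-sql >/dev/null 2>&1; then
  spark-sql -f "$SQL_FILE"
else
  echo "spark-sql not found; statements written to $SQL_FILE"
fi
```

If all statements succeed, the final query should return a single row with id 1, name 3, and price 1.0: the update changes only the name of the record with id 1, and the record with id 2 is deleted.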