Updated on 2023-01-11 GMT+08:00

Configuring a Hudi Data Source

Scenario

HetuEngine can connect to Hudi data sources in clusters of MRS 3.1.1 or later.

HetuEngine cannot read Hudi bootstrap tables.

Prerequisites

  • You have created the proxy user of the Hudi data source. The proxy user is a human-machine user and must belong to the hive group.
  • You have added the following entries to the /etc/hosts file on every node of the cluster where HetuEngine is located: the mappings between the host names and IP addresses of the cluster where the data source to be connected is located, and the entry 10.10.10.10 hadoop.System domain name (for example, 10.10.10.10 hadoop.hadoop.com). Otherwise, HetuEngine cannot use host names to connect to nodes outside its own cluster.
  • You have created a HetuEngine administrator by referring to Creating a HetuEngine User.
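
The /etc/hosts prerequisite above can be checked before you start the procedure. The following Python sketch is only an illustration and is not part of HetuEngine or MRS: the helper names parse_hosts and missing_hosts, the node names, and the 192.168.x.x addresses are all hypothetical. Only the 10.10.10.10 hadoop.hadoop.com entry comes from the example in this section.

```python
def parse_hosts(text):
    """Return a dict mapping each host name (lowercased) to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first address listed for a name.
            mapping.setdefault(name.lower(), ip)
    return mapping


def missing_hosts(text, required):
    """Return the required host names that have no entry in the hosts text."""
    mapping = parse_hosts(text)
    return [name for name in required if name.lower() not in mapping]


# Sample content; the node names and 192.168.x.x addresses are hypothetical.
SAMPLE_HOSTS = """\
127.0.0.1 localhost
192.168.0.11 node-master1
192.168.0.12 node-core1
10.10.10.10 hadoop.hadoop.com
"""

# Host names of the data source cluster that HetuEngine must be able to resolve.
REQUIRED = ["node-master1", "node-core1", "node-core2", "hadoop.hadoop.com"]

if __name__ == "__main__":
    # Lists the host names that still need to be added to the hosts file.
    print(missing_hosts(SAMPLE_HOSTS, REQUIRED))
```

To check a real node, pass the content of its /etc/hosts file instead of SAMPLE_HOSTS and replace REQUIRED with the actual host names of the data source cluster.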

Procedure

  1. Perform steps 1 to 6.g in Configuring a Traditional Data Source to configure a traditional data source.
  2. In the Custom Configuration area, add the custom parameter listed in Table 1.

    Table 1 Custom configuration parameter for Hudi data

    Custom Parameter Name            Value
    hive.parquet.use-column-names    true

  3. Click OK.

Data Type Mapping

Currently, Hudi data sources support the following data types: INT, BIGINT, FLOAT, DOUBLE, DECIMAL, STRING, DATE, TIMESTAMP, BOOLEAN, BINARY, MAP, STRUCT, and ARRAY.