Using UDFs

Scenario

DLI allows you to query data by using Hive user-defined functions (UDFs).

  • When performing UDF-related operations on the DLI console, you need to use a self-created queue.
  • A UDF can be used across accounts only after authorization is granted. The user who created the UDF does not need authorization. To grant permissions, choose Data Management > Package Management on the DLI management console, locate the corresponding UDF JAR package, and click Manage Permissions in the Operation column. On the displayed page, click Grant Permission in the upper right corner and select the required permissions.

Procedure

  1. Write a UDF.

    Create or modify the content of SumUdfDemo.java in the sample code based on service requirements. For details, see Example Code.

    Figure 1 Creating or modifying SumUdfDemo.java
  2. Generate a JAR package. Set the name of the output JAR package to TestUDF.jar and run the Build Artifacts command.
    Figure 2 Selecting Artifacts
    Figure 3 Running Build Artifacts

    Once the artifacts are successfully built, a TestUDF.jar file is generated in the corresponding path. As shown in Figure 2, the path is udfDemo\target\artifacts\TestUDF.

  3. Upload TestUDF.jar to OBS. For details about how to upload data to OBS, see Step 2: Upload Data to OBS in Submitting a SQL Job.
  4. Create a function.

    Run the following command on the management console to create a function:

    CREATE FUNCTION fun1 AS 'com.huawei.demo.SumUdfDemo' USING JAR 'obs://udf/TestUDF.jar';
    • If you use an existing class name to create a function, you must restart the original queue on the Queue Management page. Otherwise, the function may not take effect.
    • For details about the SQL syntax of user-defined functions, see the Data Lake Insight SQL Syntax Reference.
  5. Use the created function.

    Run the following statement to query data by using the function created in step 4:

    SELECT fun1(ip) FROM ip_tables;
  6. Delete the created function.

    If this function is no longer used, run the following statement to delete the function:

    DROP FUNCTION fun1;

Example Code

The sample code in SumUdfDemo.java is as follows:

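package com.huawei.demo;

import org.apache.hadoop.hive.ql.exec.UDF;

// Hive UDF that returns the sum of two integers.
public class SumUdfDemo extends UDF {
    // Called for each row with the arguments passed to the function in SQL.
    public int evaluate(int a, int b) {
        return a + b;
    }
}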
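
The evaluate method in SumUdfDemo takes primitive int parameters, which cannot represent a SQL NULL. If the input columns may contain NULL values, one possible pattern is to declare boxed Integer parameters and handle null explicitly. The following is a minimal sketch of that idea, not part of the sample project: the class name NullSafeSumSketch is hypothetical, returning NULL for a NULL input is an assumed convention, and the class does not extend UDF so that it compiles without the Hive libraries.

```java
// Hypothetical null-tolerant variant of SumUdfDemo (not part of the sample project).
// In a real UDF this class would extend org.apache.hadoop.hive.ql.exec.UDF;
// that is left out here so the sketch compiles without the Hive libraries.
public class NullSafeSumSketch {

    // Boxed Integer parameters allow the method to receive null for SQL NULL inputs.
    public Integer evaluate(Integer a, Integer b) {
        if (a == null || b == null) {
            return null; // Assumed convention: NULL in, NULL out.
        }
        return a + b;
    }

    // Local check of the logic before the class is packaged into a JAR.
    public static void main(String[] args) {
        NullSafeSumSketch udf = new NullSafeSumSketch();
        System.out.println(udf.evaluate(2, 3));    // prints 5
        System.out.println(udf.evaluate(null, 3)); // prints null
    }
}
```

Running main locally exercises only the method logic. How the engine actually passes NULL values to a UDF should be confirmed on the target queue.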