Updated on 2022-12-07 GMT+08:00

UDFs

DLI allows you to query data using user-defined functions (Hive UDFs).

Procedure

  1. Compile a UDF.
    1. Add the following information to the POM file:
      <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1</version>
      </dependency>
    2. Add the import statement import org.apache.hadoop.hive.ql.exec.UDF; to the code.
    3. Create or modify the content of SumUdfDemo.java in the sample code based on service requirements.
  2. Generate a JAR package. Run the mvn package command to compile the program and generate the JAR package.
    For example, after compiling the program in IntelliJ IDEA, click Terminal at the bottom of the tool window and run mvn package in the command line window to package the program.
    Figure 1 Compiling and packaging the program

    Obtain the generated JAR package, for example, TestUDF.jar, from the path where the compilation result is stored.

  3. Upload TestUDF.jar to OBS. For details about how to upload data to OBS, see "Step 2: Upload Data to OBS" in Creating and Submitting a Spark SQL Job.
  4. Upload the TestUDF.jar package to DLI package management.
    1. Log in to the DLI management console and choose Data Management > Package Management.
    2. On the Package Management page, click Create in the upper right corner.
    3. In the Create Package dialog, set the following parameters:
      1. Type: Select JAR.
      2. OBS Path: Specify the OBS path where the package was uploaded in step 3.
      3. Set Group and Group Name as required for package identification and management.
    4. Click OK.
  5. Create a function.

    Run the following command on the management console to create a function:

    CREATE FUNCTION fun1 AS 'com.demo.SumUdfDemo' USING JAR 'obs://udf/TestUDF.jar';
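
    To verify that the function has been registered, you can run a statement such as the following on the same queue (standard Spark SQL; the exact output format may vary):

    SHOW FUNCTIONS;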
  6. Restart the original SQL queue for the added function to take effect.
    1. Log in to the DLI management console and choose Resources > Queue Management from the navigation pane. In the Operation column of the SQL queue, click Restart.
    2. In the Restart dialog box, click OK.
  7. Use the created function.

    Run the following statement to query data using the function created in step 5:

    select fun1(ip) from ip_tables;
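
    The arguments passed to the function must match an evaluate method of the UDF class. For the two-argument SumUdfDemo sample in Sample Code below, a call would look like the following (the column names col_a and col_b are only illustrative):

    select fun1(col_a, col_b) from ip_tables;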
  8. Delete the created function.

    If this function is no longer used, run the following statement to delete the function:

    DROP FUNCTION fun1;

Sample Code

The sample code in SumUdfDemo.java is as follows:

package com.demo;

import org.apache.hadoop.hive.ql.exec.UDF;

public class SumUdfDemo extends UDF {
  // Called by the Hive UDF framework for each input row; returns the sum of the two values.
  public int evaluate(int a, int b) {
    return a + b;
  }
}