Creating a Function
Function
DLI allows you to create and use user-defined functions (UDF) and user-defined table functions (UDTF) in Spark jobs.
For details about the custom functions, see Calling UDFs for Spark SQL Jobs and Calling UDTF for Spark SQL Jobs.
Syntax
1 2 3 4 5 |
CREATE [TEMPORARY][OR REPLACE] FUNCTION [db_name.]function_name AS class_name
[USING resource,...]
resource:
: JAR file_uri
|
Precautions
- If a function with the same name exists in the database, the system reports an error.
- Only the Hive syntax can be used to create functions.
- If you specify the same class name for two UDFs, the functions conflict though the package names are different. Avoid this problem because it causes failure of job execution.
- To use the OR REPLACE statement, you need to submit a service ticket.
Keyword
- TEMPORARY: The created function is available only in the current session and is not persisted to the underlying metabase, if any. The database name cannot be specified for a temporary function.
- USING <resources>: resources to be loaded. It can be a list of JARs, files, or URIs.
- OR REPLACE: Whether to reload function resources
- The following table describes scenarios where you do not need OR REPLACE.
Table 1 Scenarios where OR REPLACE is not specified No.
Description
Examples
Take Effect When
Operation Impact
1
You modify the implementation of the original class and specify the original JAR package name and class name for a new function.
- UDF F1 created for a Spark SQL queue is contained in C1 in the JAR package J1.
- You modify the implementation of function in the J1 package, save the function as UDF F2, and specify its class as C1 and the JAR package as J1.
NOTE:
Note that if UDF names conflict, the function will fail to be created. In this case, you can use OR REPLACE to replace F1 in all jobs with F2.
The Spark SQL queue is restarted.
- Running jobs are interrupted because the Spark SQL queue must be restarted.
- After the queue is restarted, original UDF F1 becomes the same as F2.
2
You create a new class in the original package, specify the new class to the UDF you created, and retain the original package name.
- UDF F1 created for a Spark SQL queue is contained in C1 in the JAR package J1.
- You add class C2 to the JAR package J1. C1 remains unchanged. You create UDF F2 and specify its class name to C2 and its JAR package to J1.
The Spark SQL queue is restarted.
- Running jobs are interrupted because the Spark SQL queue must be restarted.
- After the queue is restarted, F1 remains unchanged.
3
You keep the implementation of the original functions unchanged, and repack the program. You specify a new JAR package for the new function and retain the original class name.
- UDF F1 created for a Spark SQL queue is contained in C1 in the JAR package J1.
- You repack the JAR package as J2. The function logic remains unchanged. You specify the class of the created UDF to C1, and the JAR package name to J2.
UDF F2 is created.
None
- If OR REPLACE is used when you create a UDF, the existing function will be replaced and the replacement takes effect immediately.
- To use the OR REPLACE keyword, you need to submit a service ticket.
- To make the replacement take effect immediately on all SQL queues, run the statement for each SQL queue. The replacement may take effect 0 to 12 hours later on the queues where the statement is not executed.
- If you modify the J1 package (for example, CREATE OR REPLACE FUNCTION F1) in the middle of job execution, jobs being executed run the original UDF F1 (of the C1 class in the J1 package) and jobs that starts after the modification run the new F1 logic.
Table 2 Scenarios where OR REPLACE is used No.
Description
Examples
Take Effect When
Operation Impact
1
You modify the implementation of the original class and specify the original JAR package name and class name for a new function.
- UDF F1 created for a Spark SQL queue is contained in C1 in the JAR package J1.
- You modify the implementation of a function in the J1 package, save the function as UDF F1, and specify its class as C1 and the JAR package as J1.
Immediately
F1 is changed.
2
You create a new class in the original package, specify the new class to the UDF you created, and retain the original package name.
- UDF F1 created for a Spark SQL queue is contained in C1 in the JAR package J1.
- You add class C2 to the JAR package J1. C1 remains unchanged. You create UDF F2 and specify its class name to C2 and its JAR package to J1. F2 does not take effect immediately because it is the first function specified to the C2 class.
Either of the following happens:
- CREATE OR REPLACE F2 FUNCTION is executed again.
- The Spark SQL queue is restarted.
If you restart the Spark SQL queue, running jobs will be affected.
3
You change the class name and specify the new class name to the new UDF. The package name remains unchanged.
- UDF F1 created for a Spark SQL queue is contained in C1 in the JAR package J1.
- You change C1 in the JAR package of J1 to C2, create the UDF F2, specify its class name to C2, and its JAR package name to J1.
Immediately
F1 is replaced by F2 immediately.
4
You keep the implementation of the original functions unchanged, and repack the program. You specify a new JAR package for the new function and retain the original class name.
- UDF F1 created for a Spark SQL queue is contained in C1 in the JAR package J1.
- You repack the JAR package as J2. The function logic remains unchanged. You specify the class of the created UDF to C1, and the JAR package name to J2.
Immediately
None
- The following table describes scenarios where you do not need OR REPLACE.
Example
Create the mergeBill function.
1 2 |
CREATE FUNCTION mergeBill AS 'com.xxx.hiveudf.MergeBill'
using jar 'obs://onlyci-7/udf/MergeBill.jar';
|
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.