DataArts Fabric SQL UDF Overview
A UDF is a function defined by a user according to a specific requirement, and is used to extend a function of a database or a data processing system. UDFs can perform various operations, such as data transformation, complex computing, and data aggregation. These operations may exceed the capabilities of built-in functions provided by the system.
UDFs are used to:
- Extended functions: UDFs allow you to extend the functions of the database or data processing system and perform operations that are not supported by built-in functions.
- Complex computation: UDFs can be used to perform complex mathematical or logical computations that may involve multiple steps and conditions.
- Data conversion: UDFs can be used for data cleansing and conversion, such as formatting dates, converting strings, or encoding data.
- Service logic encapsulation: UDFs can encapsulate service logic within functions, making data processing more modular and reusable.
- Performance optimization: In some cases, complex computational logic can be executed on the database server via UDFs to reduce data transfer and enhance performance.
From a typological perspective, UDFs include the following three main types:
|
UDF Type |
Input Value |
Return Value |
Description |
|---|---|---|---|
|
Scalar UDF |
Single row of data |
A scalar value |
A scalar UDF processes a single row of database input and returns a scalar value. Each row is processed independently without relying on other rows' data. This type of function is ideal for scenarios requiring specific computations or transformations on individual records. The introduced scalar UDF feature enables you to perform these computations or transformations directly within the database using Python syntax. |
|
Aggregate UDF (UDAF) |
Multiple rows of data |
An aggregated value |
UDAF is a type of user-defined aggregate function. The input to a UDAF is a set of data rows, and the output is a single result value obtained by aggregating the entire dataset. UDAFs are commonly used for tasks like grouping statistics and aggregate calculations across datasets. |
|
Table-valued UDF (UDTF) |
Single row of data |
Multiple rows of results |
A UDTF takes one row of input and outputs a table containing multiple rows and columns. UDTFs are typically used for generating new rows, expanding data, or parsing multi-value fields. |
Constraints
- UDFs can be programmed only in Python.
- Default values cannot be set for UDF parameters. Additionally, if a function is created with precision specified for its parameters or return value, no precision validation will be performed.
- Function overloading is not supported.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot