DataArts Fabric SQL UDF Overview

A UDF is a function defined by a user according to a specific requirement, and is used to extend a function of a database or a data processing system. UDFs can perform various operations, such as data transformation, complex computing, and data aggregation. These operations may exceed the capabilities of built-in functions provided by the system.

UDFs are used to:

Extended functions: UDFs allow you to extend the functions of the database or data processing system and perform operations that are not supported by built-in functions.
Complex computation: UDFs can be used to perform complex mathematical or logical computations that may involve multiple steps and conditions.
Data conversion: UDFs can be used for data cleansing and conversion, such as formatting dates, converting strings, or encoding data.
Service logic encapsulation: UDFs can encapsulate service logic within functions, making data processing more modular and reusable.
Performance optimization: In some cases, complex computational logic can be executed on the database server via UDFs to reduce data transfer and enhance performance.

From a typological perspective, UDFs include the following three main types:

**Table 1** UDF types
UDF Type	Input Value	Return Value	Use Case
Scalar UDF	One row of data	One scalar value	One-to-one business scenarios, such as mathematical operations and string processing.
Aggregate UDF	Multiple rows of data	One aggregated value	Many-to-one business scenarios, such as summation and averaging.
Table-valued UDF	One row of data	Multiple rows of results	One-to-many business scenarios, such as splitting one row of data into multiple rows.

Constraints

The current version only supports Scalar UDFs. Aggregate UDFs and table-valued UDFs are not supported. Only Python is supported as the programming language.
Both input parameters and return values of UDFs support only basic data types, and parameters cannot have default values.
Function overloading is not supported.

Introduction to Scalar UDF

Scalar UDF is a type of UDF. The input of a Scalar UDF is a single row of data from the database, and the output is a scalar value. The processing of each row's input data is independent of other rows' data. Such functions are suitable for scenarios requiring specific calculations or transformations on data. The Scalar UDF introduced in this feature allows you to use Python syntax to compute or transform data within the database.

Characteristics of Scalar UDF include:

One-to-one relationship: Each invocation of a Scalar UDF returns a single result value corresponding to the input single row of data.
Reusability: Encapsulating common logic into a Scalar UDF enables its reuse across multiple queries, enhancing code readability and maintainability.

Parent topic: UDFs

Previous topic: UDFs

Next topic: UDF Development (Python)