Help Center/ DataArts Fabric/ Developer Guide/ Function Reference/ UDFs/ DataArts Fabric SQL UDF Overview
Updated on 2025-12-19 GMT+08:00

DataArts Fabric SQL UDF Overview

A UDF is a function defined by a user according to a specific requirement, and is used to extend a function of a database or a data processing system. UDFs can perform various operations, such as data transformation, complex computing, and data aggregation. These operations may exceed the capabilities of built-in functions provided by the system.

UDFs are used to:

  • Extended functions: UDFs allow you to extend the functions of the database or data processing system and perform operations that are not supported by built-in functions.
  • Complex computation: UDFs can be used to perform complex mathematical or logical computations that may involve multiple steps and conditions.
  • Data conversion: UDFs can be used for data cleansing and conversion, such as formatting dates, converting strings, or encoding data.
  • Service logic encapsulation: UDFs can encapsulate service logic within functions, making data processing more modular and reusable.
  • Performance optimization: In some cases, complex computational logic can be executed on the database server via UDFs to reduce data transfer and enhance performance.

From a typological perspective, UDFs include the following three main types:

Table 1 UDF types

UDF Type

Input Value

Return Value

Description

Scalar UDF

Single row of data

A scalar value

A scalar UDF processes a single row of database input and returns a scalar value. Each row is processed independently without relying on other rows' data. This type of function is ideal for scenarios requiring specific computations or transformations on individual records. The introduced scalar UDF feature enables you to perform these computations or transformations directly within the database using Python syntax.

Aggregate UDF (UDAF)

Multiple rows of data

An aggregated value

UDAF is a type of user-defined aggregate function. The input to a UDAF is a set of data rows, and the output is a single result value obtained by aggregating the entire dataset. UDAFs are commonly used for tasks like grouping statistics and aggregate calculations across datasets.

Table-valued UDF (UDTF)

Single row of data

Multiple rows of results

A UDTF takes one row of input and outputs a table containing multiple rows and columns. UDTFs are typically used for generating new rows, expanding data, or parsing multi-value fields.

Constraints

  • UDFs can be programmed only in Python.
  • Default values cannot be set for UDF parameters. Additionally, if a function is created with precision specified for its parameters or return value, no precision validation will be performed.
  • Function overloading is not supported.