Updated on 2025-12-19 GMT+08:00

User-Defined Function APIs

Explicit Registration Syntax for UDFs

Explicit registration means writing the registration logic directly into your Python code, typically by calling a register or register_from_file method on the backend session (see Table 1). Calling the method triggers registration, so a backend session object must be available first.

Explicit registration is recommended when you want to control exactly when registration happens, can accept adding registration logic to your code (an intrusive change), or need to separate UDF registration from UDF usage under the same backend connection.

A common scenario is one team registering UDFs while several other teams use them, with no Python scripts shared between the teams.

Table 1 Explicit registration syntax for UDFs

The UDF Type column covers udf, udaf, and udtf. Entries are grouped by secondary UDF type and then by registration type; as their code entries show, the pyarrow and pandas entries are accessed through backend.udf.

  • python (parameters: see Python/PyArrow/Pandas UDF registration parameters)
    • Direct registration: backend.[udf | udaf | udtf].python.register(<Registration function>, <Registration parameters>)
    • File-based registration: backend.[udf | udaf | udtf].python.register_from_file(<File path>, <Function name>, <Registration parameters>)
  • builtin (parameters: see Builtin UDF registration parameters)
    • Direct registration: backend.[udf | udaf | udtf].builtin.register(<Registration function>, <Registration parameters>)
    • File-based registration: backend.[udf | udaf | udtf].builtin.register_from_file(<File path>, <Function name>, <Registration parameters>)
  • pyarrow (parameters: see Python/PyArrow/Pandas UDF registration parameters)
    • Direct registration: backend.udf.pyarrow.register(<Registration function>, <Registration parameters>)
    • File-based registration: backend.udf.pyarrow.register_from_file(<File path>, <Function name>, <Registration parameters>)
  • pandas (parameters: see Python/PyArrow/Pandas UDF registration parameters)
    • Direct registration: backend.udf.pandas.register(<Registration function>, <Registration parameters>)
    • File-based registration: backend.udf.pandas.register_from_file(<File path>, <Function name>, <Registration parameters>)

Implicit Registration Syntax for UDFs

Implicit registration simplifies this process by letting the Python runtime detect and register UDFs automatically. Instead of writing registration logic into your code, you decorate your Python functions with @ decorators. Once decorated, a function can be referenced in DataFrames by its original name, and registration is completed for you. With implicit registration, no backend session object is needed when the decorator is applied; the backend session is required only later, when you work with Ibis DataFrames.

Implicit registration is recommended when you want to register UDFs without intrusive changes to your code and do not need to separate UDF registration from UDF usage under the same backend connection.

A common use case is a Python script that both registers and uses UDFs in a single workflow.

Table 2 Implicit registration syntax for UDFs

The UDF Type column covers udf, udaf, and udtf. Each entry lists the secondary UDF type, its code entry, and the parameter reference.

  • python: @fabric.[udf | udaf | udtf].python(<Registration parameters>). See Python/PyArrow/Pandas UDF registration parameters.
  • builtin: @fabric.[udf | udaf | udtf].builtin(<Registration parameters>). See Builtin UDF registration parameters.
  • pyarrow: @fabric.udf.pyarrow(<Registration parameters>). See Python/PyArrow/Pandas UDF registration parameters.
  • pandas: @fabric.udf.pandas(<Registration parameters>). See Python/PyArrow/Pandas UDF registration parameters.

For implicit registration, the point at which registration actually happens depends on the DataFrame execution mode (Lazy or Eager).

As described in the official Ibis documentation, DataFrame operations run in either Eager or Lazy mode, controlled by the ibis.options.interactive configuration item. Its default value is False, so DataFrames run in Lazy mode by default. The timing of UDF registration in each mode is as follows:

Table 3 DataFrame execution modes

  • ibis.options.interactive = False (Lazy mode): the UDF is registered when execute is called on the whole DataFrame, and it is used at that same point.
  • ibis.options.interactive = True (Eager mode): the UDF is registered the first time it is used in a DataFrame, and it runs each time it is used in a DataFrame.
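To make the timing above concrete, here is a toy model in plain Python. ToyBackend and ToyFrame are hypothetical classes written only for illustration; they do not use ibis or fabric_data and are not how the library is implemented. The interactive flag plays the role of ibis.options.interactive.

```python
class ToyBackend:
    """Records the order in which UDFs are registered and called."""

    def __init__(self):
        self.events = []         # ordered ("register" | "call", udf_name) tuples
        self.registered = set()  # UDF names already registered

    def ensure_registered(self, name):
        # Register each UDF at most once per backend.
        if name not in self.registered:
            self.registered.add(name)
            self.events.append(("register", name))

    def call(self, name):
        self.events.append(("call", name))


class ToyFrame:
    """Hypothetical stand-in for a DataFrame that applies implicitly registered UDFs."""

    def __init__(self, backend, interactive=False):
        self.backend = backend
        self.interactive = interactive  # plays the role of ibis.options.interactive
        self.pending = []               # UDFs recorded but not yet run (Lazy mode)

    def apply(self, udf_name):
        if self.interactive:
            # Eager: register on first use, run on every use.
            self.backend.ensure_registered(udf_name)
            self.backend.call(udf_name)
        else:
            # Lazy: only record the UDF; nothing reaches the backend yet.
            self.pending.append(udf_name)
        return self

    def execute(self):
        # Lazy: registration and usage both happen here.
        for name in self.pending:
            self.backend.ensure_registered(name)
        for name in self.pending:
            self.backend.call(name)
        self.pending.clear()
        return self.backend.events


lazy_backend = ToyBackend()
frame = ToyFrame(lazy_backend).apply("double_price").apply("double_price")
print(lazy_backend.events)  # [] : nothing is registered before execute()
print(frame.execute())      # [('register', 'double_price'), ('call', 'double_price'), ('call', 'double_price')]
```

In Lazy mode the backend sees nothing until execute() is called; with interactive=True the first apply() already produces a register event followed by a call.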

Python/PyArrow/Pandas UDF Registration Parameters

Registering a Python/PyArrow/Pandas UDF registers an original Python function or class in the database.

For Python, PyArrow, and Pandas UDFs, you can currently pass the following registration parameters. They apply to both explicit and implicit registration, and to scalar UDFs, UDAFs, and UDTFs alike.

Table 4 Python/PyArrow/Pandas UDF registration parameters

Each entry lists the registration parameter, its type, its default value, and its description.

  • name (type: str | None; default: None): Specifies the actual storage name of the UDF in the database.
  • database (type: str | None; default: None): Specifies the LakeFormation database where the UDF resides.
  • fn (type: Callable; default: None): Specifies the original Python function of the UDF.
  • signature (type: fabric_data.ibis.common.annotations.Signature | None; default: None): Specifies the UDF function signature and return value type.
  • replace (currently unavailable; type: bool; default: False): Specifies whether the UDF supports in-place modification.
  • temporary (currently unavailable; type: bool; default: False): Specifies whether the UDF has a session-level lifecycle.
  • if_not_exist (currently unavailable; type: bool; default: False): Specifies whether to skip errors for existing UDFs.
  • strict (type: bool; default: True): Specifies whether the UDF automatically filters NULL values.
  • volatility (type: VolatilityType.VOLATILE | VolatilityType.STABLE | VolatilityType.IMMUTABLE; default: VolatilityType.VOLATILE): Specifies the volatility (stability) of the UDF.
  • runtime_version (currently unavailable; type: str; default: sys.version_info): Specifies the Python version for executing the UDF.
  • imports (type: List[str]; default: None): Specifies the external code files on which the UDF depends.
  • packages (type: List[Union[str, module]]; default: None): Specifies the Python modules on which the UDF depends.
  • register_type (type: RegisterType.INLINE | RegisterType.STAGED; default: RegisterType.INLINE): Specifies the registration form of the UDF.
  • comment (type: str | None; default: None): Specifies user comments for the UDF.

Precautions for Python/PyArrow/Pandas UDF Registration Parameters

  • The imports parameter accepts only file paths in the same directory as, or a subdirectory of, the .py file that contains the current Python function or class.
  • If fn is not defined in the .py file where the UDF is registered, you must also add the path of the file that defines fn to the imports parameter. For example:
    from process import outer
    
    con = ibis.fabric.connect(...)
    
    # Register a UDAF.
    udf = con.udaf.python.register(
        outer(), # fn introduced externally
        imports=["process.py"] # Add the file path for fn
    )
  • The signature parameter is optional. If you provide it, it takes precedence over automatic inference; if you omit it, the system infers the parameter and return value types automatically. For details, see Type Inference of the signature Parameter.
    • When registering a PyArrow UDF, you must use fabric.PyarrowVector whether or not you provide the signature parameter. For example:
      import fabric_data as fabric
      import pyarrow as pa
      import pyarrow.compute as pc
      
      # === With signature: Requires dependency on PyArrowVector. ===
      def calculate_sum(
          prices: pa.ChunkedArray,
          quantities: pa.ChunkedArray,
      ) -> pa.ChunkedArray:
          return pc.multiply(prices, quantities)
      
      con = ibis.fabric.connect(...)
      
      # Register a UDF.
      udf = con.udf.pyarrow.register(
          fn=calculate_sum,
          signature=fabric.Signature(
              parameters=[
                  fabric.Parameter(name="price", annotation=fabric.PyarrowVector[float]),
                  fabric.Parameter(name="quantity", annotation=fabric.PyarrowVector[int]),
              ],
              return_annotation=fabric.PyarrowVector[float],
          ),
      )
      
      # === Without signature: Also requires dependency on PyArrowVector. ===
      def calculate_sum(
          prices: fabric.PyarrowVector[float],
          quantities: fabric.PyarrowVector[int],
      ) -> fabric.PyarrowVector[float]:
          return fabric.PyarrowVector[float](pc.multiply(prices, quantities))
      
      con = ibis.fabric.connect(...)
      
      # Register a UDF.
      udf = con.udf.pyarrow.register(
          fn=calculate_sum
      )
    • When registering a Pandas UDF, you must use fabric.PandasVector whether or not you provide the signature parameter. For example:
      import fabric_data as fabric
      import pandas as pd
      
      # === With signature: Requires dependency on PandasVector. ===
      def calculate_sum(
          prices: pd.Series, 
          quantities: pd.Series,
      ) -> pd.Series:
          return pd.Series(prices * quantities, dtype=pd.Float64Dtype())
      
      con = ibis.fabric.connect(...)
      
      # Register a UDF.
      udf = con.udf.pandas.register(
          fn=calculate_sum,
          signature=fabric.Signature(
              parameters=[
                  fabric.Parameter(name="price", annotation=fabric.PandasVector[float]),
                  fabric.Parameter(name="quantity", annotation=fabric.PandasVector[int]),
              ],
              return_annotation=fabric.PandasVector[float],
          ),
      )
      
      # === Without signature: Also requires dependency on PandasVector. ===
      def calculate_sum(
          prices: fabric.PandasVector[float], 
          quantities: fabric.PandasVector[int],
      ) -> fabric.PandasVector[float]:
          return fabric.PandasVector[float](prices * quantities, dtype=pd.Float64Dtype())
      
      con = ibis.fabric.connect(...)
      
      # Register a UDF.
      udf = con.udf.pandas.register(
          fn=calculate_sum
      )
  • For the volatility parameter, the three enumeration values mean the following:
    • VolatilityType.VOLATILE: Function results may change at any time.
    • VolatilityType.STABLE: For fixed inputs, the function's result does not change during a single scan.
    • VolatilityType.IMMUTABLE: The function always produces the same result for identical inputs.

    The volatility parameter does not affect function pushdown: Python UDFs can be pushed down to data nodes (DNs) whether they are IMMUTABLE, STABLE, or VOLATILE.

  • If you do not specify the packages parameter:
    • For PyArrow UDFs, the PyArrow version installed in the backend environment is automatically used.
    • For Pandas UDFs, the Pandas version installed in the backend environment is automatically used.
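The directory rule for the imports parameter can be expressed as a path check. The sketch below is a hypothetical helper (is_allowed_import is not part of fabric_data) and assumes that relative paths in imports are resolved against the directory of the .py file that registers the UDF; the library may enforce the rule differently.

```python
from pathlib import Path


def is_allowed_import(udf_file: str, import_path: str) -> bool:
    """Hypothetical check: is import_path inside udf_file's directory tree?

    Assumption: relative import paths are resolved against the directory
    that contains the .py file defining the UDF.
    """
    base = Path(udf_file).resolve().parent
    # Joining an absolute import_path discards base, so it is checked as-is.
    target = (base / import_path).resolve()
    try:
        # relative_to raises ValueError when target lies outside base.
        target.relative_to(base)
    except ValueError:
        return False
    return True


# The registering file lives in /proj/udfs.
print(is_allowed_import("/proj/udfs/register_udfs.py", "process.py"))            # True
print(is_allowed_import("/proj/udfs/register_udfs.py", "lib/helpers.py"))        # True
print(is_allowed_import("/proj/udfs/register_udfs.py", "../shared/process.py"))  # False
```

resolve() normalizes ".." segments, so a path that climbs out of the UDF's directory is rejected even though it starts relative.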

Builtin UDF Registration Parameters

For Builtin UDFs, registration only obtains a handle to an existing database function; no actual registration takes place.

For Builtin UDFs, you can currently pass the following registration parameters. They apply to both explicit and implicit registration, and to scalar UDFs, UDAFs, and UDTFs alike.

Table 5 Builtin UDF registration parameters

Each entry lists the registration parameter, its type, its default value, and its description.

  • name (type: str | None; default: None): Specifies the actual storage name of the UDF in the database.
  • database (type: str | None; default: None): Specifies the LakeFormation database where the UDF resides.
  • fn (type: Callable; default: None): Specifies the original Python function of the UDF.
  • signature (type: ibis.common.annotations.Signature | None; default: None): Specifies the UDF function signature and return value type.

Precautions for Builtin UDF Registration Parameters

The signature parameter is optional. If you provide it, it takes precedence over automatic inference; if you omit it, the system infers the parameter and return value types automatically. For details, see Type Inference of the signature Parameter.

Type Inference of the signature Parameter

For the signature parameter, you can either provide the parameter and return value types or omit them entirely.

  • If you provide the signature parameter, the original Python function does not need type hints, and the UDF can be registered and used immediately in a REPL.
  • If you omit the signature parameter, you should add type hints to the original Python function; in this case, the UDF cannot be registered and used immediately in a REPL.

A comparison of these approaches is summarized below.

Table 6 signature parameter descriptions

  • signature omitted: types are inferred automatically (recommended). Type hints in the original Python function: not required, but recommended. Immediate REPL use: not supported.
  • signature specified: the specified value is used. Type hints in the original Python function: not required. Immediate REPL use: supported.

Here, immediate operation refers to use in a read-eval-print loop (REPL), such as Python's interactive interpreter.

Type hinting syntax, standardized in Python 3.5 by PEP 484, adds a colon (:) and a type after each parameter name, and an arrow (->) and the return type after the parameter list. For example:

from typing import Dict, List, Optional

def greet(name: str) -> str:
    return f"Hello, {name}"

def process_data(data: List[int]) -> Dict[str, Optional[int]]:
    return {"max": max(data) if data else None}

For Python/PyArrow/Pandas UDFs, strict typing is required at registration: every parameter type and the return type must be specified. If you do not declare them with type hints in the original Python function, you must provide the signature parameter to specify the Ibis DataTypes.

In contrast, Builtin UDFs do not require strict typing at registration, because the function is already registered in the database. If you cannot annotate the original Python function with types, you can declare only the parameter names. If you later use the return value of the Builtin UDF (that is, other than as a Top SELECT UDF), you must specify the function's return type, using the signature parameter to define the Ibis DataType if necessary. If the return value is not used further (a Top SELECT UDF), you can omit the return type.
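Because automatic inference is based on inspect.signature, you can check what type information a function exposes before registering it. In this sketch, describe is a hypothetical helper; annotated_udf stands for a Python/PyArrow/Pandas-style function with full type hints, and builtin_handle for a Builtin UDF handle that declares only parameter names. Missing annotations appear as inspect.Parameter.empty, which the sketch reports as None.

```python
import inspect


def annotated_udf(price: float, quantity: int) -> float:
    # Fully annotated: parameter and return types can be inferred.
    return price * quantity


def builtin_handle(x, y):
    # Names only: acceptable for a Builtin UDF handle, not for a Python UDF.
    ...


def describe(fn):
    """Return ({parameter: annotation or None}, return annotation or None)."""
    sig = inspect.signature(fn)
    params = {
        name: None if p.annotation is inspect.Parameter.empty else p.annotation
        for name, p in sig.parameters.items()
    }
    ret = sig.return_annotation
    return params, None if ret is inspect.Signature.empty else ret


print(describe(annotated_udf))   # ({'price': <class 'float'>, 'quantity': <class 'int'>}, <class 'float'>)
print(describe(builtin_handle))  # ({'x': None, 'y': None}, None)
```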

When you omit the signature parameter and rely on automatic inference, the following rules apply:

Table 7 Auto-deduction of signature

  • Python/PyArrow/Pandas UDF: parameter types must be specified with type hints; the return type must also be specified with type hints.
  • Builtin UDF: parameters can be written as names only, without types; the return type must be specified with type hints only if the return value is used later, and is otherwise optional.

When the signature parameter is omitted and types are inferred automatically, the implementation is based on inspect.signature. The following parameter and return value types are currently accepted:

Table 8 Accepted parameter/return value types

Each entry maps a Python DataType to an Ibis DataType and a DataArts Fabric SQL DataType (Python → Ibis → DataArts Fabric SQL).

  • type(None) → null → NULL
  • bool → Boolean → BOOLEAN
  • bytes → Binary → BYTEA
  • str → String → TEXT
  • numbers.Integral → Int64 → BIGINT
  • numbers.Real → Float64 → DOUBLE PRECISION
  • decimal.Decimal → Decimal → DECIMAL
  • datetime.datetime → Timestamp → TIMESTAMP/TIMESTAMPTZ
  • datetime.date → Date → TIMESTAMP
  • datetime.time → Time → TIME
  • datetime.timedelta → Interval → INTERVAL
  • uuid.UUID → UUID → UUID
  • class → Struct → STRUCT
  • typing.Sequence, typing.Array → Array → ARRAY
  • typing.Mapping, typing.Map → Map → HSTORE
  • fabric_data.PyarrowVector[T] → T → T
  • fabric_data.PandasVector[T] → T → T

Notes:

  • Python's built-in int type is a subclass of numbers.Integral, so int maps to Int64.
  • Python's built-in float type is a subclass of numbers.Real, so float maps to Float64.

Automatic conversion is currently not supported for Python types that are not listed in the preceding table.
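The sketch below shows one way the scalar rows of Table 8 could be resolved from a type hint with issubclass, which is why the notes about int and float hold: both match through the numbers abstract base classes. It is a hypothetical illustration (infer_ibis_type is not a fabric_data function) that covers scalar types only. It also shows why lookup order matters: bool is a subclass of int, and datetime.datetime is a subclass of datetime.date.

```python
import datetime
import decimal
import numbers
import uuid

# Hypothetical rules mirroring the scalar rows of Table 8, checked in order.
# bool must precede numbers.Integral (bool subclasses int), and
# datetime.datetime must precede datetime.date (it subclasses date).
SCALAR_RULES = [
    (type(None), "null"),
    (bool, "Boolean"),
    (bytes, "Binary"),
    (str, "String"),
    (numbers.Integral, "Int64"),
    (numbers.Real, "Float64"),
    (decimal.Decimal, "Decimal"),
    (datetime.datetime, "Timestamp"),
    (datetime.date, "Date"),
    (datetime.time, "Time"),
    (datetime.timedelta, "Interval"),
    (uuid.UUID, "UUID"),
]


def infer_ibis_type(py_type):
    """Return the Ibis DataType name for a scalar Python type hint."""
    for base, ibis_name in SCALAR_RULES:
        # issubclass honours ABC registration, so int matches numbers.Integral.
        if issubclass(py_type, base):
            return ibis_name
    raise TypeError(f"automatic conversion is not supported for {py_type!r}")


for hint in (int, float, bool, decimal.Decimal, datetime.datetime, datetime.date):
    print(hint.__name__, "->", infer_ibis_type(hint))
# int -> Int64
# float -> Float64
# bool -> Boolean
# Decimal -> Decimal
# datetime -> Timestamp
# date -> Date
```

Unlisted types such as complex fall through every rule and raise TypeError, matching the note that unlisted types are not converted automatically.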

For parameters and return values that have neither a signature parameter nor Python type hints, automatic inference currently works as follows:

Table 9 Special parameter type handling

Each entry lists the parameter type, the matching pattern generated for it, and the pattern's effect.

  • POSITIONAL_ONLY, KEYWORD_ONLY, POSITIONAL_OR_KEYWORD: generates ValueOf(None), which exempts the parameter from __signature__.validate.
  • VAR_POSITIONAL: generates TupleOf(pattern=pattern), which applies pattern to each item in a for-loop.
  • VAR_KEYWORD: generates DictOf(key_pattern=InstanceOf(str), value_pattern=pattern), which applies the patterns to each item in a for-loop.
  • Return value: generates ValueOf(Unknown), which passes UnknownScalar or UnknownColumn upward as the UDF return value.
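To illustrate how these patterns treat each kind of parameter, here is a plain-Python sketch. The match_* functions are hypothetical stand-ins written for this example, not the pattern classes named above: ValueOf(None) is modeled as accepting any value unvalidated, and TupleOf and DictOf as loops that apply an element pattern to every item of *args or **kwargs.

```python
import inspect


def match_any(value):
    """Stand-in for ValueOf(None): accept the value without validation."""
    return value


def match_instance(typ):
    """Stand-in for InstanceOf(typ): reject values of the wrong type."""
    def match(value):
        if not isinstance(value, typ):
            raise TypeError(f"expected {typ.__name__}, got {type(value).__name__}")
        return value
    return match


def match_tuple_of(pattern):
    """Stand-in for TupleOf(pattern=pattern): check each *args item in a loop."""
    def match(values):
        checked = []
        for value in values:
            checked.append(pattern(value))
        return tuple(checked)
    return match


def match_dict_of(key_pattern, value_pattern):
    """Stand-in for DictOf(...): check each **kwargs item in a loop."""
    def match(mapping):
        checked = {}
        for key, value in mapping.items():
            checked[key_pattern(key)] = value_pattern(value)
        return checked
    return match


def untyped_udf(a, *args, **kwargs):  # no type hints anywhere
    ...


# Bind a call, then validate each bound argument with the pattern for its kind.
bound = inspect.signature(untyped_udf).bind(1, 2, 3, scale=10)
patterns = {
    "a": match_any,
    "args": match_tuple_of(match_any),
    "kwargs": match_dict_of(match_instance(str), match_any),
}
print({name: patterns[name](value) for name, value in bound.arguments.items()})
# {'a': 1, 'args': (2, 3), 'kwargs': {'scale': 10}}
```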

The classification of parameter types (Parameter.kind) by inspect.signature is as follows:

Table 10 inspect.signature parameter types

Each entry lists the parameter type, its description, example code, and the parameters in the example that have this type.

  • POSITIONAL_ONLY: positional-only parameter. Example: def func(a, /, b): pass (matches a).
  • KEYWORD_ONLY: keyword-only parameter. Example: def func(a, *, b): pass (matches b).
  • POSITIONAL_OR_KEYWORD: positional or keyword parameter. Example: def func(a, b): pass (matches a, b).
  • VAR_POSITIONAL: variable positional parameter. Example: def func(*args): pass (matches args).
  • VAR_KEYWORD: variable keyword parameter. Example: def func(**kwargs): pass (matches kwargs).
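These classifications can be checked directly with the standard library. kinds is a small helper written for this example; the five functions reproduce the example code above (the positional-only "/" marker requires Python 3.8 or later).

```python
import inspect


def kinds(fn):
    """Map each parameter name of fn to the name of its Parameter.kind."""
    return {name: p.kind.name for name, p in inspect.signature(fn).parameters.items()}


# The example functions from Table 10.
def pos_only(a, /, b): pass
def kw_only(a, *, b): pass
def pos_or_kw(a, b): pass
def var_pos(*args): pass
def var_kw(**kwargs): pass


for fn in (pos_only, kw_only, pos_or_kw, var_pos, var_kw):
    print(fn.__name__, kinds(fn))
# pos_only {'a': 'POSITIONAL_ONLY', 'b': 'POSITIONAL_OR_KEYWORD'}
# kw_only {'a': 'POSITIONAL_OR_KEYWORD', 'b': 'KEYWORD_ONLY'}
# pos_or_kw {'a': 'POSITIONAL_OR_KEYWORD', 'b': 'POSITIONAL_OR_KEYWORD'}
# var_pos {'args': 'VAR_POSITIONAL'}
# var_kw {'kwargs': 'VAR_KEYWORD'}
```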

Direct Operation Syntax for UDFs

When registration and usage are separated, you can use the direct operation syntax for scalar UDFs, UDAFs, and UDTFs. To use a registered UDF directly, you only need its name (name) and the name of the database where it resides (database). The following operations are accessed through the UDF attribute of the backend session object.

signature(name, database=None)

Description: Returns the function signature and return value type of the UDF from the backend database.

Input parameters:

  • name (str): UDF name.
  • database (str): Name of the database the UDF belongs to.

Return type: fabric_data.ibis.common.annotations.Signature - Registered UDF's signature and return type.

get(name, database=None)

Description: Returns the UDF from the backend database.

Input parameters:

  • name (str): UDF name.
  • database (str): Name of the database the UDF belongs to.

Return type: Callable[..., ibis.expr.types.Value] - The registered UDF.

names(database=None)

Description: Returns the names of all UDFs from the backend database.

Input parameters:

  • database (str): Name of the database the UDF belongs to.

Return type: List[str] - Names of all registered UDFs.

unregister(name, database=None)

Description: Deletes a specified UDF from the backend database.

Input parameters:

  • name (str): UDF name.
  • database (str): Name of the database the UDF belongs to.

Return type: None.

UDF WITH ARGUMENTS Syntax

Whether you obtain a UDF operator through the explicit or implicit registration syntax, or access a registered UDF through the direct operation syntax, you can pass arguments to it with the with_arguments method. These arguments fall into two categories:

  • Special-purpose parameters: These configure runtime resources, concurrency, and execution time limits for the UDF, for example concurrency, timeout, dpu, and apu. For details, see UDF runtime configuration list. All UDF types accept these configuration parameters through with_arguments.
  • General parameters: These initialize the UDF's state during its setup phase. They can be passed only if you have defined optional parameters in the __init__ method of a class UDF, class UDTF, or UDAF, and only as scalar values. This allows internal state to be initialized once and then reused many times.

Table 11 Applicability of WITH ARGUMENTS syntax

  • Special parameters (for example, concurrency, timeout, dpu, apu): apply to all UDF types, including scalar UDF, class UDF, vectorized UDF, UDTF, and UDAF. They configure runtime resources, concurrency, and execution time limits for the UDF.
  • General parameters defined by the Python class's __init__ method: apply to class-based UDF types, including class UDF, class UDTF, and UDAF. They set up the initial state of the UDF, which suits cases involving internal caching, initialization parameters, or model objects.

All parameter values passed through the with_arguments method are scalar values. These values are collectively passed as a **kwargs dictionary in Python.
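The following plain-Python sketch shows how scalar with_arguments values could reach a class UDF's __init__ as **kwargs. RescaleUDF, split_arguments, and instantiate are hypothetical names, and separating the special-purpose keys (concurrency, timeout, dpu, apu) from the general parameters before calling __init__ is an assumption drawn from Table 11, not documented library behavior.

```python
# Special-purpose keys from Table 11; assumed to be consumed by the runtime,
# not passed to __init__.
RUNTIME_KEYS = {"concurrency", "timeout", "dpu", "apu"}
SCALAR_TYPES = (bool, int, float, str, type(None))


class RescaleUDF:
    """Hypothetical class UDF whose __init__ defines optional parameters."""

    def __init__(self, factor=1.0, offset=0.0):
        # One-time initialization; the state is reused for every call.
        self.factor = factor
        self.offset = offset

    def __call__(self, value):
        return value * self.factor + self.offset


def split_arguments(arguments):
    """Separate runtime settings from general __init__ parameters."""
    runtime = {k: v for k, v in arguments.items() if k in RUNTIME_KEYS}
    general = {k: v for k, v in arguments.items() if k not in RUNTIME_KEYS}
    return runtime, general


def instantiate(udf_class, general):
    """Check that every value is scalar, then pass them all as **kwargs."""
    for key, value in general.items():
        if not isinstance(value, SCALAR_TYPES):
            raise TypeError(f"{key!r} must be a scalar, got {type(value).__name__}")
    return udf_class(**general)


runtime, general = split_arguments({"timeout": 30, "factor": 2.0, "offset": 1.0})
udf = instantiate(RescaleUDF, general)
print(runtime, udf(3.0))  # {'timeout': 30} 7.0
```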