Updated on 2025-08-25 GMT+08:00

Scalar UDF Types

For DataArts Fabric DataFrame, the following Scalar UDF types can currently be registered on the Python side:

Table 1 Scalar UDF types

| Scalar UDF Type | Input Type | Vectorized | Use Case and Feature |
|---|---|---|---|
| python | Python scalar value | No | Processes data row by row. Suitable for simple or specialized calculations, but with lower performance. |
| builtin | Backend-supported types | No | Directly calls existing functions in the database backend. Suitable for using native database functions. |
| pandas | pandas.Series | Yes | Uses pandas vectorized operations. Ideal for complex data processing at the Python level. |
| pyarrow | pyarrow.Array | Yes | Leverages PyArrow's high-performance computing capabilities. Suited to large datasets or workloads that require efficient computation. |

Currently, only the python and builtin Scalar UDF types are implemented. Future versions are expected to bring modifications and additions, such as the pyarrow and pandas types.

The overarching design principle of Scalar UDFs is that your Python functions should work correctly on their own, without any database involvement. Registering them as UDFs lets them run inside the database, closer to the raw data and with better performance, while minimizing the changes you need to make to your original code.
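As a rough illustration of this principle, the sketch below defines an ordinary Python function and exercises it locally on a few sample values, with no database involved. The function name, the sample data, and the treatment of missing values as None are assumptions made for this example. The registration step is intentionally not shown, because its API is not described in this section.

```python
def normalize_name(name):
    """Collapse extra whitespace and title-case a name.

    Missing values are passed through as None (an assumption about how
    missing input would be represented in this illustration).
    """
    if name is None:
        return None
    return " ".join(name.split()).title()


# Hypothetical sample rows, checked locally before any UDF registration.
rows = ["  ada   lovelace ", "ALAN TURING", None]
cleaned = [normalize_name(r) for r in rows]
```

Because the function depends only on its arguments, it can be unit tested in isolation first and later reused as the body of a python-type Scalar UDF, ideally with little or no modification.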