Scalar UDF Types

DataArts Fabric DataFrame defines the following Scalar UDF types for registration from the Python side:

**Table 1** Scalar UDF types
| Scalar UDF Type | Input Type | Vectorized | Use Case and Features |
| --- | --- | --- | --- |
| python | Python scalar value | No | Processes data row by row. Suitable for simple or special-purpose calculations, but with lower performance. |
| builtin | Backend-supported types | No | Directly calls an existing function in the database backend. Suitable for using native database functions. |
| pandas | pandas.Series | Yes | Uses pandas vectorized operations. Suitable for complex data processing at the Python level. |
| pyarrow | pyarrow.Array | Yes | Uses PyArrow's high-performance compute capabilities. Suitable for large datasets or workloads that require efficient computation. |

Currently, only the python and builtin types are implemented for Scalar UDFs. The pandas and pyarrow types are planned for future versions, and the existing types may also be modified.
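
The difference between the row-by-row and vectorized input types in Table 1 can be illustrated locally with pandas alone. The following is a minimal sketch, not DataArts Fabric code: the function names `add_tax_scalar` and `add_tax_vectorized` are hypothetical, no registration API is called, and the vectorized form only shows the calling convention that a future pandas type would presumably follow.

```python
import pandas as pd


def add_tax_scalar(price: float) -> float:
    # Scalar style (like the python type): called once per row
    # with a single Python value.
    return price * 1.25


def add_tax_vectorized(prices: pd.Series) -> pd.Series:
    # Vectorized style (like the pandas type): called once with a
    # whole column and returns a column of the same length.
    return prices * 1.25


prices = pd.Series([10.0, 20.0, 40.0])

# Row by row: one Python function call per value.
row_by_row = prices.apply(add_tax_scalar)

# Vectorized: a single call that operates on the entire Series.
batched = add_tax_vectorized(prices)
```

Both calls produce the same values. The vectorized form avoids one Python function call per row, which is typically where the performance difference comes from.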

The overarching design principle of Scalar UDFs is that your own Python functions should run correctly without any database involvement. To move computation closer to the raw data and achieve better performance, DataArts Fabric uses the database's UDF capabilities while minimizing the changes you need to make to your original code when adopting UDFs.
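
As a sketch of this principle, the function below is ordinary Python that can be written and tested with no database connection. The name `normalize_city` is hypothetical, and the handling of `None` assumes that NULL values reach a python-type UDF as `None`, which this topic does not confirm. The registration step is intentionally omitted here.

```python
from typing import Optional


def normalize_city(name: Optional[str]) -> Optional[str]:
    """Trim whitespace and title-case a city name; pass None through."""
    # Assumption: a NULL column value arrives as None.
    if name is None:
        return None
    return name.strip().title()


# The function is verified locally, without any database involvement.
assert normalize_city("  new york ") == "New York"
assert normalize_city(None) is None

# Applying it value by value mirrors how a row-by-row (python type)
# Scalar UDF would be invoked, one scalar per call.
rows = ["  berlin", "PARIS ", None]
cleaned = [normalize_city(r) for r in rows]
```

Because the function has no dependency on the database, the same code can be unit tested locally first and then registered as a Scalar UDF unchanged.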
