Updated on 2025-09-18 GMT+08:00

Example of Directly Using DataFrame with Scalar UDFs

Scenario

In big data processing scenarios, users who process data with DataFrames often need user-defined functions (UDFs) to implement complex computation logic. However, in the current system, UDF registration and invocation are tightly coupled: after a UDF is registered, users cannot view or delete it independently. This is inconvenient when a team develops UDFs collaboratively or needs to manage them dynamically. To address this, the Backend.udf series of APIs lets users view, call, and delete UDFs at runtime, independently of registration, which makes UDF management more flexible and development more efficient.
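To make the idea of decoupling registration from invocation concrete, the following is a minimal, hypothetical sketch of a UDF registry. InMemoryUdfRegistry and its register/names/get/unregister methods are illustrative names chosen to mirror the shape of the con.udf calls used later in this section. This is not how DataArts Fabric stores UDFs. It runs locally without any backend and only shows that one party can register a UDF while another discovers, calls, and deletes it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InMemoryUdfRegistry:
    """Hypothetical in-memory model of a UDF registry keyed by database.

    Illustrative only; it is not the DataArts Fabric implementation.
    """
    _store: Dict[str, Dict[str, Callable]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable, database: str) -> None:
        # Registration is a separate step from invocation.
        self._store.setdefault(database, {})[name] = fn

    def names(self, database: str) -> List[str]:
        # List the UDFs currently registered in one database.
        return sorted(self._store.get(database, {}))

    def get(self, name: str, database: str) -> Callable:
        # Look up a UDF that may have been registered by someone else.
        try:
            return self._store[database][name]
        except KeyError:
            raise KeyError(f"UDF {name!r} not found in database {database!r}") from None

    def unregister(self, name: str, database: str) -> None:
        # Remove the UDF; deleting a missing UDF is silently ignored here.
        self._store.get(database, {}).pop(name, None)


registry = InMemoryUdfRegistry()
# One team member registers the UDF...
registry.register("to_upper", str.upper, database="your-database")
# ...and another discovers, calls, and later deletes it independently.
print(registry.names(database="your-database"))  # ['to_upper']
print(registry.get("to_upper", database="your-database")("abc"))  # ABC
registry.unregister("to_upper", database="your-database")
print(registry.names(database="your-database"))  # []
```

Ignoring the deletion of a UDF that does not exist is a design choice of this sketch; a real registry may raise an error instead.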

Constraints

Constraints on directly calling, viewing, and deleting UDFs are as follows:

Users must first establish a Backend (Fabric) connection before calling the Backend UDF Registry API.

Support for specific data types depends on whether the DataArts Fabric kernel supports the corresponding complex types.

When registration and usage are decoupled, users can directly obtain, view, and delete Scalar UDFs through the Backend.udf API provided by the DataArts Fabric backend, as shown below:
import ibis
import ibis_fabric as fabric

con = ibis.fabric.connect(...)

# Placeholder: t is the table the UDFs are applied to; replace "your-table" with an existing table.
t = con.table("your-table", database="your-database")

# View the list of existing UDFs in the database.
udfs = con.udf.names(database="your-database")

if "transform_json" in udfs:
    # transform_json is already registered in the database, so acquire it directly instead of re-registering it.
    transform_json_udf = con.udf.get(name="transform_json", database="your-database")
    # Apply transform_json through the DataFrame's select method.
    expression = t.select(transform_json_udf(t.ts, t.msg).name("json column"))
    df = expression.execute()
    # Delete the UDF from the database.
    con.udf.unregister("transform_json", database="your-database")

if "SPManager" in udfs:
    # The SPManager class is already registered in the database, so acquire it directly.
    sentencepiece_udf = con.udf.get(name="SPManager", database="your-database")
    # Apply SPManager through select; with_arguments() passes the arguments SPManager needs.
    expression = t.select(
        sentencepiece_udf(t.data)
        .with_arguments(model_file="test_model.model", bos=True, eos=True)
        .name("pieces column")
    )
    df = expression.execute()
    # Delete the UDF from the database.
    con.udf.unregister("SPManager", database="your-database")

For details about the complete Scalar UDF operation syntax, see Scalar UDF Direct Operation Syntax.
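The example above repeats the same check, acquire, use, and delete sequence for each UDF. The following sketch factors that sequence into one helper. use_then_unregister is a hypothetical name, not part of the Backend.udf API; it only assumes that the object passed in exposes names(), get(), and unregister() with the same arguments used in the example (as con.udf does there).

```python
from typing import Any, Callable, Optional


def use_then_unregister(udf_api: Any, name: str, database: str,
                        use: Callable[[Any], Any]) -> Optional[Any]:
    """Hypothetical helper for the check / get / use / unregister sequence.

    udf_api is assumed to expose names(), get(), and unregister() like con.udf.
    Returns None when the UDF is not registered in the database.
    """
    # Only proceed if the UDF is already registered.
    if name not in udf_api.names(database=database):
        return None
    udf = udf_api.get(name=name, database=database)
    try:
        # `use` builds and executes the query with the acquired UDF.
        return use(udf)
    finally:
        # Delete the UDF even if building or executing the query raised an error.
        udf_api.unregister(name, database=database)
```

With a live connection, the first branch of the example could be written as `df = use_then_unregister(con.udf, "transform_json", "your-database", lambda f: t.select(f(t.ts, t.msg).name("json column")).execute())`. The try/finally guarantees cleanup; if teammates still need the UDF after your query, skip the unregister step instead.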