Updated on 2025-12-19 GMT+08:00

UDTF

A user-defined table function (UDTF) processes one input row and can generate multiple output rows. This makes UDTFs well suited to multimodal data processing scenarios such as splitting video or audio into frames, or generating samples. For detailed definitions and usage constraints, see UDTF.
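To make the one-row-in, many-rows-out behavior concrete, the following plain-Python sketch models it without a session. The names `flat_map_rows` and `split_into_frames` are hypothetical illustrations, not part of the `fabric_data` or Ibis APIs, and the sketch makes no claim about how the engine actually executes UDTFs.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def flat_map_rows(
    rows: Iterable[Row],
    fn: Callable[..., Iterable[Row]],
    on: List[str],
) -> Iterator[Row]:
    """Hypothetical local model of one-row-in, many-rows-out expansion.

    For each input row, call `fn` with the values of the `on` columns and
    emit one output row per record `fn` yields, carrying the input columns along.
    """
    for row in rows:
        args = [row[name] for name in on]
        for produced in fn(*args):
            yield {**row, **produced}


def split_into_frames(clip_id: str, num_frames: int) -> Iterator[Row]:
    # Hypothetical "frame splitting": yield one record per frame index.
    for index in range(num_frames):
        yield {"frame_index": index}


clips = [
    {"clip_id": "a", "num_frames": 3},
    {"clip_id": "b", "num_frames": 2},
]
expanded = list(flat_map_rows(clips, split_into_frames, on=["clip_id", "num_frames"]))
for row in expanded:
    print(row)
print(len(expanded))  # 5
```

Two input rows expand into five output rows: three for clip `a` and two for clip `b`, each repeating the columns of the row it came from.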

Example

The example below demonstrates how to register and use a UDTF.

  1. Create a session.
    import ibis
    import fabric_data as fabric
    import logging
    # create a session (connection arguments elided; `ai_lake` must be imported separately)
    con = ai_lake.connect(...)
  2. Define and register a UDTF.
    # function definition
    def py_generate_series(start, endnum, step):
        current = start
        while current <= endnum:
            yield {'col1':current}
            current += step
    
    # register udtf
    import ibis.expr.datatypes as dt
    func_signature = fabric.Signature(
        parameters=[
            fabric.Parameter(name="start", annotation=int),
            fabric.Parameter(name="endnum", annotation=int),
            fabric.Parameter(name="step", annotation=int),
        ],
        return_annotation=dt.Struct({"col1": int}),
    )
    target_database = 'test'
    con.create_table_function(py_generate_series, database=target_database, signature=func_signature)
  3. Invoke the UDTF.
    # use udtf
    ds = con.load_dataset("demo_table", database=target_database)
    py_generate_series_handler = con.get_function("py_generate_series", database=target_database)
    # columns a, b, c are passed positionally as start, endnum, step
    ds = ds.flat_map(
        fn=py_generate_series_handler,
        on=[ds.a, ds.b, ds.c],
        as_col='udtf_col'
    )
    
    # trigger execution
    res = ds.execute()
    print(res)
    | a     | b     | c     | udtf_col |
    | ----- | ----- | ----- | -------- |
    | int64 | int64 | int64 | int64    |
    | ----- | ----- | ----- | -------- |
    | 1     | 4     | 1     | 1        |
    | 1     | 4     | 1     | 2        |
    | 1     | 4     | 1     | 3        |
    | 1     | 4     | 1     | 4        |
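Because the UDTF body is an ordinary Python generator, it can be checked locally before registration. The sketch below calls the generator from step 2 with the values of the demo row (a=1, b=4, c=1) and reproduces the four udtf_col values shown above. The guarded wrapper `safe_generate_series` is a hypothetical addition, not part of the original example: with step <= 0 and start <= endnum, the original loop never terminates.

```python
from typing import Dict, Iterator


def py_generate_series(start, endnum, step):
    # Same generator as in step 2: one {'col1': value} record per output row.
    current = start
    while current <= endnum:
        yield {'col1': current}
        current += step


def safe_generate_series(start: int, endnum: int, step: int) -> Iterator[Dict[str, int]]:
    # Hypothetical guarded variant: a non-positive step can loop forever when
    # start <= endnum, so reject it before delegating to the original generator.
    if step <= 0:
        raise ValueError("step must be a positive integer")
    yield from py_generate_series(start, endnum, step)


# Simulate the demo row a=1, b=4, c=1 locally, without a session.
print(list(py_generate_series(1, 4, 1)))
# [{'col1': 1}, {'col1': 2}, {'col1': 3}, {'col1': 4}]
print(list(safe_generate_series(1, 4, 2)))
# [{'col1': 1}, {'col1': 3}]
```

Since `safe_generate_series` is itself a generator, the `ValueError` is raised when the result is first iterated, not at call time.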