Updated: 2025-12-10 GMT+08:00

Vectorized UDF

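A Vectorized UDF is a function that executes in vectorized form. Instead of being invoked once per row like a traditional row-oriented UDF, it processes a whole batch of rows per call, which is designed to remove the performance bottleneck of row-oriented UDFs. Its input and output parameters are typically PyArrow or pandas types.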
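To make the row-oriented versus vectorized distinction concrete, the sketch below uses plain pandas, independent of fabric_data, to compute the same product column twice: once with a function called per row, and once with a function that receives entire columns. The data and the function names `product_row` and `product_vectorized` are hypothetical and are not part of the fabric_data API.

```python
import pandas as pd

# Hypothetical input columns, for illustration only
prices = pd.Series([9.5, 20.0, 3.25], dtype="float64")
quantities = pd.Series([2, 1, 4], dtype="int64")


# Row-oriented style: the Python function is called once per row
def product_row(price: float, quantity: int) -> float:
    return price * quantity


row_wise = pd.Series(
    [product_row(p, q) for p, q in zip(prices, quantities)], dtype="float64"
)


# Vectorized style: a single call receives whole columns
def product_vectorized(prices: pd.Series, quantities: pd.Series) -> pd.Series:
    # pandas multiplies the columns element-wise in native code
    return prices * quantities


vectorized = product_vectorized(prices, quantities)
print(vectorized.tolist())  # [19.0, 20.0, 13.0]
```

Both approaches produce the same values; the vectorized version avoids a Python-level loop because the whole batch is handed to pandas in one call.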

Examples

The following two examples show how to use a Vectorized UDF.

  • Example 1: Use PyArrow for vectorized acceleration.
    import ibis
    import fabric_data as fabric
    from fabric_data.udf import RegisterType
    
    import pyarrow.compute as pc
    
    # Register the UDF implicitly through the decorator
    @fabric.udf.pyarrow(database="your-database", register_type=RegisterType.STAGED)
    def calculate_product(
        prices: fabric.PyarrowVector[float],
        quantities: fabric.PyarrowVector[int],
    ) -> fabric.PyarrowVector[float]:
        # Multiply the two columns element-wise over the whole batch
        return fabric.PyarrowVector[float](pc.multiply(prices, quantities))
    
    # Use the UDF in a query
    con = ibis.fabric.connect(...)
    t = con.table("your-table", database="your-database")
    expression = t.select(calculate_product(t.price, t.quantity).name("product column"))
    print(expression.execute())
  • Example 2: Use pandas for vectorized acceleration.
    import ibis
    import fabric_data as fabric
    from fabric_data.udf import RegisterType
    
    import pandas as pd
    
    # Register the UDF implicitly through the decorator
    @fabric.udf.pandas(database="your-database", register_type=RegisterType.STAGED)
    def calculate_product(
        prices: fabric.PandasVector[float],
        quantities: fabric.PandasVector[int],
    ) -> fabric.PandasVector[float]:
        # Multiply the two columns element-wise and return nullable Float64 values
        return fabric.PandasVector[float](prices * quantities, dtype=pd.Float64Dtype())
    
    # Use the UDF in a query
    con = ibis.fabric.connect(...)
    t = con.table("your-table", database="your-database")
    expression = t.select(calculate_product(t.price, t.quantity).name("product column"))
    print(expression.execute())

Related Documents