Updated: 2025-12-10 GMT+08:00
Vectorized UDF
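A Vectorized UDF is a function that executes in a vectorized manner. It is designed to address the performance bottleneck of traditional row-based UDFs. Its input and output parameters are typically PyArrow or Pandas types.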
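The difference in calling convention can be illustrated without the platform. The following sketch is plain Python and does not use the fabric_data API; `row_udf`, `vectorized_udf`, and the `calls` counter are hypothetical names introduced only for illustration. It shows that a row-based function is invoked once per row, while a vectorized function is invoked once for a whole batch of column values.

```python
# Illustration only: plain Python, not the fabric_data API.
# All names here (row_udf, vectorized_udf, calls) are hypothetical.
calls = {"row": 0, "vectorized": 0}


def row_udf(price: float, quantity: int) -> float:
    """Row-based style: invoked once for every row."""
    calls["row"] += 1
    return price * quantity


def vectorized_udf(prices: list, quantities: list) -> list:
    """Vectorized style: invoked once for a whole batch of rows."""
    calls["vectorized"] += 1
    return [p * q for p, q in zip(prices, quantities)]


prices = [1.5, 2.0, 4.0]
quantities = [2, 3, 1]

row_result = [row_udf(p, q) for p, q in zip(prices, quantities)]
batch_result = vectorized_udf(prices, quantities)

print(row_result == batch_result)  # both styles compute the same values
print(calls)  # three row-based calls versus one batch call
```

In an actual Vectorized UDF, the batch operation would typically be carried out by a PyArrow or Pandas kernel rather than a Python loop, so the saving comes both from fewer function calls and from the columnar computation itself.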
Examples
The following two examples show how to use a Vectorized UDF.
- Example 1: Use PyArrow for vectorized acceleration.
    import ibis  # needed for ibis.fabric.connect below
    import fabric_data as fabric
    from fabric_data.udf import RegisterType
    import pyarrow.compute as pc

    # Implicitly register the UDF
    @fabric.udf.pyarrow(database="your-database", register_type=RegisterType.STAGED)
    def calculate_product(prices: fabric.PyarrowVector[float], quantities: fabric.PyarrowVector[int]) -> fabric.PyarrowVector[float]:
        # Multiply the two columns element-wise in a single PyArrow call
        return fabric.PyarrowVector[float](pc.multiply(prices, quantities))

    # Use the UDF
    con = ibis.fabric.connect(...)
    t = con.table("your-table", database="your-database")
    expression = t.select(calculate_product(t.price, t.quantity).name("product column"))
    print(expression.execute())
- Example 2: Use Pandas for vectorized acceleration.
    import ibis  # needed for ibis.fabric.connect below
    import fabric_data as fabric
    from fabric_data.udf import RegisterType
    import pandas as pd

    # Implicitly register the UDF
    @fabric.udf.pandas(database="your-database", register_type=RegisterType.STAGED)
    def calculate_product(prices: fabric.PandasVector[float], quantities: fabric.PandasVector[int]) -> fabric.PandasVector[float]:
        # Multiply the two columns element-wise and return a nullable Float64 result
        return fabric.PandasVector[float](prices * quantities, dtype=pd.Float64Dtype())

    # Use the UDF
    con = ibis.fabric.connect(...)
    t = con.table("your-table", database="your-database")
    expression = t.select(calculate_product(t.price, t.quantity).name("product column"))
    print(expression.execute())
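The column-wise multiplication in Example 2 can be tried locally with plain pandas before registering anything. This is only a sketch: it uses ordinary `pd.Series` objects as stand-ins for the UDF inputs, and whether `fabric.PandasVector` behaves exactly like a Series is an assumption, not something stated above. It also shows why a nullable dtype such as `pd.Float64Dtype()` is convenient: a missing input value stays missing in the result.

```python
import pandas as pd

# Illustration only: plain pandas Series stand in for the UDF inputs.
# Whether fabric.PandasVector behaves exactly like a Series is an assumption.
prices = pd.Series([1.5, None, 4.0], dtype=pd.Float64Dtype())
quantities = pd.Series([2, 3, 1], dtype=pd.Int64Dtype())

# Element-wise multiplication over the whole column in one operation.
product = (prices * quantities).astype(pd.Float64Dtype())

print(product.dtype)            # nullable Float64 extension dtype
print(product.isna().tolist())  # the missing price stays missing
```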