Scalar UDF Types
DataArts Fabric DataFrame defines the following Scalar UDF types for registration on the Python side:
| Scalar UDF Type | Input Type | Vectorized | Use Case and Feature |
|---|---|---|---|
| python | Python scalar value | No | Processes data row by row. Suitable for simple or specialized calculations, but with lower performance. |
| builtin | Backend-supported types | No | Directly calls an existing function in the database backend. Suitable for using native database functions. |
| pandas | pandas.Series | Yes | Uses pandas' vectorized operations. Suited to complex data processing at the Python level. |
| pyarrow | pyarrow.Array | Yes | Uses PyArrow's high-performance computing capabilities. Suited to large datasets or workloads that require efficient computation. |
Of these Scalar UDF types, only python and builtin are implemented so far. The pandas and pyarrow types are planned for future versions and may change before release.
The overarching design principle of Scalar UDFs is that your own Python functions should work correctly without any database involvement. To run closer to the raw data and achieve better performance, DataArts Fabric uses the database's UDF features while minimizing the changes you need to make to your original code when adopting UDFs.
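As an illustration of this principle, the following minimal sketch shows an ordinary Python function of the kind a python-type Scalar UDF would wrap: it takes one scalar value per call and can be checked locally with no database. The function name `fahrenheit_to_celsius`, the sample values, and the choice to pass `None` through (mirroring SQL NULL) are assumptions made for this example. The registration call is omitted because its signature is not shown in this section.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a single Fahrenheit value to Celsius.

    Hypothetical example function; None is passed through unchanged,
    which is an assumption about how NULL inputs should be treated.
    """
    if temp_f is None:
        return None
    return round((temp_f - 32.0) * 5.0 / 9.0, 2)


# Local check on plain Python values: one call per row, mimicking
# the row-by-row behavior described for the python type.
rows = [32.0, 212.0, None, 98.6]
converted = [fahrenheit_to_celsius(value) for value in rows]
print(converted)
```

Because the function depends only on its scalar argument, it can be unit-tested as plain Python first and then registered as a UDF with little or no change to its body.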