Python UDF Performance Tuning
DataArts Fabric SQL allows you to configure resource specifications and concurrency for the Python UDF runtime and to collect key performance metrics during execution.
UDF Performance Monitoring
You can run the EXPLAIN PERFORMANCE command to print the performance metrics of the UDF Actor. As illustrated in Figure 1, the output reports, for each DN, the performance of the UDF Actor associated with the function calculate_0311. The following uses three lines of performance data as an example.
- In executor0_es_group (UDF:calculate_0311 time=13406.2 rows=450 loops=1):
  - executor0_es_group indicates the DN ID.
  - time indicates the total runtime of the UDF.
  - rows indicates the number of rows processed on this DN.
  - loops indicates the number of times that the UDF was called.
- In executor0_es_group [actor 0](Actor: invoke=2485.6 close=5.4 rtt=[2611.1, 8495.8, 5] send=[11.1, 25.1, 5] streamCreate=57.7 streamClose=450.3), details about actor 0 (the first UDF Actor) are provided:
  - invoke indicates the time required for starting an actor.
  - close indicates the time required for closing an actor.
  - rtt indicates the end-to-end processing time of each miniBatch, listed as minimum, maximum, and count: a minimum of 2611.1 ms and a maximum of 8495.8 ms across five miniBatches.
  - send indicates the time taken by the data sender, listed as minimum, maximum, and count: the data was sent five times, with a minimum duration of 11.1 ms and a maximum duration of 25.1 ms.
  - streamCreate indicates the time required for creating a stream for data transmission between the DN and the UDF Actor.
  - streamClose indicates the time required for closing that stream.
- In executor0_es_group [actor 0](MiniBatch: rows=[50, 50, 250] bytes=[13.19MB, 14.67MB, 69.89MB] totalExecuteTime=10453.2), each bracketed list gives the minimum, maximum, and total values in that order:
  - rows lists minRows, maxRows, and totalRows. minRows and maxRows indicate the minimum and maximum number of rows in a miniBatch, and totalRows indicates the total number of rows processed by the actor.
  - bytes lists minBytes, maxBytes, and totalBytes. minBytes and maxBytes indicate the minimum and maximum data sizes of a miniBatch, and totalBytes indicates the total data volume processed by the actor.
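When comparing these metrics across DNs or actors, it can help to turn the printed lines into structured values. The following is a minimal Python sketch, not part of DataArts Fabric SQL. It assumes the output keeps the key=value layout of the three sample lines above; the function name parse_metrics and the derived average are illustrative only.

```python
import re

# Sample lines copied from the example above.
UDF_LINE = "executor0_es_group (UDF:calculate_0311 time=13406.2 rows=450 loops=1)"
ACTOR_LINE = ("executor0_es_group [actor 0](Actor: invoke=2485.6 close=5.4 "
              "rtt=[2611.1, 8495.8, 5] send=[11.1, 25.1, 5] "
              "streamCreate=57.7 streamClose=450.3)")
BATCH_LINE = ("executor0_es_group [actor 0](MiniBatch: rows=[50, 50, 250] "
              "bytes=[13.19MB, 14.67MB, 69.89MB] totalExecuteTime=10453.2)")

# Matches key=value, where the value is a bracketed list or a bare token.
_FIELD = re.compile(r"(\w+)=(\[[^\]]*\]|[^\s)]+)")


def _convert(token):
    """Return a float when the token is numeric, else the original string."""
    try:
        return float(token)
    except ValueError:
        return token  # for example "13.19MB"


def parse_metrics(line):
    """Hypothetical helper: collect the key=value fields of one metrics line."""
    fields = {}
    for key, raw in _FIELD.findall(line):
        if raw.startswith("["):
            fields[key] = [_convert(part.strip()) for part in raw[1:-1].split(",")]
        else:
            fields[key] = _convert(raw)
    return fields


actor = parse_metrics(ACTOR_LINE)
batch = parse_metrics(BATCH_LINE)

# rows is [min, max, total]; the third rtt value is the miniBatch count.
avg_rows_per_batch = batch["rows"][2] / actor["rtt"][2]
print(parse_metrics(UDF_LINE))
print(avg_rows_per_batch)
```

Size values such as 13.19MB are kept as strings because the sketch makes no assumption about how the units should be converted.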
As illustrated in the figure below, the Summary section displays aggregate information about all UDF Actors in the query.
- Actor Distribution Info indicates the actor distribution information. As shown in the figure, actor 0 and actor 1 are distributed on the host whose IP address is 10.42.0.78.
- Actor Stream topo create time indicates the minimum and maximum time for creating a stream.
- Actor Consumer close time indicates the minimum and maximum duration for closing a consumer.
- Actor Stream Producer/Consumer count indicates the total number of times that the data stream is transmitted between the consumer and producer.
