Updated on 2025-09-18 GMT+08:00

Python UDF Performance Tuning

DataArts Fabric SQL allows you to configure resource specifications and concurrency for the Python UDF runtime and to collect key performance metrics during execution.

UDF Performance Monitoring

You can run the explain performance command to print the performance metrics of UDF Actors. As shown in Figure 1, the output reports, for each DN, the performance statistics of the UDF Actors associated with the function calculate_0311. The following three lines of performance data are used as an example.

  • In executor0_es_group (UDF:calculate_0311 time=13406.2 rows=450 loops=1):
    • executor0_es_group indicates the DN ID.
    • time indicates the total runtime of the UDF.
    • rows indicates the number of rows processed on this DN.
    • loops indicates the number of times that the UDF was called.
  • executor0_es_group [actor 0](Actor: invoke=2485.6 close=5.4 rtt=[2611.1, 8495.8, 5] send=[11.1, 25.1, 5] streamCreate=57.7 streamClose=450.3) provides details about UDF Actor 0:
    • invoke indicates the time required for starting an actor.
    • close indicates the time required for closing an actor.
    • rtt indicates the end-to-end processing time of each miniBatch. The three values are the minimum time (2611.1 ms), the maximum time (8495.8 ms), and the number of miniBatches processed (5).
    • send indicates the time taken by the data sender. The three values are the minimum duration (11.1 ms), the maximum duration (25.1 ms), and the number of times data was sent (5).
    • streamCreate indicates the time required for creating a stream for data transmission between the DN and UDF Actor, and streamClose indicates the time required for closing it.
  • In executor0_es_group [actor 0](MiniBatch: rows=[50, 50, 250] bytes=[13.19MB, 14.67MB, 69.89MB] totalExecuteTime=10453.2):
    • rows lists minRows, maxRows, and totalRows in that order. minRows and maxRows indicate the minimum and maximum number of rows in a miniBatch, and totalRows indicates the total number of rows processed by the actor.
    • bytes lists minBytes, maxBytes, and totalBytes in that order. minBytes and maxBytes indicate the minimum and maximum data sizes of a miniBatch, and totalBytes indicates the total data volume processed by the actor.
Figure 1 Performance metrics of Python UDF actors
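
When many DNs and actors are involved, it can be easier to compare these metrics programmatically than to read them line by line. The following Python snippet is a minimal sketch that parses the three sample lines described above into dictionaries and derives the average number of rows per miniBatch. It assumes the output keeps exactly the textual layout shown in the sample lines. The names parse_line, _LINE, _PAIR, and _convert are hypothetical helpers written for this illustration and are not part of DataArts Fabric SQL.

```python
import re

# Sample lines copied from the explanation above.
SAMPLE = """\
executor0_es_group (UDF:calculate_0311 time=13406.2 rows=450 loops=1)
executor0_es_group [actor 0](Actor: invoke=2485.6 close=5.4 rtt=[2611.1, 8495.8, 5] send=[11.1, 25.1, 5] streamCreate=57.7 streamClose=450.3)
executor0_es_group [actor 0](MiniBatch: rows=[50, 50, 250] bytes=[13.19MB, 14.67MB, 69.89MB] totalExecuteTime=10453.2)
"""

# Assumed layout: <DN ID> [actor N](<Kind>: key=value ...), with "[actor N]" optional.
_LINE = re.compile(
    r"^(?P<dn>\S+)\s*(?:\[actor (?P<actor>\d+)\])?\s*"
    r"\((?P<kind>\w+):\s*(?P<body>.*)\)\s*$"
)
# A value is either a bracketed list or a single token without spaces.
_PAIR = re.compile(r"(\w+)=(\[[^\]]*\]|[^\s)]+)")


def _convert(token):
    """Turn '13.19MB' or '450' into a float; leave anything else as text."""
    token = token.strip()
    if token.endswith("MB"):
        token = token[:-2]  # keep the value in MB
    try:
        return float(token)
    except ValueError:
        return token


def parse_line(line):
    """Parse one metrics line into a dict, or return None if it does not match."""
    m = _LINE.match(line.strip())
    if m is None:
        return None
    record = {
        "dn": m["dn"],
        "kind": m["kind"],
        "actor": int(m["actor"]) if m["actor"] is not None else None,
    }
    body = m["body"]
    if record["kind"] == "UDF":
        # The UDF line starts with the function name, not a key=value pair.
        record["function"] = body.split()[0]
    for key, raw in _PAIR.findall(body):
        if raw.startswith("["):
            record[key] = [_convert(t) for t in raw[1:-1].split(",")]
        else:
            record[key] = _convert(raw)
    return record


udf, actor, batch = [parse_line(line) for line in SAMPLE.splitlines()]

# rows = [minRows, maxRows, totalRows]; the third rtt value is the miniBatch count.
min_rows, max_rows, total_rows = batch["rows"]
batch_count = actor["rtt"][2]
avg_rows_per_batch = total_rows / batch_count

print(udf["function"], udf["time"])
print(avg_rows_per_batch)  # 250 rows over 5 miniBatches -> 50.0
```

For the sample data, the average matches both minRows and maxRows (50), which suggests that the five miniBatches handled by actor 0 were evenly sized.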

As shown in Figure 2, Summary displays summary information about all UDF Actors in the query.

  • Actor Distribution Info indicates the actor distribution information. As shown in the figure, actor 0 and actor 1 are distributed on the host whose IP address is 10.42.0.78.
  • Actor Stream topo create time indicates the minimum and maximum time for creating a stream.
  • Actor Consumer close time indicates the minimum and maximum duration for closing a consumer.
  • Actor Stream Producer/Consumer count indicates the total number of times that the data stream is transmitted between the consumer and producer.
Figure 2 Overview of data stream transmission information of Python UDF actors