Updated on 2025-12-19 GMT+08:00

UDAF

The overall development process for UDAF can be modeled after scalar UDFs, with the following key considerations:

  • The handler must specify a Python class that includes four essential methods: accumulate, aggregate_state, merge, and finish. Refer to Specific instance methods when registering a UDAF for detailed explanations of these methods.
  • The accumulate, aggregate_state, and merge methods implemented by you must be side-effect-free. This means their outputs should depend solely on input parameters and the current aggregation state, without being influenced by factors such as invocation count, order, or external conditions.
  • The finish method of the UDAF will be called globally only once for each group (specified by the GROUP BY clause), regardless of whether it is executed in single-CN mode or multi-DN mode.
  • When defining a non-STRICT UDAF, you must ensure the accumulate method correctly handles NULL values (represented as None in Python).
  • UDAFs cannot be used as window functions. That is, they cannot be used within an OVER clause.
  • UDAFs do not support concurrent execution or vectorized computation.

Example

  1. Create a UDAF.
    CREATE AGGREGATE FUNCTION pysum(x INT)
    RETURNS INT
    STRICT
    LANGUAGE PYTHON
    RUNTIME_VERSION='3.11.2'
    HANDLER = 'CalculateSum'
    AS $$
    import typing
    class CalculateSum:
        def __init__(self) -> None:
            self.sum = 0
    
        def accumulate(self, x: int) -> None:
            self.sum += x
    
        def finish(self) -> int:
            return self.sum
    
        @property
        def aggregate_state(self) -> typing.Dict[str, typing.Any]:
            return {
                "sum": self.sum,
            }
    
        def merge(self, other_state: typing.Dict[str, typing.Any]) -> None:
            self.sum += other_state.get("sum")
    $$;
  2. Execute the UDAF.
    SELECT pysum(x) FROM UNNEST(ARRAY[1, 2, NULL, 3]) AS x;
     pysum 
    -------
         6
    (1 row)