Updated on 2025-08-25 GMT+08:00

Execution Plan Operator

Operator Introduction

In a SQL execution plan, each step indicates a database operator, also called an execution operator. In DataArts Fabric SQL, operators serve as fundamental data processing units. Efficiently combining these operators and optimizing their order and execution methods can enhance data processing performance.

Operators in DataArts Fabric SQL are categorized into scan operators, control operators, materialization operators, join operators, and others.

Scan Operators

A scan operator scans data in a table, processing one tuple at a time for the upper-layer node. It operates at the leaf node of the query plan tree and can scan tables, result sets, linked lists, and subquery results. The following table lists common scan operators.

Table 1 Scan operators

Operator

Description

Scenario

VecForeignScan

Sequential scanning

Basic scan operator, used to scan external tables.

VecSubqueryScan

Subquery scanning

Another query plan tree (subplan) is used as the scanning object to scan tuples.

FunctionScan

Function scanning

FROM function_name

ValuesScan

Values linked list scanning

It scans the given tuple set in VALUES clauses.

Join Operators

A join operator corresponds to a join operation in relational algebra. Taking joining t1 table and t2 table as an example, the primary centralized join types include inner join, left join, right join, full join, semi join, and anti join, with implementations such as Nestloop, HashJoin, and MergeJoin.
Table 2 Join operators

Operator

Description

Scenario

Implementation Feature

VecNestLoop

Nested loop join, which is a brute force approach. It scans the inner table for each row.

Inner Join, Left Outer Join, Semi Join, Anti Join

It is used for queries that have a smaller subset connected. In nested loops, the outer table drives the inner table, and every row returned by the outer table must be searched in the inner table to find its matching row. Hence, the result set returned by the entire query should not be too large (not exceeding 10,000), and the table returning the smaller subset should serve as the outer table, with indexes preferably added to the connection fields of the inner table.

Vector Sonic HashJoin

Hash join: The inner and outer tables use the join column's hash value to create a hash table. Matching values are then stored in the same bucket. The two ends of an equal join must be of the same type and support hash.

Inner Join, Left Outer Join, Right Outer Join, Full Outer Join, Semi Join, Anti Join

Hash joins, appropriate for joining large datasets. The optimizer uses the smaller of the two tables to construct a hash table in memory using the join keys, then scans the larger table and probes the hash to find matches. While Sonic and non-Sonic hash joins have different internal structures, this does not impact the final result set.

Materialization Operators

Materialization operators are a class of nodes that can cache tuples. During execution, many extended physical operations can be performed only after all tuples are obtained, such as aggregation function operations and sorting without indexes. Therefore, materialization operators are required to cache all the tuples.
Table 3 Materialization operators

Operator

Description

Scenario

VecMaterial

Materialization

Caches the subnode result.

VecSort

Sorting

ORDER BY clause, which is used for join, group, and set operations and works with Unique.

VecAgg

Executes aggregate functions.

  1. Aggregate functions such as COUNT, SUM, AVG, MAX, and MIN.
  2. DISTINCT clause.
  3. UNION deduplication.
  4. GROUP BY clause.

VecWindowAgg

Window functions

WINDOW clause.

VecSetOp

Processing set operations

INTERSECT/INTERSECT ALL, EXCEPT/EXCEPT ALL

Control Operators

Control operators are a type of node that handles exceptional scenarios and executes custom workflows.
Table 4 Control operators

Operator

Description

Scenario

VecResult

Performing calculation directly

  1. Table scanning is not included.
  2. The INSERT statement contains only one VALUES clause.

VecModifyTable

INSERT/UPDATE/DELETE upper-layer node

INSERT, UPDATE, and DELETE

VecAppend

Appending

  1. UNION(ALL)
  2. Table inheritance

VecLimit

Processing the LIMIT clause

OFFSET ... LIMIT ...

Other Operators

Other operators include VecStream and RemoteQuery. There are three types of Stream operators: Gather stream, Broadcast stream, and Redistribute stream.

  • Gather stream: Each source node sends its data to the target node for aggregation.
  • Broadcast stream: A source node sends its data to N target nodes for calculation.
  • Redistribute stream: Each source node calculates the hash value of its data based on the join condition, distributes the data based on the hash value, and sends the data to the corresponding target node.
Table 5 Other Operators

Operator

Description

Scenario

VecStream

Multi-node data exchange

When a distributed query plan is executed, data is exchanged between nodes.

RowToVec

Rows-to-column conversion

Hybrid row-column.