Updated on 2022-12-16 GMT+08:00

Fully Parallel Query

Description

Fully parallel distributed query processing is the core technology of GaussDB(DWS). It minimizes data flow between nodes during query, thereby making query more efficient.

To efficiently analyze data, GaussDB(DWS) employs a set of high-performance distributed executors, which input the execution plan generated by the SQL engine, process tuples based on the execution plan, and return the result to the client.

Technical Principles

Figure 1 demonstrates the fully parallel distributed query technology of GaussDB(DWS).

Figure 1 Fully parallel distributed query processing technology
  • The distributed executor running on the CN provides schedules and distributes executions.
  • A new execution operator is introduced to support data flow between DNs. The new operator is called the data flow operator. Data flow can be classified into Gather, Broadcast, and Redistribution flows, based on the relationship between input and output. Gather combines multiple query fragments of data into one. Broadcast forwards the data of one query fragment to multiple query fragments. Redistribution reorganizes the data of multiple query fragments and then redistributes the reorganized data to multiple query fragments.
  • Data transmission between DNs depends on the data stream topology constructed using the data distribution and the cost model during query and analysis. By the data stream topology, network connections are set up between DNs to drive data stream on the data flowing topology.