Fully Parallel Query
Description
Fully parallel distributed query processing is the core technology of GaussDB(DWS). It minimizes data flow between nodes during query execution, which makes queries more efficient.
To analyze data efficiently, GaussDB(DWS) employs a set of high-performance distributed executors. These executors take the execution plan generated by the SQL engine as input, process tuples according to that plan, and return the result to the client.
Technical Principles
Figure 1 demonstrates the fully parallel distributed query technology of GaussDB(DWS).
- The distributed executor running on the coordinator node (CN) schedules and distributes execution tasks.
- A new execution operator, called the data flow operator, is introduced to support data flow between data nodes (DNs). Based on the relationship between input and output, data flows are classified into Gather, Broadcast, and Redistribution. Gather combines the data of multiple query fragments into one. Broadcast forwards the data of one query fragment to multiple query fragments. Redistribution reorganizes the data of multiple query fragments and then distributes the reorganized data to multiple query fragments.
- Data transmission between DNs depends on the data stream topology, which is constructed during query analysis from the data distribution and the cost model. Network connections are set up between DNs according to this topology, and data then flows along those connections.
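
The three data flow types described above can be sketched in a few lines of Python. This is only an illustrative model, not the GaussDB(DWS) implementation: the function names `gather`, `broadcast`, and `redistribute` are hypothetical, each query fragment is represented as a plain list of rows, and a modulo on an integer key stands in for whatever hash function the real system uses.

```python
# Illustrative model only: a query fragment is a list of row dicts.
# Function names and the modulo-based partitioning are assumptions,
# not GaussDB(DWS) APIs.

def gather(fragments):
    """Gather: combine the rows of multiple fragments into one fragment."""
    return [row for fragment in fragments for row in fragment]

def broadcast(fragment, n_targets):
    """Broadcast: forward a full copy of one fragment to every target."""
    return [list(fragment) for _ in range(n_targets)]

def redistribute(fragments, key, n_targets):
    """Redistribution: re-partition rows from multiple fragments by key."""
    targets = [[] for _ in range(n_targets)]
    for fragment in fragments:
        for row in fragment:
            # Rows with the same key value always land on the same target.
            targets[row[key] % n_targets].append(row)
    return targets

# Two source fragments, as if produced on two DNs.
dn_fragments = [
    [{"id": 1, "v": "a"}, {"id": 4, "v": "d"}],
    [{"id": 2, "v": "b"}, {"id": 3, "v": "c"}],
]

print(gather(dn_fragments))                # one fragment with all four rows
print(broadcast(dn_fragments[0], 2))       # two identical copies of the first fragment
print(redistribute(dn_fragments, "id", 2)) # rows re-partitioned by id % 2
```

In this model, Gather is many-to-one, Broadcast is one-to-many, and Redistribution is many-to-many. Redistribution guarantees that rows sharing a key value end up in the same target fragment, which is what allows a join or aggregation on that key to run locally on each node.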