Introduction to GaussDB(DWS) 3.0

The newly released GaussDB(DWS) 3.0 provides resource pooling, massive storage, and an MPP architecture with decoupled compute and storage. This enables high elasticity, real-time data import and sharing, and lakehouse integration.

Description

GaussDB(DWS) 3.0 decouples compute from storage, so the two can be scaled independently. Users can quickly scale compute capacity up for peak hours and down for off-peak hours, while storage can be expanded without limit and billed on demand. This lets the warehouse respond quickly to service changes at a lower cost.

GaussDB(DWS) 3.0 has the following advantages:

Lakehouse: GaussDB(DWS) 3.0 provides an integrated lakehouse that is easier to maintain and operate. It integrates seamlessly with Data Lake Insight (DLI) and supports automatic metadata import, accelerated queries on external tables, joint queries across internal and external tables, reading and writing data lake formats, and simpler data import.
High elasticity: Compute resources can be scaled quickly and storage is used on demand, which greatly reduces cost. Historical data does not need to be migrated to other storage media, enabling one-stop data analysis in industries such as finance and the Internet.
Data sharing: Multiple workloads share one copy of data in real time while their compute resources remain isolated. Multiple concurrent reads and writes are supported.

Architecture

Figure 1 GaussDB(DWS) 3.0 architecture

Serverless and cloud native
- Decoupled storage, computing, and management layers; independent, flexible, and fast scaling of computing and storage resources
- Cost-effective, meeting diverse workload requirements and strict load isolation requirements
Highly scalable
- Logical clusters (virtual warehouses, VWs) can be scaled in or out flexibly.
- Data is shared among multiple logical clusters in real time. Multiple loads share one copy of data.
- Logical clusters are used to linearly improve throughput and concurrency, and provide good read/write isolation and load isolation capabilities.
Data lakehouse
- Seamless hybrid query across data lakes and data warehouses
- Data lake analysis benefits from data warehouse-level performance and fine-grained control.

Version Differences

**Table 1** Differences between GaussDB(DWS) 3.0 and GaussDB(DWS) 2.0
| Item | DWS 2.0 | DWS 3.0 |
|---|---|---|
| Application scenarios | Converged data analysis using OLAP, used in sectors such as finance, government and enterprise, e-commerce, and energy. | Converged analysis and integrated offline OLAP analysis, optimized for Internet scenarios. |
| Advantages | High cost-effectiveness, hot and cold data analysis, and elastic scaling of storage and compute resources. | Low cost and high concurrency. Decoupled storage and compute, on-demand storage usage, rapid compute scaling, unlimited computing power, and unlimited capacity. Data sharing and lakehouse integration. |
| Features | Excellent performance in interactive analysis, offline processing of massive data, and complex data mining. | Real-time data import, real-time analysis, offline processing, interactive queries, and high performance for large-scale data and complex data mining. |
| SQL syntax | Compatible with the SQL syntax of the cloud data warehouse. | Compatible with the SQL syntax of the cloud data warehouse. |
| GUC parameters | A wide variety of GUC parameters can be configured to tailor your data warehouse environment. | A wide variety of GUC parameters can be configured to tailor your data warehouse environment. |

Application Scenarios

Data lakehouse
Seamless access to the data lake
- Through integration with Hive Metastore for metadata management, you can access table definitions in the data lake directly. You do not need to create foreign tables; creating an external schema is sufficient.
- The following data formats are supported: ORC and Parquet.
Convergent query
- Hybrid queries across any data in the data lake and the warehouse
- Query results are sent directly to the warehouse or the data lake, with no data transfer or copying required.
Excellent query performance
- High-quality query plans and efficient execution engines
- Precise load management methods
Highly scalable
Compute resources can be scaled quickly and storage is used on demand, which greatly reduces cost. This applies to both stable services and sensitive services.
- Two scaling modes are provided: scaling the current cluster in or out, or adding a logical cluster.
- Scaling is fast and requires no data redistribution or copying.
- Adding logical clusters improves concurrency and throughput. Different services can also be bound to different logical clusters (VWs) to isolate reads from writes. This suits workloads that change periodically, for example, batch jobs that increase between 00:00 and 07:00.
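To see why avoiding redistribution matters, consider a simplified shared-nothing layout in which each row is owned by node `key % node_count`. This placement rule is an illustrative assumption, not the actual distribution scheme of either DWS version. The sketch below counts what fraction of rows would change owner if such a cluster grew from 4 to 5 nodes.

```python
def moved_fraction(keys: range, old_nodes: int, new_nodes: int) -> float:
    """Fraction of rows whose owning node changes when the node count changes.

    Assumes a toy placement rule (not the DWS algorithm): a row with
    integer key k is stored on node k % n.
    """
    moved = sum(1 for k in keys if k % old_nodes != k % new_nodes)
    return moved / len(keys)


# Integer keys stand in for hash values to keep the example deterministic.
keys = range(10_000)

# Growing a 4-node shared-nothing cluster to 5 nodes under k % n placement.
print(moved_fraction(keys, 4, 5))  # 0.8, i.e. 8,000 of 10,000 rows move
```

In this toy model, 80% of the rows would have to be rewritten to other nodes. When data stays in a shared storage layer and compute nodes do not own it, adding compute does not require relocating rows, which is the effect the scaling description refers to.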
Data sharing
One copy of data serves multiple workloads, and data from different services can be shared in real time.
- Any logical cluster can carry read and write loads.
- Data is shared and visible across multiple logical clusters without being copied.
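The sharing model can be pictured with a small toy model in Python. Each logical cluster keeps its own compute-side state (here, just a query counter) but holds a reference to a single shared storage object, so a row written through one cluster is immediately readable through another without any copy. `SharedStorage` and `LogicalCluster` are hypothetical names used only for illustration. They do not correspond to any GaussDB(DWS) API, and the sketch ignores transactions and concurrency control, which govern visibility in a real database.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Hypothetical stand-in for the shared storage layer: one copy of each table."""
    tables: dict[str, list[tuple]] = field(default_factory=dict)


@dataclass
class LogicalCluster:
    """Hypothetical stand-in for a logical cluster (VW).

    It owns only compute-side state and references the shared storage
    instead of keeping a private copy of the data.
    """
    name: str
    storage: SharedStorage  # a reference to the shared layer, not a copy
    queries_run: int = 0    # per-cluster state, isolated from other clusters

    def write(self, table: str, row: tuple) -> None:
        # Writes go straight to the single shared copy of the table.
        self.storage.tables.setdefault(table, []).append(row)

    def read(self, table: str) -> list[tuple]:
        # Reads see whatever is in shared storage, regardless of who wrote it.
        self.queries_run += 1
        return list(self.storage.tables.get(table, []))


storage = SharedStorage()
etl_vw = LogicalCluster("etl_vw", storage)        # e.g. a batch-load cluster
report_vw = LogicalCluster("report_vw", storage)  # e.g. a reporting cluster

etl_vw.write("orders", (1, "2024-01-01", 99.5))
print(report_vw.read("orders"))                   # [(1, '2024-01-01', 99.5)]
print(etl_vw.storage is report_vw.storage)        # True: one shared copy
print(etl_vw.queries_run, report_vw.queries_run)  # 0 1: compute state stays separate
```

The design point the model captures is the split of responsibilities: data lives once in storage, while each cluster carries only its own compute state, so clusters can be added or removed without duplicating data.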