Updated on 2023-11-30 GMT+08:00

Introduction to GaussDB(DWS) 3.0

The newly released GaussDB(DWS) 3.0 provides resource pooling, massive storage, and an MPP architecture with decoupled computing and storage. This enables high elasticity, real-time data import and sharing, and lakehouse integration.

Description

GaussDB(DWS) 3.0 decouples computing from storage so that compute and storage resources can be scaled independently. Users can quickly scale computing capacity up or down for peak and off-peak hours. Storage can be expanded without limit and is paid for on demand, so the service responds quickly to business changes and is more cost-effective.
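
The effect of independent scaling on cost can be illustrated with a little arithmetic. The following Python sketch is only a toy model: the function `daily_cost`, the unit prices, the node counts, and the storage size are all hypothetical values chosen for illustration and are not GaussDB(DWS) pricing or APIs.

```python
# Toy cost model for decoupled compute and storage.
# All names and numbers here are hypothetical, for illustration only.

def daily_cost(schedule, storage_tb, node_hour_price, tb_day_price):
    """Return the cost of one day.

    schedule: list of (hours, compute_nodes) pairs covering 24 hours.
    storage_tb: stored data volume, billed separately from compute.
    """
    total_hours = sum(hours for hours, _ in schedule)
    if total_hours != 24:
        raise ValueError("schedule must cover exactly 24 hours")
    # Compute is billed per node-hour, so fewer nodes off-peak cost less.
    compute = sum(hours * nodes * node_hour_price for hours, nodes in schedule)
    # Storage is billed on the volume used, independent of node count.
    storage = storage_tb * tb_day_price
    return compute + storage


# 8 peak hours on 12 nodes, 16 off-peak hours on 3 nodes.
ELASTIC = [(8, 12), (16, 3)]
# The same day with compute fixed at the peak size.
FIXED = [(24, 12)]

print(daily_cost(ELASTIC, storage_tb=50, node_hour_price=1.0, tb_day_price=2.0))
print(daily_cost(FIXED, storage_tb=50, node_hour_price=1.0, tb_day_price=2.0))
```

With these made-up prices, the elastic schedule costs 244 units per day and the fixed schedule costs 388. The storage charge is identical in both cases because storage is sized separately from compute.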

GaussDB(DWS) 3.0 has the following advantages:

  • Lakehouse: GaussDB(DWS) 3.0 provides an integrated lakehouse that is easier to maintain and operate. It interconnects seamlessly with DLI and supports automatic metadata import, external table query acceleration, join queries across internal and external tables, reading and writing of data lake formats, and simpler data import.
  • High elasticity: Computing resources can be scaled quickly and storage space is used on demand, which greatly reduces cost. Historical data does not need to be migrated to other storage media, enabling one-stop data analysis for industries such as finance and the Internet.
  • Data sharing: Multiple loads share one copy of data in real time, while the computing resources are isolated. Multiple writes and reads are supported.

Architecture

Figure 1 GaussDB(DWS) 3.0 architecture
  • Serverless and cloud native
    • Decoupled storage, computing, and management layers; independent, flexible, and fast scaling of computing and storage resources
    • Cost-effective, meeting diverse workload requirements and strict load isolation requirements
  • Highly scalable
    • Logical clusters (virtual warehouses) can be scaled in or out in many ways.
    • Data is shared among multiple logical clusters in real time. Multiple loads share one copy of data.
    • Logical clusters are used to linearly improve throughput and concurrency, and provide good read/write isolation and load isolation capabilities.
  • Data lakehouse
    • Seamless hybrid query across data lakes and data warehouses
    • Data lake analysis benefits from the query performance and fine-grained control of the data warehouse.

Version Differences

Table 1 Differences between GaussDB(DWS) 3.0 and GaussDB(DWS) 2.0

| Version | DWS 2.0 | DWS 3.0 |
|---|---|---|
| Application scenarios | Converged data analysis using OLAP. It is used in sectors such as finance, government and enterprise, e-commerce, and energy. | Converged analysis, and offline integrated OLAP analysis. Optimized for Internet scenarios. |
| Advantages | High cost-effectiveness. Hot and cold data analysis and elastic scaling of storage and computing resources. | Low cost and high concurrency. Decoupled storage and compute, on-demand storage usage, rapid computing scaling, unlimited computing power, and unlimited capacity. Data sharing and lakehouse integration. |
| Features | Excellent performance in interactive analysis and offline processing of massive data, as well as complex data mining. | Real-time data import, real-time analysis, offline processing, interactive query, and high performance for large-scale data and complex data mining. |
| SQL syntax | Compatible with the SQL syntax of the cloud data warehouse. | Compatible with the SQL syntax of the cloud data warehouse. |
| GUC parameter | You can configure a wide variety of GUC parameters to tailor your data warehouse environment. | You can configure a wide variety of GUC parameters to tailor your data warehouse environment. |

Application Scenarios

  • Data lakehouse

    Seamless access to the data lake

    • By connecting to Hive Metastore for metadata management, you can directly access the table definitions in the data lake. You only need to create an external schema; you do not need to create foreign tables.
    • The following data formats are supported: ORC and Parquet.

    Convergent query

    • Hybrid query of any data in the data lake and warehouse
    • The query result is directly sent to the warehouse or data lake. No data needs to be transferred or copied.

    Excellent query performance

    • High-quality query plans and efficient execution engines
    • Precise load management methods

  • Highly scalable

    Computing resources can be scaled quickly and storage space is used on demand, which greatly reduces cost. This applies to both stable services and sensitive services.

    • Two scaling modes are provided. You can scale the current cluster in or out, or add a logical cluster.
    • Scaling is fast because no data is redistributed or copied.
    • A logical cluster can improve concurrency and throughput. Different services can also be bound to different logical clusters (virtual warehouses, VWs) to implement read/write isolation. This suits scenarios where service loads change periodically, for example, batch workloads that increase from 00:00 to 07:00.
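
The idea of binding services to logical clusters and adding capacity only for a recurring batch window can be sketched in a few lines of Python. This is a conceptual illustration, not a GaussDB(DWS) API: the service names, the cluster names `vw_interactive` and `vw_batch`, and the functions `route` and `active_vws` are all hypothetical. Only the 00:00 to 07:00 window comes from the text above.

```python
from datetime import time

# Hypothetical binding of services to logical clusters (VWs).
SERVICE_TO_VW = {
    "dashboard": "vw_interactive",
    "etl_batch": "vw_batch",
}
DEFAULT_VW = "vw_interactive"

# Batch load is assumed to rise between 00:00 and 07:00.
BATCH_START = time(0, 0)
BATCH_END = time(7, 0)


def route(service):
    """Return the logical cluster a service is bound to."""
    return SERVICE_TO_VW.get(service, DEFAULT_VW)


def active_vws(now):
    """Return the logical clusters that should exist at a given time of day.

    The interactive cluster is always present; the batch cluster is
    added only inside the batch window (start inclusive, end exclusive).
    """
    vws = ["vw_interactive"]
    if BATCH_START <= now < BATCH_END:
        vws.append("vw_batch")
    return vws


print(route("etl_batch"))
print(active_vws(time(3, 30)))
print(active_vws(time(12, 0)))
```

Because each service is routed to its own logical cluster, batch writes and interactive reads do not compete for the same compute resources, while both still operate on the same stored data.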

  • Data sharing

    A single copy of data serves multiple workloads. Data can be shared in real time, including quickly across different services.

    • Any logical cluster can carry read and write loads.
    • Data is visible to and shared among multiple logical clusters and does not need to be copied.
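
The sharing model described above, in which several logical clusters read and write one copy of data, can be pictured with a small in-memory Python sketch. It is a toy model under stated assumptions: the classes `SharedStorage` and `LogicalCluster` and their methods are hypothetical and do not correspond to any GaussDB(DWS) interface, and concurrency control is ignored.

```python
class SharedStorage:
    """A single copy of table data, held once for all clusters."""

    def __init__(self):
        self._tables = {}

    def append(self, table, row):
        self._tables.setdefault(table, []).append(row)

    def scan(self, table):
        # Return a snapshot list so callers cannot mutate stored rows' order.
        return list(self._tables.get(table, []))


class LogicalCluster:
    """Isolated compute that holds a reference to the shared storage."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage  # a reference, not a copy of the data

    def write(self, table, row):
        self.storage.append(table, row)

    def read(self, table):
        return self.storage.scan(table)


storage = SharedStorage()
vw_a = LogicalCluster("vw_a", storage)
vw_b = LogicalCluster("vw_b", storage)

vw_a.write("orders", {"id": 1})
vw_b.write("orders", {"id": 2})

# Both clusters see both rows without any data being copied between them.
print(vw_a.read("orders"))
print(vw_b.read("orders"))
```

Each cluster object stands for separate compute, yet a row written through one is immediately visible through the other because both point at the same storage object.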