What Is GaussDB(DWS)?

Updated on 2024-12-27 GMT+08:00

GaussDB(DWS) is an online data analysis and processing database built on the Huawei Cloud infrastructure and platform. It offers scalable, ready-to-use, and fully managed analytical database services, and is compatible with the ANSI/ISO SQL-92, SQL:1999, and SQL:2003 standards. GaussDB(DWS) is also interoperable with other database ecosystems such as PostgreSQL, Oracle, Teradata, and MySQL. This makes it a competitive option for petabyte-scale big data analytics across diverse industries.

GaussDB(DWS) offers both storage-compute coupled and decoupled data warehouses and helps you create a cutting-edge data warehouse that excels in enterprise-level kernels, real-time analysis, collaborative computing, convergent analysis, and cloud native capabilities. For details, see Data Warehouse Types.

  • Coupled Storage and Compute: The storage-compute coupled data warehouse provides enterprise-level data warehouse services with high performance, high scalability, high reliability, high security, and easy O&M. It is capable of data analysis at a scale of 2,048 nodes and 20 petabytes of data and is suitable for converged analysis services that integrate databases, warehouses, marts, and lakes.
  • Decoupled Storage and Compute: The storage-compute decoupled data warehouse is designed with a cloud native architecture that separates storage and compute. It also features hierarchical auto scaling for computing and storage, as well as multi-logical cluster shared storage technology (Virtual Warehouse or VW). These capabilities allow for computing isolation and concurrent expansion to handle varying loads, making it an ideal choice for OLAP analysis scenarios.

GaussDB(DWS) is widely used in domains such as finance, Internet of Vehicles (IoV), government and enterprise, e-commerce, energy, and telecom. It has been listed in the Gartner Magic Quadrant for Data Management Solutions for Analytics for two consecutive years. Compared with conventional data warehouses, GaussDB(DWS) is more cost-effective and offers large-scale scalability and enterprise-level reliability.

In addition, GaussDB(DWS) can be deployed on physical machines. For details, see Physical Machine Deployment.

Logical Cluster Architecture

Figure 1 shows the logical architecture of a GaussDB(DWS) cluster. For details about each component, see Table 1.

Figure 1 Logical cluster architecture
Table 1 Cluster architecture description

Cluster Manager (CM)

Function: Manages and monitors the running status of functional units and physical resources in the distributed system, ensuring system stability.

Description: The CM consists of CM Agent, OM Monitor, and CM Server.

  • CM Agent monitors the running status of primary and standby GTMs, CNs, and primary and standby DNs on the host, and reports the status to CM Server. It also executes the arbitration instructions delivered by CM Server. A CM Agent process runs on each host.
  • OM Monitor monitors scheduled tasks of CM Agent and restarts CM Agent when it stops. If CM Agent cannot be restarted, the host cannot be used and the fault must be rectified manually.
    NOTE: A failure to restart CM Agent is usually caused by insufficient system resources, which is uncommon.
  • CM Server checks whether the current system is normal according to the instance status reported by CM Agent. If an exception occurs, CM Server delivers recovery commands to CM Agent.

GaussDB(DWS) provides a primary/standby CM Server solution to ensure system HA. CM Agent connects to the primary CM Server. If the primary CM Server is faulty, the standby CM Server is promoted to primary to prevent a single point of failure (SPOF).

Global Transaction Manager (GTM)

Function: Generates and maintains globally unique information, such as the transaction ID, transaction snapshot, and timestamp.

Description: The cluster includes only one pair of GTMs: one primary GTM and one standby GTM.

Workload Manager (WLM)

Function: Controls the allocation of system resources to prevent service congestion and system crashes resulting from excessive workload.

Description: You do not need to specify the names of hosts where WLMs are to be deployed, because the installation program automatically installs a WLM on each host.

Coordinator (CN)

Function: Receives access requests from applications and returns execution results to the client. It also splits tasks and allocates task fragments to different DNs for parallel processing.

Description: CNs in a cluster have equivalent roles and return the same result for the same DML statement. Load balancers can be added between CNs and applications to ensure that CNs are transparent to applications. If a CN is faulty, the load balancer automatically connects the application to another CN. For details, see Associating and Disassociating ELB.

CNs need to connect to each other in the distributed transaction architecture. To avoid heavy load caused by excessive threads on GTMs, configure no more than 10 CNs in a cluster.

GaussDB(DWS) handles the global resource load in a cluster using the Central Coordinator (CCN) for adaptive dynamic load management. When the cluster is started for the first time, the CM selects the CN with the smallest ID as the CCN. If the CCN is faulty, the CM replaces it with a new one.

Datanode (DN)

Function: Stores data in row-store, column-store, or hybrid mode, executes data query tasks, and returns execution results to CNs.

Description: There are multiple DNs in the cluster, and each DN stores part of the data. GaussDB(DWS) provides DN high availability through three roles: active DN, standby DN, and secondary DN. They work together as follows:

  • During data synchronization, if the active DN suddenly becomes faulty, the standby DN is switched to the active state.
  • Before the faulty active DN recovers, the new active DN synchronizes data logs to the secondary DN.
  • After the faulty active DN recovers, it becomes the standby DN and uses the data logs stored on the secondary DN to restore the data generated during its faulty period.

The secondary DN serves exclusively as a backup and is never promoted to active or standby when a fault occurs. It conserves storage by holding only the Xlog data transferred from the new active DN and the data replicated while the original active DN is faulty. This approach saves one-third of the storage space compared to conventional three-copy backup.

Storage

Function: Provides the server's local storage resources for storing data permanently.

Description: -
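The DN failover behaviour described in Table 1 can be sketched as a small state model. This is an illustrative simplification, not GaussDB(DWS) code: the class `DnGroup`, its methods, and the node names `dn_a` and `dn_b` are hypothetical, and real log shipping is far more involved.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DnGroup:
    """Hypothetical model of one DN group: active, standby, and a log-only secondary."""
    active: str = "dn_a"
    standby: Optional[str] = "dn_b"
    failed: Optional[str] = None
    secondary_logs: List[str] = field(default_factory=list)

    def fail_active(self) -> None:
        # The standby DN is switched to the active state when the active DN fails.
        self.failed, self.active, self.standby = self.active, self.standby, None

    def write(self, record: str) -> None:
        # While the failed DN is down, the new active DN ships logs to the secondary DN.
        if self.standby is None:
            self.secondary_logs.append(record)

    def recover(self) -> List[str]:
        # The recovered DN rejoins as standby and replays the logs kept on the secondary DN.
        replayed, self.secondary_logs = self.secondary_logs, []
        self.standby, self.failed = self.failed, None
        return replayed
```

In this model the secondary DN never becomes active or standby; it only buffers logs for the outage window, which is why it needs far less space than a full third copy.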

DNs in a cluster store data on disks. Figure 2 describes the objects on each DN and the relationships among them logically.

  • A database manages various data objects and is isolated from other databases.
  • A data file segment stores data of only one table. A table containing more than 1 GB of data is stored in multiple data file segments.
  • A table belongs only to one database.
  • A block is the basic unit of database management, with a default size of 8 KB.
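The sizes in the list above imply some simple arithmetic, sketched below. The function names are hypothetical, and the sketch assumes that "1 GB" and "8 KB" mean 2^30 and 2^13 bytes and that segments are filled completely before a new one is started.

```python
SEGMENT_BYTES = 1024 ** 3  # 1 GB per data file segment (assumed to be 2**30 bytes)
BLOCK_BYTES = 8 * 1024     # default block size of 8 KB (assumed to be 2**13 bytes)


def segments_needed(table_bytes: int) -> int:
    """Number of data file segments for a table, using integer ceiling division."""
    return (table_bytes + SEGMENT_BYTES - 1) // SEGMENT_BYTES


def blocks_per_segment() -> int:
    """Number of default-size blocks that fit in one full segment."""
    return SEGMENT_BYTES // BLOCK_BYTES
```

Under these assumptions, a 2.5 GB table occupies three segments, and a full segment holds 131,072 blocks.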

Data can be distributed in replication, round-robin, or hash mode. You can specify the distribution mode during table creation.
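To make the three distribution modes concrete, the following sketch places rows on DNs in memory. It is only an illustration: the function `distribute` is hypothetical, and CRC32 stands in for whatever hash function the database actually uses.

```python
import zlib
from typing import Dict, List, Optional


def distribute(rows: List[dict], num_dns: int, mode: str,
               key: Optional[str] = None) -> Dict[int, List[dict]]:
    """Return a mapping of DN index to the rows that DN would store."""
    placement: Dict[int, List[dict]] = {dn: [] for dn in range(num_dns)}
    if mode == "replication":
        # Every DN keeps a full copy of the table.
        for dn in placement:
            placement[dn] = list(rows)
    elif mode == "roundrobin":
        # Rows are dealt to the DNs in turn.
        for i, row in enumerate(rows):
            placement[i % num_dns].append(row)
    elif mode == "hash":
        # Rows with the same key value always land on the same DN.
        # CRC32 is an assumption for illustration, not the real hash function.
        for row in rows:
            dn = zlib.crc32(str(row[key]).encode()) % num_dns
            placement[dn].append(row)
    else:
        raise ValueError(f"unknown distribution mode: {mode}")
    return placement
```

Replication suits small tables that are joined often, round-robin spreads rows evenly regardless of content, and hash keeps equal key values together so that joins and aggregations on that key can run locally on each DN.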

Figure 2 Logical database architecture

Physical Architecture of a Cluster

GaussDB(DWS) supports the storage-compute coupled and decoupled architectures.

In the storage-compute coupled architecture, data is stored on local disks of DNs. In the storage-compute decoupled architecture, local DN disks are used only for data cache and metadata storage, and user data is stored on OBS. You can select an architecture as required.

Figure 3 Architecture selection

Storage-Compute Coupled Architecture

GaussDB(DWS) employs the shared-nothing architecture and the massively parallel processing (MPP) engine, and consists of numerous independent logical nodes that do not share the system resources such as CPUs, memory, and storage. In such a system architecture, service data is separately stored on numerous nodes. Data analysis tasks are executed in parallel on the nodes where data is stored. The massively parallel data processing significantly improves response speed.
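The parallel execution described above follows a scatter-gather pattern, which the sketch below imitates with threads. It is a toy model under stated assumptions: `dn_partial_sum` and `cn_average` are hypothetical names, each inner list stands for the rows held by one node, and at least one non-empty shard is assumed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def dn_partial_sum(shard: List[float]) -> Tuple[float, int]:
    """Work done on one node: aggregate only the locally stored rows."""
    return sum(shard), len(shard)


def cn_average(shards: List[List[float]]) -> float:
    """Coordinator side: run the partial aggregation on every shard in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(dn_partial_sum, shards))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count
```

Because each node touches only its own data, no raw rows move across the network; only the small partial results are merged, which is where the response-time gain of the shared-nothing design comes from.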

Figure 4 Architecture
  • Application layer

    Data loading tools, extract, transform, and load (ETL) tools, business intelligence (BI) tools, as well as data mining and analysis tools, can be integrated with GaussDB(DWS) through standard APIs. GaussDB(DWS) is compatible with the PostgreSQL ecosystem, and the SQL syntax is compatible with Oracle, MySQL, and Teradata. Applications can be smoothly migrated to GaussDB(DWS) with few changes.

  • API

    Applications can connect to GaussDB(DWS) through standard JDBC and ODBC.

  • GaussDB(DWS)

    A GaussDB(DWS) cluster contains nodes of the same flavor in the same subnet. These nodes jointly provide services. Datanodes (DNs) store data on disks. Coordinators (CNs) receive access requests from clients and return the execution results. They also split tasks and distribute them to the DNs for parallel execution.

  • Automatic data backup

    Cluster snapshots can be automatically backed up to the exabyte-scale Object Storage Service (OBS). This allows the cluster to be backed up periodically during off-peak hours, so that data can be recovered after a cluster exception occurs.

    A snapshot is a complete backup of GaussDB(DWS) at a specified time point. It records all configuration data and service data of the cluster at the specified moment.

  • Tool chain

    The parallel data loading tool General Data Service (GDS), SQL syntax migration tool Database Schema Convertor (DSC), and SQL development tool Data Studio are provided. The cluster O&M can be monitored on a console.

Storage-Compute Decoupled Architecture

The newly released GaussDB(DWS) storage-compute decoupled cluster provides resource pooling, massive storage, and the MPP architecture with decoupled compute and storage. This enables high elasticity, real-time data import and sharing, and lake warehouse integration.

The GaussDB(DWS) storage-compute decoupled cluster enables independent scaling of compute and storage resources by separating compute and storage functionalities. Users can easily adjust their computing capabilities during peak and off-peak hours. Additionally, storage can be expanded limitlessly and paid for on-demand, allowing for quick and flexible responses to service changes while maintaining cost-effectiveness.

The GaussDB(DWS) storage-compute decoupled cluster has the following advantages:

  • Lakehouse: It simplifies the maintenance and operation of an integrated lakehouse. It seamlessly integrates with DLI, supports automatic metadata import, accelerates external table queries, enables joined queries of internal and external tables, and allows for reading and writing of data lake formats, as well as easier data import.
  • Real-time write: It provides the H-Store storage engine which optimizes real-time data writes and supports high-throughput real-time batch writes and updates.
  • High elasticity: Scaling compute resources and using on-demand storage can result in significant cost savings. Historical data does not need to be migrated to other storage media, enabling one-stop data analysis for industries such as finance and the Internet.
  • Data sharing: Multiple loads share one copy of data in real time, while the computing resources are isolated. Multiple writes and reads are supported.
Figure 5 Storage-compute decoupled architecture

  • Superb scalability
    • Logical clusters, known as Virtual Warehouses (VWs), can be expanded concurrently based on service requirements.
    • Data is shared among multiple VWs in real-time, eliminating the need for data duplication.
    • Multiple VWs enhance throughput and concurrency while providing excellent read/write and load isolation.
  • Lakehouse
    • Seamless hybrid query across data lakes and data warehouses
    • In data lake analysis, you can enjoy the ultimate performance and precise control of data warehouses.

Comparison Between Storage-Compute Coupled and Decoupled Architectures

Table 2 Differences between storage-compute coupled and decoupled architectures

Storage medium

  • Coupled storage and compute: Data is stored on local disks of compute nodes.
  • Decoupled storage and compute: Column-store data is stored in Huawei Cloud OBS, and local disks are used as the query cache for OBS data. Row-store data is still stored on local disks of compute nodes.

Advantage

  • Coupled storage and compute: Data is stored locally on compute nodes, providing high performance.
  • Decoupled storage and compute: The architecture separates storage and compute, offering layered elasticity, on-demand storage use, rapid compute scaling, and unlimited computing power and capacity. Storing data on object storage reduces costs, and multiple VWs support higher concurrency. The architecture also enables data sharing and lakehouse integration.
