
Before You Start: Performance Management Requirements

Updated on 2025-03-03 GMT+08:00

Effective performance management is vital to a GaussDB(DWS) cluster. To prevent frequent overload of resources such as CPU, I/O, memory, and disk space, control and limit both the resources consumed by individual services and the overall resource usage of the cluster. Regular proactive O&M and advance scale-out planning are also necessary.

Before introducing a new service, it is crucial to evaluate and conduct pressure tests on existing resources to avoid excessive resource consumption and negative impact on the overall cluster performance. As the data volume of existing services increases, the cluster's disk space and I/O usage also grow. Therefore, periodic clearance of aged and unnecessary data is required.

This section provides an overview of the cluster's performance baseline and outlines the performance management requirements in typical service scenarios. Its purpose is to assist users and O&M personnel in evaluating the cluster's capacity in advance and preventing resource overload.

GaussDB(DWS) Cluster Performance Baseline

In this section, you will find information about the recommended values and risk values of GaussDB(DWS) resources.

When the resource watermark exceeds the recommended value, it is crucial for O&M personnel to promptly address the issue to prevent performance degradation in scenarios such as node faults and active/standby switchover.

Exceeding the risk value for the cluster resource watermark indicates potential overload. In such cases, it is advisable to refrain from introducing new services.

Instead, it is necessary to swiftly reduce the overall cluster load through service optimization or scheduling tasks during off-peak hours. If needed, the cluster can be divided or its capacity expanded to ensure no impact on overall performance.

Table 1 Cluster Performance and Capacity Risks and Suggestions

| Metric | Recommended Value | Impact of Exceeding the Recommended Value | Recommended Measure (Above Recommended Value) | Risk Value | Impact of Exceeding the Risk Value | Recommended Measure (Above Risk Value) |
| --- | --- | --- | --- | --- | --- | --- |
| CPU usage | Less than 60% | When the active/standby nodes are unbalanced or a node is faulty, some nodes may experience CPU overload, causing performance degradation. | Configure a resource pool for resource isolation. For details, see GaussDB(DWS) Resource Load Management. Use Real-Time Queries and Performance Monitoring to capture statements with high CPU usage for service optimization. For details, see Monitoring and Diagnosing a GaussDB(DWS) Cluster. | 80% | Severe CPU contention occurs. As a result, the execution time of operators such as Stream deteriorates, and the overall cluster performance is severely affected. | Reduce the CPU load during peak hours through service staggering, service splitting, service optimization, and cluster scale-out. You can also set the CPU limit and quota of the resource pool. For details, see advanced system tuning operations in Tuning Systems with High CPU Usage. |
| CPU skew | Less than 15% | Computing skew occurs. As a result, some statements cannot take full advantage of the distributed system's performance. | Configure the rules described in Exception Rules, along with circuit breakers, to circuit-break skewed statements in advance. Optimize such services on a daily basis. | 30% | During peak hours, a single node's CPU may become overloaded. Because of Liebig's Law of the Minimum, the overloaded node drags down overall cluster performance and prevents other nodes from being fully utilized. | Configure the rules described in Exception Rules, along with circuit breakers, to preemptively handle skewed statements, and optimize services regularly. |
| I/O usage | Less than 60% | When the active/standby status is unbalanced or a node fails, some nodes may experience I/O overload, leading to performance degradation. | Identify the services with high I/O usage by checking the monitoring data. For details, see Performance Monitoring. You can reduce disk I/O usage through indexing, partition pruning, and row-column storage rectification. | 90% | Severe I/O contention can occur, affecting operators such as table scanning and overall cluster performance. | Optimize high-I/O statements and stagger peak hours to maintain I/O performance. Plan for cluster scale-out in advance to reduce the I/O burden on individual nodes. |
| I/O read/write latency | Less than 400 ms | Performance fluctuations during data read and write operations can lead to unstable query times and occasional performance degradation. | Identify the services with high I/O usage by checking the monitoring data. For details, see Performance Monitoring. Reducing disk I/O usage through indexing, partition pruning, and row-column storage rectification also reduces the read/write latency. | 1000 ms | Significant deterioration in data read/write performance can cause a backlog in real-time data storage services, impacting overall performance. | Optimize statements with high I/O usage, high disk usage, and high concurrency, and stagger service peaks to distribute the load more evenly. |
| Dynamic memory usage | Less than 80% | When the service traffic increases sharply or complex flexible queries are executed, an error may be reported due to insufficient memory. | Configure exception rules and a memory circuit breaker. Optimize memory-intensive services by referring to Real-Time Queries and Monitoring and Diagnosing a GaussDB(DWS) Cluster. For how to reduce the memory usage, see Reducing Memory Usage. | 90% | CCN queuing occurs, errors indicating insufficient memory are reported, and there is a risk of process OOM. | Configure exception rules and a memory circuit breaker. Optimize memory-intensive services by referring to Real-Time Queries and Monitoring and Diagnosing a GaussDB(DWS) Cluster. |
| Disk space usage | Less than 70% | The risk of read-only status increases when SQL statements are written to disks and the disk usage exceeds 90%. | Set thresholds for triggering disk flushing, clear data and dirty pages during off-peak hours, and plan for scale-out in advance. For details, see Solution to High Disk Usage and Cluster Read-Only. | 80% | The read-only risk increases after SQL statements are written to disks. | Set thresholds for triggering disk flushing, clear data and dirty pages during off-peak hours, and plan for scale-out in advance. |
| Disk space skew | Less than 15% | Severe skew occurs during operator computing or data spill to the disk. The workloads will be unevenly distributed on DNs, resulting in high disk usage on a single DN and affecting performance. | Check and handle table skews by referring to Table Diagnosis. | 20% | Disk skew causes CPU, I/O, and memory skew, which affects the overall cluster performance and may cause the disk of a single DN to be full. | Handle table skews by referring to Table Diagnosis. |
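The baseline in Table 1 can be expressed as a small lookup that sorts a metric reading into one of three zones. The following is a minimal sketch, not part of any GaussDB(DWS) tool or API: the threshold values are copied from Table 1, while the metric keys, the `Baseline` class, the `classify` function, and the zone names are hypothetical names chosen for illustration. It assumes that a reading equal to the recommended value already counts as exceeding it, because the table states the recommended values as "less than".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    recommended: float  # readings should stay below this value
    risk: float         # readings above this value indicate potential overload


# Threshold values copied from Table 1. The dictionary keys are
# illustrative names, not GaussDB(DWS) metric identifiers.
BASELINES = {
    "cpu_usage_pct": Baseline(60, 80),
    "cpu_skew_pct": Baseline(15, 30),
    "io_usage_pct": Baseline(60, 90),
    "io_latency_ms": Baseline(400, 1000),
    "dynamic_memory_usage_pct": Baseline(80, 90),
    "disk_usage_pct": Baseline(70, 80),
    "disk_skew_pct": Baseline(15, 20),
}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'above_recommended', or 'above_risk' for one reading."""
    baseline = BASELINES[metric]
    if value > baseline.risk:
        return "above_risk"
    if value >= baseline.recommended:
        return "above_recommended"
    return "ok"


def assess(readings: dict) -> dict:
    """Classify every reading in a {metric: value} mapping."""
    return {metric: classify(metric, value) for metric, value in readings.items()}
```

For example, `assess({"cpu_usage_pct": 72, "disk_usage_pct": 85})` marks CPU usage as `above_recommended` and disk space usage as `above_risk`. In the terms of this section, the first calls for prompt O&M handling and the second means new services should not be introduced until the load is reduced.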

GaussDB(DWS) Performance Management Scenarios and Suggestions

This section introduces common performance management scenarios and offers suggestions. During service rollout and routine O&M, you need to thoroughly assess the performance capacity to avoid overloading the cluster.

Table 2 Performance management scenarios

| Scenario | Performance Risk | Evaluation Method | Suggestion |
| --- | --- | --- | --- |
| New cluster rollout | The performance and capacity of the new cluster are uncertain before the service rollout, and they may not meet the requirements. | Before launching the service, conduct a pressure test on the cluster. Run the new and old clusters in parallel for at least one service period. Thoroughly test key services and links for performance metrics such as QPS, latency, maximum concurrency, and maximum response time to obtain a comprehensive evaluation of the performance and capacity of the new cluster. | Implement dynamic resource management and allocate service resource pools accordingly by referring to GaussDB(DWS) Resource Load Management. Configure exception rules and circuit breaker parameters in advance. |
| New service rollout | Resource preemption may arise, impacting existing services in the cluster. If new services are executed concurrently and consume resources improperly, the result can be resource overload and a decline in overall performance. | Conduct a thorough test on the new service in a test environment. Based on the test results, estimate the CPU usage, execution time, and number of concurrent services. Analyze the execution plan for the new services to ensure optimal performance. | Roll out the new service only when the cluster's performance capacity is sufficient. Isolate new services with resource pools. Configure circuit breakers according to the test results and prepare a rollback solution to swiftly revert services in the event of a fault. |
| Flexible query performance management | Flexible queries involve many different types of SQL statements, whose execution efficiency and resource consumption can vary significantly. In extreme cases, a single slow SQL statement can degrade the performance of the entire cluster. | Gather statistics on CPU usage, memory usage, execution time, and the number of concurrent queries. For details, see Real-Time Queries. | Allocate users who frequently run flexible queries to separate resource pools that are independent of other services. This allows for better CPU and memory resource management. To promptly handle slow SQL statements, configure exception rules and circuit breakers. Follow the principle of least privilege when granting permissions to these users. Do not use the administrator account as the primary account for flexible queries. |
| Growth of existing services | As services grow and more data is generated, the cluster's resource usage increases. If the cluster resources are not managed promptly, there is a risk of overload. | Regularly collect statistics on metrics such as dirty data, skew rate, ANALYZE time, number of partitions, and resource consumption of existing services. | Inspect the cluster weekly, clear dirty data from tables with a high dirty page rate, and perform ANALYZE on tables whose statistics have not been collected in a timely manner. |
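For the new service rollout scenario, the capacity check can be reduced to simple arithmetic: add the CPU usage estimated in the test environment to the cluster's current peak and compare the total against the CPU baseline in Table 1. The sketch below is a rough illustration under a simplifying assumption that loads add linearly, which real workloads may not do. The function name, parameter names, and zone labels are hypothetical; only the 60% and 80% defaults come from Table 1.

```python
def projected_cpu_zone(current_peak_pct: float,
                       new_service_pct: float,
                       recommended: float = 60.0,
                       risk: float = 80.0):
    """Estimate peak CPU usage after a rollout and name the resulting zone.

    Assumes the new service's CPU usage simply adds to the current peak.
    """
    projected = current_peak_pct + new_service_pct
    if projected > risk:
        zone = "above risk value"
    elif projected >= recommended:
        zone = "above recommended value"
    else:
        zone = "within recommended value"
    return projected, zone


# A cluster peaking at 45% CPU with a new service estimated at 20%.
projected, zone = projected_cpu_zone(45.0, 20.0)
print(f"Projected peak CPU: {projected}% ({zone})")
```

In this example the projected peak of 65% is above the recommended value but below the risk value, so the rollout would warrant resource pool isolation and possibly off-peak scheduling rather than an outright block. The same check could be repeated for I/O, memory, and disk space using their respective baseline values.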
