DWS_2000000006 Node Data Disk Usage Exceeds the Threshold

Updated on 2025-03-03 GMT+08:00

Description

GaussDB(DWS) collects the usage of all disks on each node in a cluster every 30 seconds.

  • If the maximum disk usage in the last 10 minutes (configurable) exceeds 80% (configurable), a major alarm is reported. The major alarm is cleared when the average disk usage falls below 75% (the alarm threshold minus 5%).
  • If the maximum disk usage in the last 10 minutes (configurable) exceeds 85% (configurable), a critical alarm is reported. The critical alarm is cleared when the average disk usage falls below 80% (the alarm threshold minus 5%).
NOTE:

If the maximum disk usage stays above the alarm threshold, the system reports the alarm again after 24 hours (configurable).
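The report and clear rules above form a simple hysteresis: the alarm is raised on the window maximum and cleared on the window average, 5 percentage points below the threshold. The following Python sketch illustrates that logic only. The function name `evaluate_alarm` and its parameters are hypothetical and are not part of any GaussDB(DWS) API, and the 24-hour re-report behavior is not modeled.

```python
from statistics import mean


def evaluate_alarm(samples, threshold=80.0, hysteresis=5.0, active=False):
    """Return True if the alarm is active after one detection window.

    samples: disk usage percentages collected during the window, for
    example 20 values for a 10-minute window sampled every 30 seconds.
    threshold: alarm threshold in percent (80 for major, 85 for critical).
    hysteresis: gap between the report level and the clear level.
    active: whether the alarm was already active before this window.
    """
    if not samples:
        return active  # nothing collected, keep the previous state
    if max(samples) > threshold:
        return True  # report the alarm, or keep it active
    if active and mean(samples) < threshold - hysteresis:
        return False  # average is back under threshold minus 5: clear
    return active  # between the two levels: state is unchanged
```

With the default major-alarm settings, a window such as `[76, 78, 77]` neither reports nor clears the alarm, because its maximum is below 80 and its average is above 75.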

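Attributes

 Alarm ID       | Alarm Category         | Alarm Severity                | Alarm Type      | Service Type | Auto Cleared
----------------+------------------------+-------------------------------+-----------------+--------------+--------------
 DWS_2000000006 | Management plane alarm | Critical: > 85%; Major: > 80% | Operation alarm | GaussDB(DWS) | Yes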

Parameters

 Category             | Name            | Description
----------------------+-----------------+--------------------------------------------------
 Location information | Name            | Node Data Disk Usage Exceeds the Threshold
                      | Type            | Operation alarm
                      | Generation time | Time when the alarm is generated
 Other information    | Cluster ID      | Cluster details such as resourceId and domain_id

Impact on the System

If the cluster data volume or temporary data spill size increases and the usage of any single disk exceeds 90%, the cluster becomes read-only, affecting customer services.

Possible Causes

  • The service data volume increases rapidly, and the cluster disk capacity configuration cannot meet service requirements.
  • Dirty data is not cleared in a timely manner.
  • Some tables are skewed, that is, their data is unevenly distributed across nodes.

Handling Procedure

  1. Check the disk usage of each node.

    1. Log in to the GaussDB(DWS) console.
    2. On the Alarms page, select the current cluster from the cluster selection drop-down list in the upper right corner and view the alarm information of the cluster in the last seven days. Locate the name of the node for which the alarm is generated and the disk information based on the location information.
    3. Choose Dedicated Clusters > Clusters, locate the row that contains the cluster for which the alarm is generated, and click Monitoring Panel in the Operation column.
    4. Choose Monitoring > Node Monitoring > Disks to view the usage of each disk on the current cluster node. To view the historical monitoring information about a disk on a node, click the icon on the right to view the disk performance metrics in the last 1, 3, 12, or 24 hours.
      • If the data disk usage frequently rises and then returns to normal within a short period, the spikes are temporary and caused by service execution. In this case, adjust the alarm threshold as described in step 2 to reduce the number of reported alarms.
      • If the usage of a data disk exceeds 90%, the cluster becomes read-only and write operations fail with the error "cannot execute INSERT in a read-only transaction". In this case, see step 3 to delete unnecessary data.
      • If the usage of more than half of the data disks in the cluster exceeds 70%, the data volume in the cluster is large. In this case, see step 4 to clear data or perform Disk Capacity Expansion.
      • If the difference between the highest and lowest data disk usage in the cluster exceeds 10%, see step 5 to handle data skew.

  2. Check whether the alarm configuration is proper.

    1. Return to the GaussDB(DWS) console, choose Monitoring > Alarm and click View Alarm Rule.
    2. Locate the row that contains Node Data Disk Usage Exceeds the Threshold and click Modify in the Operation column. On the Modifying an Alarm Rule page, view the configuration parameters of the current alarm.
    3. Adjust the alarm threshold and detection period. A higher alarm threshold and a longer detection period make the alarm less sensitive. For details about the GUI configuration, see Alarm Rules.
    4. If the data disks are large, you are advised to raise the threshold based on historical disk monitoring metrics. Otherwise, continue with the following steps. If the problem persists, you are advised to perform Disk Capacity Expansion.

  3. Check whether the cluster is in the read-only state.

    1. When a cluster is in read-only state, stop the write tasks to prevent data loss caused by disk space exhaustion.
    2. Return to the GaussDB(DWS) console and choose Dedicated Clusters > Clusters. In the row of the abnormal cluster whose cluster status is Read-only, click Cancel Read-only.
    3. In the displayed dialog box, confirm the information and click OK to cancel the read-only state for the cluster. For details, see Removing the Read-only Status.
    4. After the read-only state is canceled, use a client to connect to the database and run DROP or TRUNCATE statements to delete unnecessary data.
      NOTE:

      You are advised to lower the disk usage to below 70%. See steps 4 and 5 to check whether other tables need to be cleaned up or redistributed.

  4. Check whether the usage of more than half of the data disks in the cluster exceeds 70%.

    1. Run the VACUUM FULL command to clear dirty data. For details, see Solution to High Disk Usage and Cluster Read-Only. First, connect to the database and run the following SQL statement to list the tables whose dirty page rate exceeds 30%, sorted by size in descending order:
      -- List tables with a dirty page rate above 30%, largest first.
      -- Sort on the numeric size: the table_size alias is formatted text and would sort as a string.
      SELECT schemaname AS schema, relname AS table_name, n_live_tup AS analyze_count, pg_size_pretty(pg_table_size(relid)) AS table_size, dirty_page_rate
      FROM PGXC_GET_STAT_ALL_TABLES
      WHERE schemaname NOT IN ('pg_toast', 'pg_catalog', 'information_schema', 'cstore', 'pmk')
      AND dirty_page_rate > 30
      ORDER BY pg_table_size(relid) DESC, dirty_page_rate DESC;
      
      The following is an example of the possible execution result of the SQL statement (the dirty page rate of a table is high):
       schema | table_name | analyze_count | table_size | dirty_page_rate 
      --------+------------+---------------+------------+-----------------
       public | test_table |          4333 | 656 KB     |           71.11
      (1 row)
      
    2. If the query returns any rows, run VACUUM FULL on the tables with a high dirty page rate one at a time (serially):
      VACUUM FULL ANALYZE schema.table_name;
      
      NOTICE:

      The VACUUM FULL operation needs extra space for defragmentation, equal to Table size × (1 – Dirty page rate). As a result, the disk usage temporarily increases and then decreases. Before running VACUUM FULL, ensure that the cluster has enough free space so that the operation does not trigger the read-only state. You are advised to start with small tables. In addition, VACUUM FULL holds an exclusive lock, which blocks access to the table while it runs. Schedule the operation so that it does not affect services.

    3. If the query returns no rows, no table has a high dirty page rate. Expand the node or disk capacity of the cluster according to the data warehouse type, so that a further increase in disk usage does not trigger the read-only state and interrupt services.
      1. To scale out a storage-compute coupled data warehouse with cloud SSDs, see Disk Capacity Expansion of an EVS Cluster.
      2. To scale out a storage-compute coupled data warehouse with local SSDs or a standalone system, see Scaling Out a Cluster.

  5. Check whether the difference between the highest and lowest data disk usage in the cluster exceeds 10%.

    1. If the data disk usage differs greatly, connect to the database and run the following SQL statement to check whether there are skewed tables in the cluster:
      SELECT schemaname, tablename, pg_size_pretty(totalsize), skewratio FROM pgxc_get_table_skewness WHERE skewratio > 0.05 ORDER BY totalsize DESC;
      
      The following is an example of the possible execution result of the SQL statement:
       schemaname |      tablename      | pg_size_pretty | skewratio 
      ------------+---------------------+----------------+-----------
       scheduler  | workload_collection | 428 MB         |      .500
       public     | test_table          | 672 KB         |      .429
       public     | tbl_col             | 104 KB         |      .154
       scheduler  | scheduler_storage   | 32 KB          |      .250
      (4 rows)
      
    2. If the query returns any rows, choose a new distribution column for the severely skewed tables based on the table size and skew ratio. For 8.1.0 and later versions, use the ALTER TABLE syntax to adjust the distribution column. For other versions, see How Do I Adjust Distribution Columns?
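The checks in step 1 and the space estimate in the VACUUM FULL notice can be sketched in Python as follows. This is an illustration under stated assumptions, not a GaussDB(DWS) tool: the function names, the `disk_usage` mapping, and the idea of passing sizes for a single disk are all hypothetical. The temporary-spike case (step 2) is left out because it requires usage history, and the 90% read-only limit is taken from the Impact on the System section.

```python
def triage(disk_usage):
    """Map per-disk usage percentages to the handling steps to follow.

    disk_usage: dict of disk name -> usage in percent (hypothetical input).
    Returns the matching step numbers in ascending order.
    """
    values = list(disk_usage.values())
    if not values:
        return []
    steps = []
    if any(v > 90 for v in values):
        steps.append(3)  # a disk is above 90%: cluster is read-only
    if sum(v > 70 for v in values) > len(values) / 2:
        steps.append(4)  # more than half of the disks are above 70%
    if max(values) - min(values) > 10:
        steps.append(5)  # highest and lowest usage differ by more than 10%
    return steps


def vacuum_full_extra_space(table_size, dirty_page_rate):
    """Extra space needed by VACUUM FULL: table size x (1 - dirty page rate).

    dirty_page_rate is a percentage, as returned by the dirty page query.
    """
    return table_size * (1 - dirty_page_rate / 100)


def safe_to_vacuum(disk_used, disk_total, table_size, dirty_page_rate,
                   read_only_pct=90):
    """Check that the temporary peak stays below the read-only limit.

    Simplification: all sizes use the same unit and refer to one disk.
    """
    peak = disk_used + vacuum_full_extra_space(table_size, dirty_page_rate)
    return peak / disk_total * 100 < read_only_pct
```

For the sample table in step 4 (656 KB with a dirty page rate of 71.11%), the estimate is 656 × (1 – 0.7111), or about 189.5 KB of temporary extra space.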

Alarm Clearance

After the disk usage decreases, the alarm is automatically cleared.
