Esta página ainda não está disponível no idioma selecionado. Estamos trabalhando para adicionar mais opções de idiomas. Agradecemos sua compreensão.

Compute
Elastic Cloud Server
Huawei Cloud Flexus
Bare Metal Server
Auto Scaling
Image Management Service
Dedicated Host
FunctionGraph
Cloud Phone Host
Huawei Cloud EulerOS
Networking
Virtual Private Cloud
Elastic IP
Elastic Load Balance
NAT Gateway
Direct Connect
Virtual Private Network
VPC Endpoint
Cloud Connect
Enterprise Router
Enterprise Switch
Global Accelerator
Management & Governance
Cloud Eye
Identity and Access Management
Cloud Trace Service
Resource Formation Service
Tag Management Service
Log Tank Service
Config
OneAccess
Resource Access Manager
Simple Message Notification
Application Performance Management
Application Operations Management
Organizations
Optimization Advisor
IAM Identity Center
Cloud Operations Center
Resource Governance Center
Migration
Server Migration Service
Object Storage Migration Service
Cloud Data Migration
Migration Center
Cloud Ecosystem
KooGallery
Partner Center
User Support
My Account
Billing Center
Cost Center
Resource Center
Enterprise Management
Service Tickets
HUAWEI CLOUD (International) FAQs
ICP Filing
Support Plans
My Credentials
Customer Operation Capabilities
Partner Support Plans
Professional Services
Analytics
MapReduce Service
Data Lake Insight
CloudTable Service
Cloud Search Service
Data Lake Visualization
Data Ingestion Service
GaussDB(DWS)
DataArts Studio
Data Lake Factory
DataArts Lake Formation
IoT
IoT Device Access
Others
Product Pricing Details
System Permissions
Console Quick Start
Common FAQs
Instructions for Associating with a HUAWEI CLOUD Partner
Message Center
Security & Compliance
Security Technologies and Applications
Web Application Firewall
Host Security Service
Cloud Firewall
SecMaster
Anti-DDoS Service
Data Encryption Workshop
Database Security Service
Cloud Bastion Host
Data Security Center
Cloud Certificate Manager
Edge Security
Managed Threat Detection
Blockchain
Blockchain Service
Web3 Node Engine Service
Media Services
Media Processing Center
Video On Demand
Live
SparkRTC
MetaStudio
Storage
Object Storage Service
Elastic Volume Service
Cloud Backup and Recovery
Storage Disaster Recovery Service
Scalable File Service Turbo
Scalable File Service
Volume Backup Service
Cloud Server Backup Service
Data Express Service
Dedicated Distributed Storage Service
Containers
Cloud Container Engine
SoftWare Repository for Container
Application Service Mesh
Ubiquitous Cloud Native Service
Cloud Container Instance
Databases
Relational Database Service
Document Database Service
Data Admin Service
Data Replication Service
GeminiDB
GaussDB
Distributed Database Middleware
Database and Application Migration UGO
TaurusDB
Middleware
Distributed Cache Service
API Gateway
Distributed Message Service for Kafka
Distributed Message Service for RabbitMQ
Distributed Message Service for RocketMQ
Cloud Service Engine
Multi-Site High Availability Service
EventGrid
Dedicated Cloud
Dedicated Computing Cluster
Business Applications
Workspace
ROMA Connect
Message & SMS
Domain Name Service
Edge Data Center Management
Meeting
AI
Face Recognition Service
Graph Engine Service
Content Moderation
Image Recognition
Optical Character Recognition
ModelArts
ImageSearch
Conversational Bot Service
Speech Interaction Service
Huawei HiLens
Video Intelligent Analysis Service
Developer Tools
SDK Developer Guide
API Request Signing Guide
Terraform
Koo Command Line Interface
Content Delivery & Edge Computing
Content Delivery Network
Intelligent EdgeFabric
CloudPond
Intelligent EdgeCloud
Solutions
SAP Cloud
High Performance Computing
Developer Services
ServiceStage
CodeArts
CodeArts PerfTest
CodeArts Req
CodeArts Pipeline
CodeArts Build
CodeArts Deploy
CodeArts Artifact
CodeArts TestPlan
CodeArts Check
CodeArts Repo
Cloud Application Engine
MacroVerse aPaaS
KooMessage
KooPhone
KooDrive
Help Center/ GaussDB(DWS)/ Troubleshooting/ Cluster Performance/ Solution to High Disk Usage and Cluster Read-Only

Solution to High Disk Usage and Cluster Read-Only

Updated on 2025-01-06 GMT+08:00

Checking the Disk Usage

GaussDB(DWS) disk space is a high-value resource for users. It is closely related to cluster availability. Therefore, you need to pay closely monitor the available disk space and respond promptly when necessary. Note that in this context, "disks" specifically refers to data disks.

To check the disk space, perform the following steps:

  1. Log in to the GaussDB(DWS) console. Click Clusters on the left navigation pane. In the cluster list, click Monitoring on the right of the row that contains the desired cluster. The Monitoring page is displayed.
  2. Choose Monitoring > Node Monitoring. On the displayed page, choose the Disks tab. You can click on the right to sort the records based on disk usage.

    To identify a data disk, check the disk capacity. If the disk capacity equals to the purchased capacity, the disk is a data disk.

Fault Scenarios

  • Scenario 1: High Disk Usage: The usage of all disks or more than half of the disks in the current cluster is greater than or equal to 70%.
  • Scenario 2: Disk Skew: The difference between the highest usage and the lowest usage is greater than or equal to 10%.
  • Scenario 3: Cluster Read-only. (The current read-only threshold is that the usage of a single data disk is greater than or equal to 90%.)

During routine processing, the database administrator can use the Space Management and Control to identify and block abnormal workloads to avoid the preceding scenarios.

NOTICE:
  1. Read-only is a self-protection mechanism of the GaussDB(DWS) system. It prevents GaussDB(DWS) instance startup failures caused by 100% disk usage.
  2. In the preceding scenarios, the DMS sends alarm notifications. (The alarm threshold is 80%, which can be configured. For details, see Alarm Rules.)
  3. In scenario 1, you can use the alarm subscription function to receive SMS messages or emails when the disk usage exceeds 70% or 75%. This allows you to clear data in advance. For details, see Subscribing to Disk Space Alarms.

Scenario 1: Clearing Data When the Disk Usage Is High

Periodically delete dirty data based on the query results. The method varies according to the cluster version.

  • 8.1.3 and later versions: Use the Intelligent O&M function on the management console to automatically clear dirty data.
  1. Log in to the GaussDB(DWS) management console.
  2. Click the name of the target cluster.
  3. In the navigation pane, choose Intelligent O&M.
  4. Click the O&M Plan tab. Click Add O&M Task.

  5. The Add O&M Task page is displayed.

    • Select Vacuum for O&M Task.
    • Set Scheduling Mode to Auto. GaussDB(DWS) automatically scans tables that require VACUUM operation.
    • Select System catalogs or User tables for Autovacuum.
      • If there are a large number of UPDATE and DELETE operations, select the User tables.
      • If there are a large number of CREATE and DELETE operations, select System catalogs.

  6. Click Next: Configure Schedule to configure the schedule and Vacuum type. You are advised to select Periodic for Task Type. The GaussDB(DWS) automatically executes VACUUM in your selected time windows.

    NOTE:

    For automatic Vacuum O&M tasks, the system uses the VACUUM FULL operation to process user tables. VACUUM FULL holds a level 8 lock, which blocks other transactions. Other transactions will be in lock waiting during VACUUM FULL execution. After 20 minutes, a timeout error is reported. Therefore, do not perform other transactions in the configured time window.

  7. After confirming that the information is correct, click Next to complete the configuration.

  • 8.1.2 and earlier versions: Run the VACUUM FULL command to clear data.
    NOTICE:
    1. VACUUM FULL locks a table. During VACUUM FULL, all accesses to the table are blocked and wait until VACUUM FULL ends. Please schedule properly to prevent table locking from affecting services.
    2. VACUUM FULL is used to extract valid data from the current table and delete dirty data. This operation temporarily occupies extra space (the space will be released after the data is deleted). Therefore, the space increases and then decreases during VACUUM FULL, calculate the space required by VACUUM FULL in advance. (Extra space = Table size x (1 – Dirty page rate))
  1. Connect to the database, run the following SQL statement to query large tables whose dirty page rate exceeds 30%, and sort the tables by size in descending order:

    1
    2
    3
    4
    5
    SELECT schemaname AS schema, relname AS table_name, n_live_tup AS analyze_count, pg_size_pretty(pg_table_size(relid)) as table_size, dirty_page_rate
    FROM PGXC_GET_STAT_ALL_TABLES
    WHERE schemaName NOT IN ('pg_toast', 'pg_catalog', 'information_schema', 'cstore', 'pmk')
    AND dirty_page_rate > 30
    ORDER BY table_size DESC, dirty_page_rate DESC;
    

  2. Check whether any command output is displayed.

    • If yes, run the commands in 3 for a table larger than 10 GB.
    • If no, no further action is required.

  3. Run the VACUUM FULL command to clear the top 5 tables with the most dirty pages. If the maximum disk usage is greater than 70%, clear the tables one by one.

    1
    VACUUM FULL ANALYZE schema.table_name;
    

  4. If there is no table with a high dirty page rate and the disk usage is close to or exceeds 75%, expand the node or disk capacity of the cluster based on the following data warehouse types to prevent cluster read-only.

Scenario 2: Disk Skew and Clearing Skew Tables

If the skew rate of a table on a single DN is greater than or equal to 5%, you are advised to reselect a distribution key for the table and redistribute data.

NOTICE:
  • Impacts of skewed tables: Skew tables may cause severe skew in operator computing or spilling. As a result, some DNs may be overloaded, and the advantages of GaussDB(DWS) distributed computing cannot be fully utilized, affecting system performance. Besides, they are easily to cause disk space exhaustion on a single DN.
  • In 8.1.3 and later versions, tables are created in polling (RoundRobin) mode by default (see RoundRobin). If you are not familiar with distribution keys, you can use the ROUNDROBIN keyword to create tables to simplify the service development (see Application Scenarios of Polling Tables and Hash Tables).
  1. Connect to the database and run the following SQL statement to query the skew tables:

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    WITH skew AS
    (
            SELECT
                    schemaname,
                    tablename,
                    pg_catalog.sum(dnsize) AS totalsize,
                    pg_catalog.avg(dnsize) AS avgsize,
                    pg_catalog.max(dnsize) AS maxsize,
                    pg_catalog.min(dnsize) AS minsize,
                    (pg_catalog.max(dnsize) - pg_catalog.min(dnsize)) AS skewsize,
                    pg_catalog.stddev(dnsize) AS skewstddev
            FROM pg_catalog.pg_class c
            INNER JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
            INNER JOIN pg_catalog.gs_table_distribution() s ON s.schemaname = n.nspname AND s.tablename = c.relname
            INNER JOIN pg_catalog.pgxc_class x ON c.oid = x.pcrelid AND x.pclocatortype IN('H', 'N')
            GROUP BY schemaname,tablename
    )
    SELECT
            schemaname,
            tablename,
            totalsize,
            avgsize::numeric(1000),
            (maxsize/totalsize)::numeric(4,3)  AS maxratio,
            (minsize/totalsize)::numeric(4,3)  AS minratio,
            skewsize,
            (skewsize/avgsize)::numeric(4,3)  AS skewratio,
            skewstddev::numeric(1000)
    FROM skew
    WHERE totalsize > 0;
    

    For details about how to query skew tables, see section Data Skew Causes Slow SQL Statement Execution and Operations Fail on Large Tables.

  2. Select another distribution column for the table with severe skew based on the table size and skew rate. In 8.1.0 and later versions, use the ALTER TABLE syntax to change the distribution column. For other versions, see How Do I Change Distribution Columns.

Scenario 3: Cluster Read-only.

When the usage of a single disk in the cluster exceeds 90%, the system automatically triggers read-only for the cluster. In this case, if you continue to execute write services (DML or DDL), an error message similar to "ERROR: cannot execute xxx in a read-only transaction." is displayed.

Read-only is a mechanism of GaussDB(DWS) to protect user data. It prevents instance startup failures caused by 100% disk usage.

After the cluster is read-only, you need to perform operations such as DROP and TRUNCATE to delete unnecessary data as soon as possible to reduce the space usage to less than 80%. Then handle tables mentioned in scenarios 1 and 2 to prevent the disk usage increase caused by VACUUM FULL.
  • If the cluster version is 8.1.3 or later, perform the following steps:
  1. When a cluster is in read-only status, stop the write tasks to prevent data loss caused by disk space exhaustion.
  2. Use the client to connect to the database, disable read-only through START TRANSACTION, and run the DROP/TRUNCATE TABLE command to clear unnecessary data to ensure that the disk usage is less than 80%.

    Data clearing method 1:
    1
    2
    3
    START TRANSACTION READ WRITE;
    drop/truncate table table_name;
    COMMIT;
    
    Data clearing method 2:
    1
    2
    3
    4
    START TRANSACTION;
    SET transaction_read_only=off;
    drop/truncate table table_name;
    COMMIT;
    

    After the clearing is complete, the system automatically cancels the read-only mode.

  3. Check the tables mentioned in scenarios 1 and 2 to see whether there are tables that need to be handled. If no, you are advised to scale out the cluster as soon as possible. Based on the data warehouse type, scale-out is classified into node scale-out and disk scale-out.

  • For cluster 8.1.2 and earlier versions, perform the following steps:
  1. When a cluster is in read-only status, stop the write tasks to prevent data loss caused by disk space exhaustion.
  2. Log in to the GaussDB(DWS) console and cancel the read-only state of the cluster.

    1. Log in to the GaussDB(DWS) console. Click Clusters. All clusters will be displayed by default.
    2. In the Operation column of the target cluster, choose More > Cancel Readonly.

    3. In the dialog box that is displayed, click OK to confirm and cancel the read-only status for the cluster.

  3. After the read-only state is canceled, run the DROP/TRUNCATE command to delete unnecessary data. Ensure that the disk usage is less than 80%.
  4. Check the tables mentioned in scenarios 1 and 2 to see whether there are tables that need to be handled. If no, you are advised to scale out the cluster as soon as possible. Based on the data warehouse type, scale-out is classified into node scale-out and disk scale-out.

Space Management and Control

GaussDB(DWS) supports the sql_use_spacelimit and temp_file_limit parameters for statement disk space control. These parameters prevent disk usage spikes that can cause alarms or read-only mode during service operation. They also help detect services that exchange or import too much data to the database.

  1. Log in to the GaussDB(DWS) console, click Clusters on the left, and click the desired cluster. The cluster details page is displayed.
  2. Click Parameters, search for sql_use_spacelimit and temp_file_limit (see Disk Space) in the search box, and adjust them.

    You are advised to set sql_use_spacelimit to 10% of the total capacity. For example, if the purchased space is 100 GB per node, set sql_use_spacelimit to 10 GB.
    NOTICE:
    After the preceding configuration takes effect, if the space usage of a SQL statement exceeds the value of this parameter, the SQL statements will be terminated. To temporarily disable the function, run the following statement in the session:
    1
    SET sql_use_spacelimit=0;
    

Subscribing to Disk Space Alarms

To reduce customers' O&M pressure, GaussDB(DWS) provides the alarm subscription function. When the disk usage of a cluster is greater than the preset threshold, the system notifies you through SMS messages or emails.

  1. Set an alarm threshold:

    1. Log in to the GaussDB(DWS) console, choose Alarms on the left, and click Alarm Rule Management.

    2. In the alarm rule list, click Modify on the right of Node Data Disk Usage Exceeds the Threshold.
    3. In the cluster list, click the name of the target cluster. The Cluster Information page is displayed. Set the alarm policy as follows:
      • Associated Cluster: all clusters
      • Alarm Policy: Set the threshold of important alarms to 70% and the duration to Last 10 minutes, and set the threshold of urgent alarms to 75% and the duration to Last 10 minutes. Set the parameters as follows:

  2. Create a topic on the Simple Message Notification (SMN) console.

    1. Switch to the SMN console and click Create Topic. Set Topic Name and Enterprise Project as follows.

    2. After the topic is created, click Add Subscription on the right. Select SMS or Email, and enter the mobile number or email address in the Endpoint text box.

    3. The entered mobile number or email address will receive a confirmation SMS message or email. Click OK to complete the subscription.

  3. Add an alarm subscription.

    1. Return to the GaussDB(DWS) console, choose Alarms on the left, click Subscriptions, and click Create Subscription.
    2. Set Subscription Name to dms_alarm, Alarm Severity to Urgent and Important, and SMN Topic to dms_alarm created in the previous step.

    3. Click OK. After the alarm subscription is configured, a notification will be sent when the disk usage exceeds 70% or 75%.

Usamos cookies para aprimorar nosso site e sua experiência. Ao continuar a navegar em nosso site, você aceita nossa política de cookies. Saiba mais

Feedback

Feedback

Feedback

0/500

Selected Content

Submit selected content with the feedback