Compute
Elastic Cloud Server
Huawei Cloud Flexus
Bare Metal Server
Auto Scaling
Image Management Service
Dedicated Host
FunctionGraph
Cloud Phone Host
Huawei Cloud EulerOS
Networking
Virtual Private Cloud
Elastic IP
Elastic Load Balance
NAT Gateway
Direct Connect
Virtual Private Network
VPC Endpoint
Cloud Connect
Enterprise Router
Enterprise Switch
Global Accelerator
Management & Governance
Cloud Eye
Identity and Access Management
Cloud Trace Service
Resource Formation Service
Tag Management Service
Log Tank Service
Config
OneAccess
Resource Access Manager
Simple Message Notification
Application Performance Management
Application Operations Management
Organizations
Optimization Advisor
IAM Identity Center
Cloud Operations Center
Resource Governance Center
Migration
Server Migration Service
Object Storage Migration Service
Cloud Data Migration
Migration Center
Cloud Ecosystem
KooGallery
Partner Center
User Support
My Account
Billing Center
Cost Center
Resource Center
Enterprise Management
Service Tickets
HUAWEI CLOUD (International) FAQs
ICP Filing
Support Plans
My Credentials
Customer Operation Capabilities
Partner Support Plans
Professional Services
Analytics
MapReduce Service
Data Lake Insight
CloudTable Service
Cloud Search Service
Data Lake Visualization
Data Ingestion Service
GaussDB(DWS)
DataArts Studio
Data Lake Factory
DataArts Lake Formation
IoT
IoT Device Access
Others
Product Pricing Details
System Permissions
Console Quick Start
Common FAQs
Instructions for Associating with a HUAWEI CLOUD Partner
Message Center
Security & Compliance
Security Technologies and Applications
Web Application Firewall
Host Security Service
Cloud Firewall
SecMaster
Anti-DDoS Service
Data Encryption Workshop
Database Security Service
Cloud Bastion Host
Data Security Center
Cloud Certificate Manager
Edge Security
Situation Awareness
Managed Threat Detection
Blockchain
Blockchain Service
Web3 Node Engine Service
Media Services
Media Processing Center
Video On Demand
Live
SparkRTC
MetaStudio
Storage
Object Storage Service
Elastic Volume Service
Cloud Backup and Recovery
Storage Disaster Recovery Service
Scalable File Service Turbo
Scalable File Service
Volume Backup Service
Cloud Server Backup Service
Data Express Service
Dedicated Distributed Storage Service
Containers
Cloud Container Engine
SoftWare Repository for Container
Application Service Mesh
Ubiquitous Cloud Native Service
Cloud Container Instance
Databases
Relational Database Service
Document Database Service
Data Admin Service
Data Replication Service
GeminiDB
GaussDB
Distributed Database Middleware
Database and Application Migration UGO
TaurusDB
Middleware
Distributed Cache Service
API Gateway
Distributed Message Service for Kafka
Distributed Message Service for RabbitMQ
Distributed Message Service for RocketMQ
Cloud Service Engine
Multi-Site High Availability Service
EventGrid
Dedicated Cloud
Dedicated Computing Cluster
Business Applications
Workspace
ROMA Connect
Message & SMS
Domain Name Service
Edge Data Center Management
Meeting
AI
Face Recognition Service
Graph Engine Service
Content Moderation
Image Recognition
Optical Character Recognition
ModelArts
ImageSearch
Conversational Bot Service
Speech Interaction Service
Huawei HiLens
Video Intelligent Analysis Service
Developer Tools
SDK Developer Guide
API Request Signing Guide
Terraform
Koo Command Line Interface
Content Delivery & Edge Computing
Content Delivery Network
Intelligent EdgeFabric
CloudPond
Intelligent EdgeCloud
Solutions
SAP Cloud
High Performance Computing
Developer Services
ServiceStage
CodeArts
CodeArts PerfTest
CodeArts Req
CodeArts Pipeline
CodeArts Build
CodeArts Deploy
CodeArts Artifact
CodeArts TestPlan
CodeArts Check
CodeArts Repo
Cloud Application Engine
MacroVerse aPaaS
KooMessage
KooPhone
KooDrive

Troubleshooting High Storage Space Usage

Updated on 2025-01-02 GMT+08:00

Description

Redundancy is required for the storage space of a production DB instance. If the storage space usage is too high, handle the problem in a timely manner to prevent the instance from being damaged due to full storage.

You need to pay attention to the following key metrics:

  • Storage usage: rds039_disk_util
  • Total storage: rds047_disk_total_size
  • Used storage: rds048_disk_used_size
  • Transaction logs (WAL logs) usage: rds040_transaction_logs_usage
  • Oldest replication slot lag (WAL logs accumulated due to replication slot problems): rds045_oldest_replication_slot_lag

Cause Analysis

In an RDS for PostgreSQL instance, data files (such as tables and indexes), WAL logs, and temporary files may occupy the most storage space. If the storage usage increases unexpectedly, analyze the causes as follows:

Figure 1 Cause analysis

Troubleshooting and Solution

NOTICE:

SQL statements for querying the storage usage of databases, tables, or WAL logs occupy a large amount of disk I/O. Therefore, run such SQL statements during off-peak hours.

  • Check whether the WAL log size is within its allowed range. If no, rectify the fault.
    • Check the WAL log size.

      View the rds040_transaction_logs_usage metric or run the following SQL statement to check the WAL log size. If there are many WAL logs, perform the following steps to locate the fault.

      select round(sum(size)/1024/1024/1024,2) "GB" from pg_ls_waldir();
      NOTICE:

      The pg_ls_waldir() function is available only in RDS for PostgreSQL 12 and later versions.

      User root is required to execute the pg_ls_waldir function.

    • Check the WAL log retention parameter.
      • For RDS for PostgreSQL 12 or earlier versions, check the value of wal_keep_segments (unit: MB). For later versions, check the value of wal_keep_size (unit: MB).
      • The value of the WAL log retention parameter should be greater than 4 GB but less than 10% of the total storage. Otherwise, the primary instance may clear the WAL logs required by the standby instance, causing exceptions on the standby instance.
    • Check the replication slot status and the size of logs that are not cleared.

      Replication slots block WAL reclamation. If inactive or unnecessary replication slots are found, delete them as required.

      Run the following SQL statement to query the status of a replication slot and uncleared WAL logs:

      select slot_name, active,
      pg_size_pretty(pg_wal_lsn_diff(b, a.restart_lsn)) as slot_latency
      from pg_replication_slots as a, pg_current_wal_lsn() as b;

      Run the following SQL statement to delete a slot:

      select pg_drop_replication_slot('slot_name');
    • Check how busy write services are.

      View the rds044_transaction_logs_generations metric to determine how busy write services are. This metric indicates the average size of transaction logs (WAL logs) generated per second.

      If the value of this metric is large, there are a large number of write services. In this case, the database kernel reserves more WAL logs for reclamation, and the storage usage of WAL logs increases. You are advised to scale up storage to ensure storage redundancy.

  • Check whether the size of data files is normal. If no, rectify the fault.
    • Query the top 10 databases with the highest storage usage.
      select datname, pg_database_size(oid)/1024/1024 as dbsize_mb from pg_database order by dbsize_mb desc limit 10;
    • View the top 10 objects (tables/indexes) with the highest storage usage.

      You can use the relpages field of pg_class to estimate the size of a table or index. The SQL statement is as follows:

      select relname, relpages*8/1024 as tablesize_mb from pg_class order by tablesize_mb desc limit 10;

      To obtain the exact size of a table or index, use any of the following functions:

      Table 1 Function description

      Name

      Return Type

      Description

      pg_relation_size(relation regclass, fork text)

      bigint

      Storage space used by a specified fork ('main', 'fsm', 'vm', or 'init') of a specified table or index

      pg_relation_size(relation regclass)

      bigint

      Shorthand for pg_relation_size(..., 'main')

      pg_table_size(regclass)

      bigint

      Storage space used by a specified table, excluding indexes (but including TOAST, free space map, and visibility map)

      pg_total_relation_size(regclass)

      bigint

      Total storage space used by the specified table, including all indexes and TOAST data

    • Check whether there is any table bloat.

      Once the table that occupies a large amount of storage space is determined, you can use the pgstattuple extension to analyze whether the table is bloated. The extension can be installed by running the following statements:

      create control_extension('create', 'pgstattuple');
      select * from pgstattuple('table_name');
      NOTICE:

      Some kernel versions do not support the pgstattuple extension. For details, see Supported Extensions.

      For details about how to use this extension, see https://www.postgresql.org/docs/15/pgstattuple.html.

    • Clear table data.
      • If any table bloat is found, you can vacuum the table in the maintenance time window.
        NOTICE:

        VACUUM FULL locks the table. Ensure that no DML operation is being performed during the operation.

        vacuum full table_name;
      • If any unnecessary table or data is found, you can use the truncate table or drop table statement to delete unnecessary data.
        truncate table table_name;
      • The DELETE operation does not release storage space. Instead, there will be a large number of WAL logs generated, which increases the storage space consumption. Do not use DELETE to release storage space when the storage is filling up.

        Due to the Multi-Version Concurrency Control (MVCC) mechanism of PostgreSQL, the DELETE operation does not release storage space (deleted data is marked as invisible). The storage space can be released only after VACUUM FULL is executed. The VACUUM FULL operation consumes storage space and locks the table. Therefore, perform this operation during off-peak hours and reserve at least twice the size of the table.

      • If only a small amount of data needs to be retained, you can create a new table and transfer the data to the table. The procedure is as follows:
        1. Store information such as the structure and indexes of the original table.
        2. Create a new table.
        3. Insert data into the new table.
        4. Check whether the data in the new table meets the expectation. If yes, go to the next step. If no, check whether the previous operations are successful.
        5. Delete the original table.
        6. Rename the new table and create indexes.
      NOTE:

      VACUUM FULL rebuilds the table and its indexes. During this period, WAL logs are generated. Sufficient storage space needs to be reserved. (Assume that the size of the rebuilt table is 1 GB and the size of indexes is 0.5 GB. You are advised to reserve at least 2.5 GB of storage space.)

      For details about vacuum, see https://www.postgresql.org/docs/current/routine-vacuuming.html.

  • If the storage usage exceeds 97%, the instance becomes read-only and data cannot be cleared using drop or truncate. To solve this problem, use either of the following methods:
    • Scale up storage space. If the storage capacity has reached the upper limit of your DB instance class, upgrade the instance class first.

      After the storage usage drops below 87%, the instance becomes readable and writable. Then, delete unnecessary data. You can enable storage autoscaling for instances using cloud disks. When the storage usage reaches the threshold, autoscaling is triggered.

    • If you do not want to scale up the storage, contact customer service to remove the read-only status and then delete unnecessary data. Before removing the read-only status, stop your workloads. If data continues to be written to the disk after the read-only status is removed, the storage will be getting full again.
  • Check whether the size of temporary files is normal. If no, rectify the fault.

    If high storage usage is not caused by data files or WAL logs, temporary files may occupy a large amount of storage space. Run the following SQL statement to check the size of temporary files:

    select round(sum(size)/1024/1024/1024,2) "GB" from pg_ls_tmpdir();
    NOTICE:
    • The pg_ls_waldir() function is available only in RDS for PostgreSQL 12 and later versions.
    • User root is required to execute the pg_ls_waldir function.
    • When there are a large number of temporary files, the SQL statement execution is slow.

    Generally, temporary files are released after complex SQL statements are executed. However, if an OOM exception occurs, temporary files may fail to be released. To reduce the generated temporary files, analyze and optimize slow SQL statements. Or you can reboot the instance in the maintenance time window to delete all temporary files.

We use cookies to improve our site and your experience. By continuing to browse our site you accept our cookie policy. Find out more

Feedback

Feedback

Feedback

0/500

Selected Content

Submit selected content with the feedback