Troubleshooting High Storage Space Usage

Description

Redundancy is required for the storage space of a production DB instance. If the storage space usage is too high, handle the problem in a timely manner to prevent the instance from being damaged due to full storage.

You need to pay attention to the following key metrics:

Storage usage: rds039_disk_util
Total storage: rds047_disk_total_size
Used storage: rds048_disk_used_size
Transaction logs (WAL logs) usage: rds040_transaction_logs_usage
Oldest replication slot lag (WAL logs accumulated due to replication slot problems): rds045_oldest_replication_slot_lag

Cause Analysis

In an RDS for PostgreSQL instance, data files (such as tables and indexes), WAL logs, and temporary files may occupy the most storage space. If the storage usage increases unexpectedly, analyze the causes as follows:

Figure 1 Cause analysis

Troubleshooting and Solution

SQL statements for querying the storage usage of databases, tables, or WAL logs occupy a large amount of disk I/O. Therefore, run such SQL statements during off-peak hours.

Check whether the WAL log size is within its allowed range. If no, rectify the fault.
- Check the WAL log size.
  View the rds040_transaction_logs_usage metric or run the following SQL statement to check the WAL log size. If there are many WAL logs, perform the following steps to locate the fault.
```
select round(sum(size)/1024/1024/1024,2) "GB" from pg_ls_waldir();
```
  The pg_ls_waldir() function is available only in RDS for PostgreSQL 12 and later versions.
  
  User root is required to execute the pg_ls_waldir function.
- Check the WAL log retention parameter.
  - For RDS for PostgreSQL 12 or earlier versions, check the value of wal_keep_segments (unit: MB). For later versions, check the value of wal_keep_size (unit: MB).
  - The value of the WAL log retention parameter should be greater than 4 GB but less than 10% of the total storage. Otherwise, the primary instance may clear the WAL logs required by the standby instance, causing exceptions on the standby instance.
- Check the replication slot status and the size of logs that are not cleared.
  Replication slots block WAL reclamation. If inactive or unnecessary replication slots are found, delete them as required.
  
  Run the following SQL statement to query the status of a replication slot and uncleared WAL logs:
```
select slot_name, active,
pg_size_pretty(pg_wal_lsn_diff(b, a.restart_lsn)) as slot_latency
from pg_replication_slots as a, pg_current_wal_lsn() as b;
```
  Run the following SQL statement to delete a slot:
```
select pg_drop_replication_slot('slot_name');
```
- Check how busy write services are.
  View the rds044_transaction_logs_generations metric to determine how busy write services are. This metric indicates the average size of transaction logs (WAL logs) generated per second.
  
  If the value of this metric is large, there are a large number of write services. In this case, the database kernel reserves more WAL logs for reclamation, and the storage usage of WAL logs increases. You are advised to scale up storage to ensure storage redundancy.

Check whether the size of data files is normal. If no, rectify the fault.

Query the top 10 databases with the highest storage usage.

select datname, pg_database_size(oid)/1024/1024 as dbsize_mb from pg_database order by dbsize_mb desc limit 10;

View the top 10 objects (tables/indexes) with the highest storage usage.

You can use the relpages field of pg_class to estimate the size of a table or index. The SQL statement is as follows:

select relname, relpages*8/1024 as tablesize_mb from pg_class order by tablesize_mb desc limit 10;

To obtain the exact size of a table or index, use any of the following functions:

**Table 1** Function description
Name	Return Type	Description
pg_relation_size(relation regclass, fork text)	bigint	Storage space used by a specified fork ('main', 'fsm', 'vm', or 'init') of a specified table or index
pg_relation_size(relation regclass)	bigint	Shorthand for pg_relation_size(..., 'main')
pg_table_size(regclass)	bigint	Storage space used by a specified table, excluding indexes (but including TOAST, free space map, and visibility map)
pg_total_relation_size(regclass)	bigint	Total storage space used by the specified table, including all indexes and TOAST data

Check whether there is any table bloat.
Once the table that occupies a large amount of storage space is determined, you can use the pgstattuple extension to analyze whether the table is bloated. The extension can be installed by running the following statements:
```
create control_extension('create', 'pgstattuple');
select * from pgstattuple('table_name');
```
Some kernel versions do not support the pgstattuple extension. For details, see Supported Extensions.

For details about how to use this extension, see https://www.postgresql.org/docs/15/pgstattuple.html.
Clear table data.
- If any table bloat is found, you can vacuum the table in the maintenance time window.
  
  VACUUM FULL locks the table. Ensure that no DML operation is being performed during the operation.
```
vacuum full table_name;
```
- If any unnecessary table or data is found, you can use the truncate table or drop table statement to delete unnecessary data.
```
truncate table table_name;
```
- If you cannot delete all data in a table by running DROP or TRUNCATE but want to release storage space by deleting some of table data, note that the DELETE operation does not release storage space (deleted data is marked as invisible) due to the Multi-Version Concurrency Control (MVCC) mechanism of RDS for PostgreSQL. In this case, use VACUUM FULL to release the storage space.
- If only a small amount of data needs to be retained, you can create a new table and transfer the data to the table. The procedure is as follows:
  1. Store information such as the structure and indexes of the original table.
  2. Create a new table.
  3. Insert data into the new table.
  4. Check whether the data in the new table meets the expectation. If yes, go to the next step. If no, check whether the previous operations are successful.
  5. Delete the original table.
  6. Rename the new table and create indexes.
VACUUM FULL rebuilds the table and its indexes. During this period, WAL logs are generated. Sufficient storage space needs to be reserved. (Assume that the size of the rebuilt table is 1 GB and the size of indexes is 0.5 GB. You are advised to reserve at least 2.5 GB of storage space.)

For details about vacuum, see https://www.postgresql.org/docs/current/routine-vacuuming.html.

Check whether the size of temporary files is normal. If no, rectify the fault.
If high storage usage is not caused by data files or WAL logs, temporary files may occupy a large amount of storage space. Run the following SQL statement to check the size of temporary files:
```
select round(sum(size)/1024/1024/1024,2) "GB" from pg_ls_tmpdir();
```
- The pg_ls_waldir() function is available only in RDS for PostgreSQL 12 and later versions.
- User root is required to execute the pg_ls_waldir function.
- When there are a large number of temporary files, the SQL statement execution is slow.
Generally, temporary files are released after complex SQL statements are executed. However, if an OOM exception occurs, temporary files may fail to be released. To reduce the generated temporary files, analyze and optimize slow SQL statements. Or you can reboot the instance in the maintenance time window to delete all temporary files.

Parent Topic: Performance Tuning

Previous topic: Troubleshooting Database Age Increase Problem

Next topic: Troubleshooting Abnormal Connections and Active Connections

Feedback

Was this page helpful?

Helpful Not helpful

Provide feedback

Thank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.

The system is busy. Please try again later.

Which of the following issues have you encountered?

Content is inconsistent with the product UI

Unclear descriptions

Lack of examples or code

Incorrect steps

Can't find what I need

Lack of best practices

Feedback (optional)

0/500

Select at least one type of issue, and enter your comments or suggestions.

Enter a maximum of 500 characters.

Submit Cancel

For any further questions, feel free to contact us through the chatbot.

Chatbot