Solutions to High Table Fragmentation
Scenario
High table fragmentation is a common issue encountered when you use TaurusDB instances. Table fragmentation occurs when table data and indexes are scattered across non-contiguous physical blocks or contain gaps of idle space. As a result, the storage of table data and indexes on the disk becomes suboptimal.
This issue is primarily caused by data manipulation operations (DML), such as INSERT, UPDATE, and DELETE. These operations lead to continuous modifications and movement of data rows, causing the segments within the table to become fragmented and non-sequential.
Impact and Risk
- Tablespace bloat
High table fragmentation leads to a large amount of unused space in databases, wasting storage space.
- Poor query optimization
Excessive table fragmentation can prevent the optimizer from using indexes effectively, which affects execution plan selection and results in degraded query performance.
- Reduced SQL execution efficiency
High table fragmentation causes the database to spend additional time on I/O scans and fragment processing during SQL execution. This slows down both query and update operations, leading to increased response times.
Troubleshooting
Method 1: Use DBA Assistant to monitor the storage usage in real time to ensure that your instance does not run out of storage.
- Log in to the TaurusDB console.
- Click
in the upper left corner and select a region and project. - On the Instances page, click the instance name to go to the Basic Information page.
- In the navigation pane, choose DBA Assistant > Real-Time Diagnosis.
- Click the Storage Analysis tab. On the displayed page, you can view the fragmentation space and metrics in the Top Databases and Tables area. Figure 1 Top 50 databases/tables
Method 2: Run commands to view fragmentation metrics.
- Run the following command to analyze a table and update its statistics:
ANALYZE TABLE table_name; - Run the following command to view the table details:
SELECT table_name, data_length, data_free FROM information_schema.tables WHERE table_schema = 'database_name' AND table_name = 'table_name';- table_name: the name of a table
- data_length: the size of data stored in the table (unit: byte)
- data_free: the size of the allocated but unused space in the table, representing the table's fragmentation space (unit: byte)
Typically, you can assess the severity of fragmentation by calculating the ratio of data_free to data_length.
Figure 2 Command output
Possible Causes
Cause 1: fragmentation caused by parallel execution during a DRS full migration
During a full migration, DRS uses row-level parallel migration to ensure migration performance and transmission stability. If the source data is highly compact, higher fragmentation may cause data bloat after data is migrated to a TaurusDB instance. As a result, the storage consumption of the instance is much higher than that on the source database.
Cause 2: excessive deletions leaving fragments in the tablespace
When data is deleted, TaurusDB does not immediately reclaim the storage space occupied by the deleted data. Instead, it only marks the space as available for future reuse. If the space remains unfilled by new data, the tablespace bloats, resulting in fragmentation.
Solution
Optimize table fragmentation in the following scenarios:
- The instance has been running for a long period of time.
DML operations, such as INSERT, UPDATE, and DELETE, may lead to table fragmentation.
- There are many data updates.
- Database performance deteriorates.
If you identify obvious performance deterioration when querying a given amount of data, you may need to check whether fragmentation occurs.
- The storage space is insufficient.
If the storage usage is too high, check table fragmentation and defragment tables to reclaim space.
For tables with high fragmentation, regularly analyze fragmentation of frequently accessed tables and defragment the tables. This helps reorganize tablespaces, reduce storage usage, and improve performance.
To optimize a table, run the following statement on it:
OPTIMIZE TABLE table_name;
The OPTIMIZE TABLE statement briefly locks the table. Its execution time varies based on the table size. The execution often requires significant time. To avoid impact on workloads, optimize the table during off-peak hours.
Comparison Between ANALYZE TABLE and OPTIMIZE TABLE
| Item | ANALYZE TABLE | OPTIMIZE TABLE |
|---|---|---|
| Description | This statement updates table statistics. It is usually used to address issues such as SQL queries choosing incorrect indexes due to outdated statistics. | This statement not only updates statistics but also releases fragmented space. It is usually used to address excessive fragmented space in a table. |
| Execution Duration | Within seconds | The duration depends on the table data volume, instance specifications, and workloads. You are advised to execute this statement during off-peak hours. |
| Table Locking Duration | Within seconds | Within seconds |
FAQs
Question: Why Does the Space Usage Not Decrease After Data Is Deleted?
Answer: TaurusDB optimizes tablespace for deleted data. By default, the space occupied by deleted data is not immediately released but is retained for new writes.
To release the space immediately, run OPTIMIZE TABLE tableName during off-peak hours.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot