Cutting Costs by Switching Between Cold and Hot Data Storage in GaussDB(DWS)

Updated on 2024-11-08 GMT+08:00

Scenarios

In big data scenarios, storage consumption grows rapidly as data accumulates. Because the demand for data varies over time, data is managed in tiers to improve analysis performance and reduce service costs. In some usage scenarios, data can be classified into hot data and cold data by access frequency.

Hot and cold data is classified based on the data access frequency and update frequency.

Hot data: Data that is frequently accessed and updated and requires fast response.
Cold data: Data that cannot be updated or is seldom accessed and does not require fast response.

You can define cold and hot management tables to switch cold data that meets the specified rules to OBS for storage. Cold and hot data can be automatically determined and migrated by partition.

Figure 1 Hot and cold data management

When data is inserted into GaussDB(DWS) column-store tables, it is first stored in hot partitions. As data accumulates, you can manually or automatically migrate cold data to OBS for storage. The metadata, description tables, and indexes of the migrated cold data remain in local storage to preserve read performance.

The hot and cold partitions can be switched based on LMT (Last Modify Time) and HPN (Hot Partition Number) policies. LMT indicates that the switchover is performed based on the last update time of the partition, and HPN indicates that the switchover is performed based on the number of reserved hot partitions.

LMT: Switches hot partition data that has not been updated in the last [day] days to the OBS tablespace as cold partition data. [day] is an integer ranging from 0 to 36500, in days.
For example, if day is set to 2, the partitions modified in the last two days are retained as hot partitions, and the rest become cold partitions. Assume that the current date is April 30. A delete operation is performed on partition [4-26] on April 30, and an insert operation is performed on partition [4-27] on April 29. Therefore, partitions [4-26][4-27][4-29][4-30] are retained as hot partitions.
HPN: Specifies the number of hot partitions to reserve. Partitions are ordered by partition sequence ID, a built-in number that is generated from the partition boundary values and is not displayed. For a range partition, a larger boundary value means a larger sequence ID. For a list partition, a larger maximum enumerated value of the partition boundary means a larger sequence ID. During a cold/hot switchover, data is migrated to OBS. HPN is an integer ranging from 0 to 1600. If HPN is set to 0, no hot partitions are reserved: during a switchover, all partitions with data are converted to cold partitions and stored on OBS.
For example, if HPN is set to 3, the last three partitions with data are retained as hot partitions and the rest become cold partitions during a switchover.
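
The two policies can be modeled outside the database to check which partitions a given setting would keep hot. The following Python sketch is an illustration only, not the GaussDB(DWS) implementation. The function names are hypothetical, the year and the last-modified dates of partitions [4-25] and [4-28] are invented for the example, and the LMT boundary rule (a partition stays hot when fewer than day whole days have passed since its last modification) is an assumption inferred from the April 30 example above.

```python
from datetime import date


def lmt_hot_partitions(last_modified, today, day):
    """Return the partitions that an LMT policy would keep hot.

    last_modified maps partition name -> date of last modification.
    Assumption: a partition stays hot when fewer than `day` whole days
    have passed since it was last modified.
    """
    if not 0 <= day <= 36500:
        raise ValueError("day must be an integer from 0 to 36500")
    return [name for name, modified in last_modified.items()
            if (today - modified).days < day]


def hpn_hot_partitions(partitions, hpn):
    """Return the partitions that an HPN policy would keep hot.

    partitions is a list of (name, has_data) pairs in ascending order of
    partition sequence ID. Only partitions with data are considered.
    """
    if not 0 <= hpn <= 1600:
        raise ValueError("hpn must be an integer from 0 to 1600")
    with_data = [name for name, has_data in partitions if has_data]
    # with_data[-0:] would return the whole list, so handle 0 explicitly.
    return with_data[-hpn:] if hpn else []


# LMT example from the text: day = 2, current date April 30.
last_modified = {
    "4-25": date(2024, 4, 25),  # hypothetical
    "4-26": date(2024, 4, 30),  # delete performed on April 30
    "4-27": date(2024, 4, 29),  # insert performed on April 29
    "4-28": date(2024, 4, 28),  # hypothetical
    "4-29": date(2024, 4, 29),
    "4-30": date(2024, 4, 30),
}
print(lmt_hot_partitions(last_modified, date(2024, 4, 30), 2))

# HPN example: keep the last three partitions that contain data.
partitions = [("p1", True), ("p2", True), ("p3", True), ("p4", True), ("p5", False)]
print(hpn_hot_partitions(partitions, 3))
```

With these inputs, the LMT call returns the same four partitions as the example in the text, and the HPN call returns p2, p3, and p4 because p5 holds no data.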

Constraints

Supports DML operations on cold and hot tables, such as INSERT, COPY, DELETE, UPDATE, and SELECT.
Supports DCL operations such as permission management on cold and hot tables.
Supports ANALYZE, VACUUM, MERGE INTO, and PARTITION operations on cold and hot tables.
Supports upgrading common column-store partitioned tables to hot and cold data tables.
Supports upgrade, scale-out, scale-in, and redistribution operations on tables with cold and hot data management enabled.
8.3.0 and later versions support mutual conversion between cold and hot partitions. Versions earlier than 8.3.0 support only conversion from hot data to cold data.

If a table has both cold and hot partitions, queries become slower because cold data is stored on OBS, where the read/write speed is lower than that of local queries.
Currently, only column-store partitioned tables of version 2.0 can be used as cold and hot tables. Foreign tables do not support cold and hot partitions.
Only the cold and hot switchover policies can be modified. The tablespace of cold data in cold and hot tables cannot be modified.
Restrictions on partitioning cold and hot tables:
- Data in cold partitions cannot be exchanged.
- MERGE PARTITION supports only the merge of hot-hot partitions and cold-cold partitions.
- Partition operations, such as ADD, MERGE, and SPLIT, cannot be performed on an OBS tablespace.
- Tablespaces of cold and hot table partitions cannot be specified or modified during table creation.
Cold and hot data switchover is not performed immediately after the conditions are met. It is performed only when the switchover command is invoked, either manually or by a scheduler. Currently, the automatic scheduling time is 00:00 every day and can be modified.
Cold and hot data tables do not support physical fine-grained backup and restoration. Only hot data is backed up during physical backup, and cold data on OBS is left unchanged. Backup and restoration do not support file deletion statements, such as TRUNCATE TABLE and DROP TABLE.

Procedure

This practice takes about 30 minutes. The basic process is as follows:

Creating a Cluster.
Using the gsql CLI Client to Connect to a Cluster.
Creating Hot and Cold Tables.
Hot and Cold Data Switchover.
Viewing Data Distribution in Hot and Cold Tables.

Creating a Cluster

Log in to the Huawei Cloud management console.
Choose Service List > Analytics > Data Warehouse Service. On the page that is displayed, click Create Cluster in the upper right corner.

Configure the parameters according to Table 1.

**Table 1** Software configuration

| Parameter | Configuration |
| --- | --- |
| Region | Select the CN-Hong Kong region. NOTE: CN-Hong Kong is used as an example. You can select other regions as required. Ensure that all operations are performed in the same region. |
| AZ | AZ2 |
| Product | Standard data warehouse |
| CPU Architecture | x86 |
| Node Flavor | dws2.m6.4xlarge.8 (16 vCPUs \| 128 GB \| 2000 GB SSD) NOTE: If this flavor is sold out, select other AZs or flavors. |
| Nodes | 3 |
| Cluster Name | dws-demo |
| Administrator Account | dbadmin |
| Administrator Password | N/A |
| Confirm Password | N/A |
| Database Port | 8000 |
| VPC | vpc-default |
| Subnet | subnet-default(192.168.0.0/24) |
| Security Group | Automatic creation |
| EIP | Buy now |
| Bandwidth | 1 Mbit/s |
| Advanced Settings | Default |

Confirm the information, click Next, and then click Submit.
Wait about 6 minutes. After the cluster is created, click the icon next to the cluster name. On the displayed cluster information page, record the value of Public Network Address, for example, dws-demov.dws.huaweicloud.com.

Using the gsql CLI Client to Connect to a Cluster

Log in remotely, as user root, to the Linux server where gsql is to be installed, and run the following command in the Linux command window to download the gsql client:

    wget https://obs.ap-southeast-1.myhuaweicloud.com/dws/download/dws_client_8.1.x_redhat_x64.zip --no-check-certificate

Decompress the client.

    cd <Path_for_storing_the_client>
    unzip dws_client_8.1.x_redhat_x64.zip

Where:

<Path_for_storing_the_client>: Replace it with the actual path.
dws_client_8.1.x_redhat_x64.zip: This is the client tool package name of RedHat x64. Replace it with the actual name.

Configure the GaussDB(DWS) client.

    source gsql_env.sh

If the following information is displayed, the gsql client is successfully configured:

    All things done.

Use the gsql client to connect to a GaussDB(DWS) database, using the password you defined when creating the cluster. Replace the host address and password in the example with the values for your cluster.

    gsql -d gaussdb -p 8000 -h 192.168.0.86 -U dbadmin -W password -r

If the following information is displayed, the connection succeeded:

    gaussdb=>

Creating Hot and Cold Tables

Create a column-store cold and hot data management table lifecycle_table and set the hot data validity period LMT to 100 days.

    CREATE TABLE lifecycle_table(i int, val text) WITH (ORIENTATION = COLUMN, storage_policy = 'LMT:100')
    PARTITION BY RANGE (i)
    (
        PARTITION P1 VALUES LESS THAN(5),
        PARTITION P2 VALUES LESS THAN(10),
        PARTITION P3 VALUES LESS THAN(15),
        PARTITION P8 VALUES LESS THAN(MAXVALUE)
    )
    ENABLE ROW MOVEMENT;
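
When table definitions are generated by a script, the policy value can be range-checked before it reaches the database. The following Python sketch builds the storage_policy string and enforces the ranges stated earlier (LMT from 0 to 36500, HPN from 0 to 1600). The helper name is hypothetical, and the 'HPN:<n>' form is an assumption made by analogy with the 'LMT:100' value used above.

```python
# Upper bounds taken from the policy descriptions in this document.
POLICY_LIMITS = {"LMT": 36500, "HPN": 1600}


def storage_policy(kind, value):
    """Return a storage_policy string such as 'LMT:100' after validation."""
    kind = kind.upper()
    if kind not in POLICY_LIMITS:
        raise ValueError("policy must be LMT or HPN")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("policy value must be an integer")
    if not 0 <= value <= POLICY_LIMITS[kind]:
        raise ValueError(f"{kind} must be from 0 to {POLICY_LIMITS[kind]}")
    return f"{kind}:{value}"


print(storage_policy("LMT", 100))
print(storage_policy("hpn", 3))
```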

Hot and Cold Data Switchover

Switch hot partition data to cold partition data.

Automatic switchover: The scheduler automatically triggers the switchover at 00:00 every day.

You can use the pg_obs_cold_refresh_time(table_name, time) function to customize the automatic switchover time. For example, set the automatic triggering time to 06:30 every morning.

    SELECT * FROM pg_obs_cold_refresh_time('lifecycle_table', '06:30:00');
     pg_obs_cold_refresh_time
    --------------------------
     SUCCESS
    (1 row)
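
Because the automatic switchover runs once a day at a fixed time, data that meets the policy may stay in hot partitions until the next run. The following Python sketch estimates when that next run would occur for a given trigger time. It is a model of a daily schedule only: the function name is hypothetical, and treating a run scheduled for exactly the current moment as already triggered is an assumption.

```python
from datetime import datetime, time, timedelta


def next_switchover(now, trigger=time(0, 0)):
    """Return the next daily trigger time after `now`.

    The default trigger of 00:00 matches the default schedule described above.
    """
    candidate = datetime.combine(now.date(), trigger)
    if candidate <= now:
        # Today's run has already passed, so the next one is tomorrow.
        candidate += timedelta(days=1)
    return candidate


# With the 06:30 trigger set above, a check at 05:00 waits until the same morning,
# while a check at 07:00 waits until the following day.
print(next_switchover(datetime(2024, 4, 30, 5, 0), time(6, 30)))
print(next_switchover(datetime(2024, 4, 30, 7, 0), time(6, 30)))
```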

Manual switchover:

Run the ALTER TABLE statement to manually switch a single table.

    ALTER TABLE lifecycle_table refresh storage;
    ALTER TABLE

Use the pg_refresh_storage() function to switch all hot and cold tables in batches.

    SELECT pg_catalog.pg_refresh_storage();
     pg_refresh_storage
    --------------------
     (1,0)
    (1 row)

Convert cold partition data into hot partition data. This function is supported only in 8.3.0 or later.

Convert all cold partitions to hot partitions:

    SELECT pg_catalog.reload_cold_partition('lifecycle_table');

Convert a specified cold partition to a hot partition:

    SELECT pg_catalog.reload_cold_partition('lifecycle_table', 'cold_partition_name');
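
Automation that calls reload_cold_partition can guard the call with a version check, since cold-to-hot conversion requires 8.3.0 or later. The following Python sketch compares the first three components of a version string against 8.3.0. The function name is hypothetical, and how the cluster version string is obtained is left out; the dotted format follows the version labels used in this documentation, such as 8.1.3.x.

```python
def supports_cold_to_hot(version):
    """Return True if the cluster version is 8.3.0 or later.

    Only the first three dot-separated components are compared, so a
    trailing component such as 'x' or a build number is ignored.
    """
    numbers = [int(part) for part in version.split(".")[:3]]
    numbers += [0] * (3 - len(numbers))  # pad short versions such as '9.1'
    return tuple(numbers) >= (8, 3, 0)


for v in ("8.1.3.x", "8.3.0.x", "9.1.0.x"):
    print(v, supports_cold_to_hot(v))
```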

Viewing Data Distribution in Hot and Cold Tables

View the data distribution in a single table:

    SELECT * FROM pg_catalog.pg_lifecycle_table_data_distribute('lifecycle_table');
     schemaname |    tablename    |   nodename   | hotpartition | coldpartition | switchablepartition | hotdatasize | colddatasize | switchabledatasize
    ------------+-----------------+--------------+--------------+---------------+---------------------+-------------+--------------+--------------------
     public     | lifecycle_table | dn_6001_6002 | p1,p2,p3,p8  |               |                     | 96 KB       | 0 bytes      | 0 bytes
     public     | lifecycle_table | dn_6003_6004 | p1,p2,p3,p8  |               |                     | 96 KB       | 0 bytes      | 0 bytes
     public     | lifecycle_table | dn_6005_6006 | p1,p2,p3,p8  |               |                     | 96 KB       | 0 bytes      | 0 bytes
    (3 rows)

View data distribution in all hot and cold tables:

    SELECT * FROM pg_catalog.pg_lifecycle_node_data_distribute();
     schemaname |    tablename    |   nodename   | hotpartition | coldpartition | switchablepartition | hotdatasize | colddatasize | switchabledatasize
    ------------+-----------------+--------------+--------------+---------------+---------------------+-------------+--------------+--------------------
     public     | lifecycle_table | dn_6001_6002 | p1,p2,p3,p8  |               |                     |       98304 |            0 |                  0
     public     | lifecycle_table | dn_6003_6004 | p1,p2,p3,p8  |               |                     |       98304 |            0 |                  0
     public     | lifecycle_table | dn_6005_6006 | p1,p2,p3,p8  |               |                     |       98304 |            0 |                  0
    (3 rows)
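
The per-node rows can be aggregated to see how much data is still hot across the cluster. The following Python sketch parses the text output of pg_lifecycle_node_data_distribute() as shown above and sums the size columns. It is a minimal illustration: the parser and its names are hypothetical, it assumes the pipe-separated layout that gsql prints, and it assumes that the size columns of this function are plain byte counts, which the sample output suggests (98304 corresponds to the 96 KB shown by the single-table function).

```python
SAMPLE = """\
 schemaname |    tablename    |   nodename   | hotpartition | coldpartition | switchablepartition | hotdatasize | colddatasize | switchabledatasize
------------+-----------------+--------------+--------------+---------------+---------------------+-------------+--------------+--------------------
 public     | lifecycle_table | dn_6001_6002 | p1,p2,p3,p8  |               |                     |       98304 |            0 |                  0
 public     | lifecycle_table | dn_6003_6004 | p1,p2,p3,p8  |               |                     |       98304 |            0 |                  0
 public     | lifecycle_table | dn_6005_6006 | p1,p2,p3,p8  |               |                     |       98304 |            0 |                  0
(3 rows)
"""


def parse_rows(text):
    """Parse gsql table output into a list of dicts keyed by column name."""
    # The separator line and the row-count footer contain no '|', so they are skipped.
    lines = [line for line in text.splitlines() if "|" in line]
    header = [cell.strip() for cell in lines[0].split("|")]
    return [dict(zip(header, (cell.strip() for cell in line.split("|"))))
            for line in lines[1:]]


def total_bytes(rows, column):
    """Sum a byte-count column over all data nodes."""
    return sum(int(row[column]) for row in rows)


rows = parse_rows(SAMPLE)
hot = total_bytes(rows, "hotdatasize")
cold = total_bytes(rows, "colddatasize")
print(f"hot: {hot} bytes, cold: {cold} bytes across {len(rows)} nodes")
```

For the sample output, all data is still in hot partitions because no switchover has moved any partition to OBS yet.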