Best Practices of Hot and Cold Data Management
Scenarios
In massive big data scenarios, data volumes and the associated storage and compute costs grow rapidly. Because the demand for data varies over time, managing it in tiers improves data analysis performance and reduces service costs. In many workloads, data can be classified as hot or cold by access frequency.
Hot and cold data are distinguished by access frequency and update frequency.
- Hot data: Data that is frequently accessed and updated and requires a fast response.
- Cold data: Data that is rarely or never updated, is seldom accessed, and does not require a fast response.
You can define hot and cold management tables so that data meeting the specified rules is switched to OBS for cold storage. Hot and cold data is identified and migrated automatically at the partition level.
The hot and cold partitions can be switched based on LMT (Last Modify Time) and HPN (Hot Partition Number) policies. LMT indicates that the switchover is performed based on the last update time of the partition, and HPN indicates that the switchover is performed based on the number of reserved hot partitions.
- LMT: Switch hot partitions that have not been updated in the last [day] days to the OBS tablespace as cold partitions. [day] is an integer ranging from 0 to 36500, in days.
- HPN: Number of hot partitions to reserve. During a switchover, partitions beyond the reserved hot partitions are migrated to OBS. HPN is an integer ranging from 0 to 1600.
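As a minimal sketch, either policy is declared through the storage_policy table option at creation time. The 'LMT:<days>' format matches the example used later in this practice; the 'HPN:<count>' format and the table names here are assumptions for illustration.

```sql
-- LMT policy: partitions not updated in the last 100 days become cold.
CREATE TABLE sales_lmt (id int, val text)
WITH (ORIENTATION = COLUMN, storage_policy = 'LMT:100')
PARTITION BY RANGE (id)
(
    PARTITION p1 VALUES LESS THAN (100),
    PARTITION pmax VALUES LESS THAN (MAXVALUE)
)
ENABLE ROW MOVEMENT;

-- HPN policy (assumed format): keep only the 3 most recent partitions hot.
CREATE TABLE sales_hpn (id int, val text)
WITH (ORIENTATION = COLUMN, storage_policy = 'HPN:3')
PARTITION BY RANGE (id)
(
    PARTITION p1 VALUES LESS THAN (100),
    PARTITION pmax VALUES LESS THAN (MAXVALUE)
)
ENABLE ROW MOVEMENT;
```

Both variants require a column-store (ORIENTATION = COLUMN) range-partitioned table, per the constraints below.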
Constraints
- If a table has both hot and cold partitions, queries can become slow because cold data is stored on OBS, whose read/write speeds are lower than those of local storage.
- Currently, hot and cold partitions are supported only for version 2.0 column-store partitioned tables. Foreign tables do not support hot and cold partitions.
- Only hot data can be switched to cold data. Cold data cannot be switched back to hot data.
Procedure
This practice takes about 30 minutes. The basic process is as follows:
Creating a cluster
- Log in to the Huawei Cloud management console.
- Choose Service List > Analytics > Data Warehouse Service. On the page that is displayed, click Create Cluster in the upper right corner.
- Configure the parameters according to Table 1.
Table 1 Software configuration (Parameter: Configuration)
- Region: CN-Hong Kong
  NOTE: CN-Hong Kong is used as an example. You can select other regions as required. Ensure that all operations are performed in the same region.
- AZ: AZ2
- Product: Standard data warehouse
- CPU Architecture: X86
- Node Flavor: dws2.m6.4xlarge.8 (16 vCPUs | 128 GB | 2000 GB SSD)
  NOTE: If this flavor is sold out, select other AZs or flavors.
- Nodes: 3
- Cluster Name: dws-demo
- Administrator Account: dbadmin
- Administrator Password: -
- Confirm Password: -
- Database Port: 8000
- VPC: vpc-default
- Subnet: subnet-default (192.168.0.0/24)
- Security Group: Automatic creation
- EIP: Buy now
- Bandwidth: 1 Mbit/s
- Advanced Settings: Default
- Confirm the information, click Next, and then click Submit.
- Wait about 6 minutes. After the cluster is created, click the cluster name. On the displayed cluster information page, record the value of Public Network Address, for example, dws-demov.dws.huaweicloud.com.
Using the gsql CLI Client to Connect to a Cluster
- Remotely log in to the Linux server where gsql is to be installed as user root, and run the following command in the Linux command window to download the gsql client:
wget https://obs.ap-southeast-1.myhuaweicloud.com/dws/download/dws_client_8.1.x_redhat_x64.zip --no-check-certificate
- Decompress the client.
cd <Path_for_storing_the_client>
unzip dws_client_8.1.x_redhat_x64.zip
Where,
- <Path_for_storing_the_client>: Replace it with the actual path.
- dws_client_8.1.x_redhat_x64.zip: This is the client tool package name of RedHat x64. Replace it with the actual name.
- Configure the GaussDB(DWS) client.
source gsql_env.sh
If the following information is displayed, the gsql client is successfully configured:
All things done.
- Use the gsql client to connect to a GaussDB(DWS) database (using the password you defined when creating the cluster).
gsql -d gaussdb -p 8000 -h 192.168.0.86 -U dbadmin -W password -r
If the following information is displayed, the connection succeeded:
gaussdb=>
Creating Hot and Cold Tables
CREATE TABLE lifecycle_table(i int, val text)
WITH (ORIENTATION = COLUMN, storage_policy = 'LMT:100')
PARTITION BY RANGE (i)
(
    PARTITION P1 VALUES LESS THAN(5),
    PARTITION P2 VALUES LESS THAN(10),
    PARTITION P3 VALUES LESS THAN(15),
    PARTITION P8 VALUES LESS THAN(MAXVALUE)
)
ENABLE ROW MOVEMENT;
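To exercise the partitions before a switchover, sample rows can be inserted. This is an illustrative sketch; the values are chosen arbitrarily so that one row lands in each range partition of lifecycle_table.

```sql
-- Hypothetical sample data targeting partitions P1, P2, P3, and P8 in turn.
INSERT INTO lifecycle_table VALUES (1, 'a'), (6, 'b'), (11, 'c'), (20, 'd');
-- Confirm the rows are visible.
SELECT count(*) FROM lifecycle_table;
```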
Hot and Cold Data Switchover
- Automatic switchover: The scheduler automatically triggers the switchover at 00:00 every day.
The automatic switchover time can be customized. For example, the time can be changed to 06:30 every morning.
select * from pg_obs_cold_refresh_time('lifecycle_table', '06:30:00');
 pg_obs_cold_refresh_time
--------------------------
 SUCCESS
(1 row)
- Manual switchover
Perform the following operations to manually switch a single table:
alter table lifecycle_table refresh storage;
ALTER TABLE
Perform the following operations to switch over all cold and hot tables in batches:
select pg_catalog.pg_refresh_storage();
 pg_refresh_storage
--------------------
 (1,0)
(1 row)
Viewing Data Distribution
View data distribution in a single hot and cold table:
select * from pg_catalog.pg_lifecycle_table_data_distribute('lifecycle_table');
 schemaname |    tablename    |   nodename   | hotpartition | coldpartition | switchablepartition | hotdatasize | colddatasize | switchabledatasize
------------+-----------------+--------------+--------------+---------------+---------------------+-------------+--------------+--------------------
 public     | lifecycle_table | dn_6001_6002 | p1,p2,p3,p8  |               |                     | 96 KB       | 0 bytes      | 0 bytes
 public     | lifecycle_table | dn_6003_6004 | p1,p2,p3,p8  |               |                     | 96 KB       | 0 bytes      | 0 bytes
 public     | lifecycle_table | dn_6005_6006 | p1,p2,p3,p8  |               |                     | 96 KB       | 0 bytes      | 0 bytes
(3 rows)
View data distribution in all hot and cold tables:
select * from pg_catalog.pg_lifecycle_node_data_distribute();
 schemaname |    tablename    |   nodename   | hotpartition | coldpartition | switchablepartition | hotdatasize | colddatasize | switchabledatasize
------------+-----------------+--------------+--------------+---------------+---------------------+-------------+--------------+--------------------
 public     | lifecycle_table | dn_6001_6002 | p1,p2,p3,p8  |               |                     |       98304 |            0 |                  0
 public     | lifecycle_table | dn_6003_6004 | p1,p2,p3,p8  |               |                     |       98304 |            0 |                  0
 public     | lifecycle_table | dn_6005_6006 | p1,p2,p3,p8  |               |                     |       98304 |            0 |                  0
(3 rows)