Best Practices for Using HStore Tables

Updated on 2024-10-29 GMT+08:00

Working Principles

In GaussDB(DWS), a CU is the smallest unit used to store data in a column-store table. By default, each column in the table stores 60,000 rows of data as a CU. Once generated, the data in a CU cannot be altered. A single CU is generated regardless of whether 1 or 60,000 records are inserted into a column-store table. Frequent small data insertions hinder effective compression, leading to data expansion, which negatively impacts query performance and disk usage.

CU file data can only be appended, not modified. Deleting data marks it as invalid in the dictionary, while updating marks old data as deleted and writes new data to a new CU. Repeated updates or deletions cause space expansion and inefficient space usage.

The column-store Delta table addresses the issue of small CUs from frequent small data imports but does not resolve lock conflicts from concurrent updates on the same CU. A hybrid data warehouse needs to work with data sources, such as upstream databases or applications. Therefore, in real-time import scenarios, concurrent insert, update, and delete operations are necessary for timely data import and high query efficiency.

HStore tables use additional delta tables. Batch-inserted data is written directly to CUs, maintaining the compression benefits of column-store tables. Updated columns and small batch inserts are serialized, compressed, and periodically merged into primary table CUs.

Use Cases

GaussDB(DWS) uses column storage format for HStore tables to minimize disk usage, enable high-concurrency updates, and enhance query performance. HStore tables are ideal for scenarios that demand real-time data import and query capabilities, as well as the ability to process traditional TP transactions.

To enhance performance, GaussDB(DWS) 8.3.0.100 optimized HStore tables while keeping the original ones for compatibility. The optimized tables are called HStore_opt tables. Unless a scenario demands high performance and involves no micro-batch updates, HStore_opt tables are recommended over HStore tables.

Creating HStore Tables and Related Views

To create an HStore table, you need to specify the enable_hstore table-level parameter.
 CREATE TABLE test1 (i int,j text) with (orientation = column,enable_hstore=on);
To create an HStore_opt table, you need to specify the enable_hstore_opt table-level parameter.
 CREATE TABLE test2 (i int,j text) with (orientation = column,enable_hstore_opt=on);

Check the number of tuples in the Delta table and its expansion status through the following view.

 SELECT * FROM pgxc_get_hstore_delta_info('tableName');

Use functions to perform lightweight cleanup and full cleanup on the Delta table.

  • A lightweight merge moves accumulated I records (60,000 at a time) and the deletion information into the CUs. It holds only a level-4 lock, so concurrent inserts, deletes, updates, and queries are not blocked. However, the space is not returned to the system.
     select hstore_light_merge('tableName'); 
    
  • A full merge processes all records and truncates the Delta table, returning its space to the system. However, it holds a level-8 lock, which blocks other services on the table.
     select hstore_full_merge('tableName');
    

Insert 100 data records into the HStore table in batches. A record of type I is generated in the Delta table (n_i_type is 1).

 CREATE TABLE data(a int primary key, b int);
 NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "data_pkey" for table "data"
 CREATE TABLE

 INSERT INTO data values(generate_series(1,100),1);
 INSERT 0 100

 CREATE TABLE hs(a int primary key, b int)with(orientation=column, enable_hstore=on);
 NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "hs_pkey" for table "hs"
 CREATE TABLE

 INSERT INTO hs SELECT * FROM data;
 INSERT 0 100

 SELECT * FROM pgxc_get_hstore_delta_info('hs'); 
  node_name |      part_name      | live_tup | n_i_type | n_d_type | n_x_type | n_u_type | n_m_type | data_size
 -----------+---------------------+----------+----------+----------+----------+----------+----------+-----------
  dn_1      | non partition table |        1 |        1 |        0 |        0 |        0 |        0 |      8192
 (1 row)

After hstore_full_merge is executed, no tuple exists in the Delta table (the value of live_tup is 0), and the value of data_size is 0.

SELECT hstore_full_merge('hs');
  hstore_full_merge
 -------------------
                  1
 (1 row)
 
SELECT * FROM pgxc_get_hstore_delta_info('hs'); 
  node_name |      part_name      | live_tup | n_i_type | n_d_type | n_x_type | n_u_type | n_m_type | data_size
 -----------+---------------------+----------+----------+----------+----------+----------+----------+-----------
  dn_1      | non partition table |        0 |        0 |        0 |        0 |        0 |        0 |         0
 (1 row)

Perform a deletion. The Delta table then contains a record of type D (n_d_type is 1).

 DELETE hs where a = 1;
 DELETE 1
SELECT * FROM pgxc_get_hstore_delta_info('hs'); 
  node_name |      part_name      | live_tup | n_i_type | n_d_type | n_x_type | n_u_type | n_m_type | data_size
 -----------+---------------------+----------+----------+----------+----------+----------+----------+-----------
  dn_1      | non partition table |        1 |        0 |        1 |        0 |        0 |        0 |      8192
 (1 row)

Usage Practice

For optimal performance of HStore tables, it is crucial to configure the following parameter settings:

Set autovacuum_max_workers_hstore to 3, autovacuum_max_workers to 6, autovacuum to true, and enable_col_index_vacuum to on.
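The current values can be checked with SHOW before importing data (a quick sanity check; these are the same GUC parameters listed above):

```sql
-- Verify the MERGE-related GUC parameters on the current node.
SHOW autovacuum;
SHOW autovacuum_max_workers;
SHOW autovacuum_max_workers_hstore;
SHOW enable_col_index_vacuum;
```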

  1. Concurrent update
    After a batch of data is inserted into an ordinary column-store table, start two sessions. In session 1, delete one row and leave the transaction open.
     CREATE TABLE col(a int , b int)with(orientation=column);
     CREATE TABLE
    
     INSERT INTO col select * from data;
     INSERT 0 100
    
     BEGIN;
     BEGIN
    
     DELETE col where a = 1;
     DELETE 1
    
    When session 2 attempts to delete another row, it blocks until session 1 commits. This demonstrates the CU lock conflict in ordinary column-store tables.
     BEGIN;
     BEGIN
     DELETE col where a = 2;
    
    Repeat the previous experiment using the HStore table. Session 2 can be executed successfully without any lock wait.
     BEGIN;
     BEGIN
     DELETE hs where a = 2;
     DELETE 1
    
  2. Compression efficiency
    Create a data table with 3 million data records.
     CREATE TABLE data( a int, b bigint, c varchar(10), d varchar(10));
     
     CREATE TABLE
     INSERT INTO data values(generate_series(1,100),1,'asdfasdf','gergqer');
     INSERT 0 100
     INSERT INTO data select * from data;
     INSERT 0 100
     INSERT INTO data select * from data;
     INSERT 0 200
     
     -- Insert the table into itself repeatedly until the data volume reaches 3 million.
     
     INSERT INTO data select * from data;
     INSERT 0 1638400
     select count(*) from data;
       count
     ---------
      3276800
     (1 row)
    

    Import the data into a row-store table in batches and check its size: 223 MB.

     CREATE TABLE row (like data including all);
     CREATE TABLE
     INSERT INTO row select * from data;
     INSERT 0 3276800
     select pg_size_pretty(pg_relation_size('row'));
      pg_size_pretty
     ----------------
      223 MB
     (1 row)
    

    Import the data into an HStore table in batches and check its size: about 3.5 MB.

     CREATE TABLE hs(a int, b bigint, c varchar(10),d varchar(10))with(orientation= column, enable_hstore=on);
     CREATE TABLE
     INSERT INTO hs select * from data;
     INSERT 0 3276800
     select pg_size_pretty(pg_relation_size('hs'));
      pg_size_pretty
     ----------------
      3568 KB
     (1 row)
    

    The compression here is striking (223 MB down to about 3.5 MB) because the table structure is simple and the data highly repetitive. In typical workloads, HStore tables compress to three to five times smaller than row-store tables.

  3. Batch query performance
    Querying the fourth column of the row-store table takes approximately four seconds.
     explain analyze select d from data;
                                                                  QUERY PLAN
     -----------------------------------------------------------------------------------------------------------------------------------------
       id |          operation           |        A-time        | A-rows  | E-rows  | Peak Memory  | E-memory | A-width | E-width | E-costs
      ----+------------------------------+----------------------+---------+---------+--------------+----------+---------+---------+----------
        1 | ->  Streaming (type: GATHER) | 4337.881             | 3276800 | 3276800 | 32KB         |          |         |       8 | 61891.00
        2 |    ->  Seq Scan on data      | [1571.995, 1571.995] | 3276800 | 3276800 | [32KB, 32KB] | 1MB      |         |       8 | 61266.00
    
    It takes about 300 milliseconds to query the fourth column of the HStore table.
     explain analyze select d from hs;
                                                                         QUERY PLAN
     ---------------------------------------------------------------------------------------------------------------------------------------------------
       id |               operation                |       A-time       | A-rows  | E-rows  |  Peak Memory   | E-memory | A-width | E-width | E-costs
      ----+----------------------------------------+--------------------+---------+---------+----------------+----------+---------+---------+----------
        1 | ->  Row Adapter                        | 335.280            | 3276800 | 3276800 | 24KB           |          |         |       8 | 15561.80
        2 |    ->  Vector Streaming (type: GATHER) | 111.492            | 3276800 | 3276800 | 96KB           |          |         |       8 | 15561.80
        3 |       ->  CStore Scan on hs            | [111.116, 111.116] | 3276800 | 3276800 | [254KB, 254KB] | 1MB      |         |       8 | 14936.80
    

    Only the batch query scenario is tested here. In this scenario, column-store and HStore tables clearly outperform row-store tables in query performance.

Requirements and Suggestions for Using HStore Tables

  • Configure the parameters.

    To sustain query performance and compression efficiency, background threads must periodically MERGE the delta records of HStore tables. Before using HStore tables, ensure the relevant GUC parameters are configured correctly. The suggested values are listed below.

    autovacuum_max_workers_hstore: 3

    autovacuum_max_workers: 6

    autovacuum: true

    enable_col_index_vacuum: on

  • Suggestions on importing data to the database (the HStore_opt table is recommended):
    1. Importing data with UPDATE performs poorly; use UPSERT instead.
    2. When importing deletions with DELETE, make sure index scans are used. The JDBC batch method is recommended.
    3. When importing data with UPSERT, enable enable_hstore_nonconflict_upsert_optimization if there are no concurrency conflicts, and disable it in other scenarios; the optimal path is then selected automatically.
    4. Use MERGE INTO only when more than one million records are imported per DN and there is no concurrent import, to prevent duplicate data.
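As a sketch of suggestion 1, an UPSERT into an HStore_opt table can use the ON DUPLICATE KEY UPDATE form that GaussDB(DWS) documents for UPSERT (the table name here is illustrative; verify the exact syntax supported by your cluster version):

```sql
-- UPSERT requires a primary key or unique index on the target table.
CREATE TABLE hs_opt(a int primary key, b int)
    with(orientation=column, enable_hstore_opt=on);

-- Insert the row, or update b if a row with the same key already exists.
INSERT INTO hs_opt VALUES (1, 10)
    ON DUPLICATE KEY UPDATE b = 10;
```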
  • Point query suggestions (the HStore_opt table is recommended)

    Suggestions for querying the HStore_opt table:

    1. Create level-2 partitions on columns whose distinct values are evenly distributed and that frequently appear in equality filters. Avoid level-2 partitions on columns with skewed or few distinct values.
    2. For columns with fixed filter criteria (excluding level-2 partition columns), use a cbtree index (up to five columns).
    3. For columns with variable filter criteria (excluding level-2 partition columns), use a GIN index (up to five columns).
    4. Bitmap indexes can be specified at table creation for any number of string columns used in equality filters, but the column set cannot be modified later.
    5. Use columns that can be filtered by time range as the partition columns.
    6. If a point query returns more than 100,000 records per DN, an index scan may not outperform a sequential scan. Use the GUC parameter enable_seqscan to compare the performance of both.
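Suggestion 6 can be verified empirically with the enable_seqscan GUC. A minimal sketch, assuming the hs table from the earlier examples and an illustrative filter value:

```sql
-- Bias the planner toward an index scan, then allow a sequential scan,
-- and compare the EXPLAIN ANALYZE timings of the same point query.
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM hs WHERE a = 42;

SET enable_seqscan = on;
EXPLAIN ANALYZE SELECT * FROM hs WHERE a = 42;
```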
  • Index-related

    Indexes occupy extra space and offer limited point query performance improvement. Create primary key or B-tree indexes only for upsert or unique/near-unique point queries.
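As a sketch of this guidance, assuming point queries filter on a near-unique column b of the hs table from the earlier examples (the index name is illustrative):

```sql
-- B-tree index on the near-unique column used by point queries.
CREATE INDEX idx_hs_b ON hs USING btree(b);
```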

  • MERGE-related

    HStore tables rely on background autovacuum for MERGE operations. Ensure data import speed does not exceed MERGE speed to prevent delta table expansion. Control import speed by managing concurrent data import tasks. Delta table space reuse is affected by oldestXmin. Old transactions can delay space reuse, causing expansion.
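In practice, the Delta table can be monitored with the view shown earlier, and long-running transactions that hold back oldestXmin can be spotted in pg_stat_activity (a sketch assuming the PostgreSQL-compatible view; on a multi-node cluster, pgxc_stat_activity gives the cluster-wide picture):

```sql
-- Tuple counts and delta size per DN for table hs.
SELECT * FROM pgxc_get_hstore_delta_info('hs');

-- Oldest open transactions, which can delay delta space reuse.
SELECT pid, xact_start, query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
 ORDER BY xact_start
 LIMIT 5;
```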
