Best Practices of Table Design

Updated on 2024-08-20 GMT+08:00

Using Partitioned Tables

Partitioning splits what is logically one large table into smaller physical pieces according to a specific scheme. The logical table is called a partitioned table, and each physical piece is called a partition. Data is stored in the physical partitions, not in the logical partitioned table. A partitioned table has the following advantages over an ordinary table:

  1. High query performance: You can restrict a query to specific partitions, which reduces the amount of data scanned.
  2. High availability: If a partition in a partitioned table is faulty, data in the other partitions is still available.
  3. Easy maintenance: If a partition is faulty, you only need to repair that partition rather than the whole table.

GaussDB supports range partitioned tables, list partitioned tables, and hash partitioned tables.
  • Range partitioned table: Data in different ranges is mapped to different partitions. The ranges are determined by the partition key specified when the partitioned table is created. The partition key is usually a date. For example, sales data is partitioned by month.
  • List partitioned table: Data is mapped to partitions according to the key values it contains. The key values that each partition holds are specified when the partitioned table is created.
  • Hash partitioned table: Data is mapped to partitions by an internal hash algorithm. The number of partitions is specified when the partitioned table is created.
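
The three partitioning schemes described above could be declared roughly as follows. This is a minimal sketch: all table, column, and partition names are hypothetical, and the exact clauses accepted should be checked against the CREATE TABLE PARTITION syntax of your GaussDB version.

```sql
-- Range partitioning: sales data partitioned by month on a date key.
CREATE TABLE sales_by_month (
    sale_id   INT,
    sale_date DATE,
    amount    NUMERIC(12,2)
)
PARTITION BY RANGE (sale_date)
(
    PARTITION p202401 VALUES LESS THAN ('2024-02-01'),
    PARTITION p202402 VALUES LESS THAN ('2024-03-01'),
    PARTITION pmax    VALUES LESS THAN (MAXVALUE)
);

-- List partitioning: each partition holds an explicit set of key values.
CREATE TABLE sales_by_region (
    sale_id INT,
    region  VARCHAR(16),
    amount  NUMERIC(12,2)
)
PARTITION BY LIST (region)
(
    PARTITION p_north VALUES ('north'),
    PARTITION p_south VALUES ('south')
);

-- Hash partitioning: only the number of partitions is chosen.
CREATE TABLE sales_by_hash (
    sale_id INT,
    amount  NUMERIC(12,2)
)
PARTITION BY HASH (sale_id)
(
    PARTITION p0,
    PARTITION p1
);

-- Restrict a query to a single partition of the range partitioned table.
SELECT count(*) FROM sales_by_month PARTITION (p202401);
```

The last statement illustrates the query-performance advantage: only the p202401 partition is read instead of the whole table.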

Selecting a Distribution Mode

In replication mode, the full data of a table is copied to each DN in the cluster. Because every DN holds the full data, no data redistribution is needed during join operations. This reduces network overhead and the number of plan segments (each of which runs in its own thread), but it stores a large amount of redundant data. Generally, this mode is used only for small dimension tables.

In hash mode, hash values are generated for one or more columns. The storage location of a tuple is obtained from the mapping between DNs and hash values. In a hash table, the I/O resources of every node can be used during data reads and writes, which improves the read/write speed of the table. Generally, a table containing a large amount of data is defined as a hash table.

Range distribution and list distribution are user-defined distribution policies. A row is stored on a target DN when its distribution key value falls within the range, or matches one of the specific values, defined for that DN. These two modes allow flexible data management, but they require users to have some ability to model their data.

Table 1 Distribution policies and application scenarios

| Policy | Description | Application Scenario |
| --- | --- | --- |
| Hash | Table data is distributed on all DNs in the cluster. | Fact tables containing a large amount of data. |
| Replication | Full data in a table is stored on every DN in the cluster. | Small tables and dimension tables. |
| Range | Table data is mapped by value ranges of the specified columns and distributed to the corresponding DNs. | Users need to customize distribution rules. |
| List | Table data is mapped by specific values of the specified columns and distributed to the corresponding DNs. | Users need to customize distribution rules. |

As shown in Figure 1, T1 is a replication table and T2 is a hash table.

Figure 1 Replication tables and hash tables
NOTE:
  • When you insert, modify, or delete data in a replication table, if you use a function declared as shippable or immutable to wrap components that cannot be pushed down, data in the replication table may become inconsistent across DNs.
  • If statements with unstable results, such as window functions, rownum, LIMIT clauses, and user-defined functions, are used to insert data into or modify data in a replication table, data on different nodes may differ.
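
As an illustration of the replication and hash modes described in this section, the following sketch creates one table of each kind and joins them. The table and column names are hypothetical. Range and list distribution are omitted because their rules name specific DNs, which depend on the deployment.

```sql
-- Small dimension table: a full copy is kept on every DN.
CREATE TABLE dim_region (
    region_id   INT,
    region_name VARCHAR(64)
)
DISTRIBUTE BY REPLICATION;

-- Large fact table: rows are spread across DNs by the hash of order_id.
CREATE TABLE fact_orders (
    order_id  BIGINT,
    region_id INT,
    amount    NUMERIC(12,2)
)
DISTRIBUTE BY HASH (order_id);

-- dim_region is present on every DN, so this join needs no data redistribution.
SELECT r.region_name, sum(o.amount) AS total_amount
FROM fact_orders o
JOIN dim_region r ON o.region_id = r.region_id
GROUP BY r.region_name;
```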

Table Compression Level

When creating a table, you can customize the compression level and compression ratio of fields. Compression affects both data loading and data queries. The COMPRESSION parameter specifies the table compression level.

Parameter description:

COMPRESSION specifies the compression level of table data. It determines the compression ratio and the compression time. Generally, a higher compression level gives a higher compression ratio but takes longer, and a lower level gives a lower ratio but takes less time. The actual compression ratio depends on the distribution of the loaded table data.

Value range:

  • Valid values for row-store tables are YES and NO, and the default is NO.

You can select different compression levels based on Table 2 in different scenarios.

Table 2 Application scenarios of compression levels

| Compression Level | Application Scenario | Storage Model |
| --- | --- | --- |
| YES | Enabling table compression: You are advised not to enable this function because the compression ratio of row-store tables is low. | Row store |
| NO | Disabling table compression. | Row store |

Selecting Distribution Keys

Selecting a proper distribution key for a hash table is essential. Follow these principles:

  1. Ensure that the column values are discrete so that data can be evenly distributed to each DN. You can select the primary key of the table as the distribution key. For example, for a person information table, choose the ID card number column as the distribution key.
  2. Once the first principle is met, you can select columns used in join conditions as distribution keys so that join tasks can be pushed down to DNs, reducing the amount of data transferred between DNs.

For a hash table, an improper distribution key may cause data skew or poor I/O performance on certain DNs. Therefore, you need to check the table to ensure that data is evenly distributed on each DN. You can run the following SQL statements to check data skew:

select xc_node_id, count(1)
from tablename
group by xc_node_id
order by xc_node_id desc;

Example:

CREATE TABLE t1(c1 int) distribute by hash(c1);
INSERT INTO t1 values(generate_series(1,100));
select xc_node_id, count(1) from t1 group by xc_node_id order by xc_node_id desc;
DROP TABLE t1;

xc_node_id corresponds to a DN. Generally, a difference of more than 5% in the amount of data between DNs is regarded as data skew. If the difference is more than 10%, choose another distribution key.
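
To compare the result against the 5% and 10% thresholds, the per-DN counts can be reduced to a single percentage. The sketch below is one possible way to do this. It assumes that "difference" means the gap between the fullest and emptiest DN relative to the fullest DN, which the text above does not define precisely. tablename is a placeholder, as in the statement above.

```sql
-- Assumption: skew is measured as (max - min) / max of the per-DN row counts.
-- Note: a DN that holds no rows at all does not appear in the GROUP BY result,
-- so compare the number of rows returned by the inner query with the number of DNs.
SELECT max(cnt) AS max_rows,
       min(cnt) AS min_rows,
       round((max(cnt) - min(cnt)) * 100.0 / max(cnt), 2) AS skew_pct
FROM (
    SELECT xc_node_id, count(1) AS cnt
    FROM tablename
    GROUP BY xc_node_id
) AS per_dn;
```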

GaussDB allows multiple columns to be selected as the distribution key so that data is distributed evenly.

For a range- or list-distributed table, select the distribution key as required. In addition to choosing a proper distribution key, check how the distribution rules themselves affect data distribution.
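
The two principles for hash tables and the multi-column option can be sketched as follows. The table and column names are hypothetical. Whether customer_id is discrete enough to avoid skew depends on the actual data, so check the result with the skew query from this section.

```sql
-- Both tables are hashed on the join column, so rows with the same
-- customer_id are stored on the same DN and the join can be pushed down.
CREATE TABLE key_customers (
    customer_id INT,
    name        VARCHAR(64)
)
DISTRIBUTE BY HASH (customer_id);

CREATE TABLE key_orders (
    order_id    BIGINT,
    customer_id INT,
    amount      NUMERIC(12,2)
)
DISTRIBUTE BY HASH (customer_id);

-- Multi-column distribution key: useful when no single column is discrete enough.
CREATE TABLE key_order_items (
    order_id BIGINT,
    item_no  INT,
    quantity INT
)
DISTRIBUTE BY HASH (order_id, item_no);

-- Verify how evenly the rows are spread after loading data.
SELECT xc_node_id, count(1)
FROM key_order_items
GROUP BY xc_node_id
ORDER BY xc_node_id DESC;
```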

Selecting a Data Type

Use the following principles to select efficient data types:

  1. Select data types that facilitate data calculation.

    Generally, operations on integers (including common comparisons such as =, >, <, >=, <=, and !=, as well as GROUP BY) are more efficient than the same operations on strings and floating point numbers.

  2. Select data types with a short length.

    Data types with a short length reduce both the data file size and the memory used for computing, improving I/O and computing performance. For example, where the value range allows, use SMALLINT instead of INT, and INT instead of BIGINT.

  3. Use the same data type for a join.

    You are advised to use the same data type for the columns on both sides of a join. To join columns with different data types, the database must convert them to the same type, which adds performance overhead.
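
The three principles can be illustrated with a small schema. This is a sketch with hypothetical table and column names. It shows the shape of the recommendation and does not measure the performance difference.

```sql
-- Integer keys and the shortest type that fits the value range.
CREATE TABLE dt_customers (
    customer_id INT,
    age         SMALLINT,      -- a person's age fits easily in SMALLINT
    name        VARCHAR(64)
);

-- The join column has the same type (INT) in both tables,
-- so no type conversion is needed when joining.
CREATE TABLE dt_orders (
    order_id    BIGINT,
    customer_id INT,
    amount      NUMERIC(12,2)
);

SELECT c.name, count(*) AS order_count
FROM dt_orders o
JOIN dt_customers c ON o.customer_id = c.customer_id
GROUP BY c.name;

-- Counter-example: the key is stored as text, so every join
-- has to convert it before the comparison.
CREATE TABLE dt_orders_text (
    order_id    BIGINT,
    customer_id VARCHAR(20)
);

SELECT count(*)
FROM dt_orders_text o
JOIN dt_customers c ON CAST(o.customer_id AS INT) = c.customer_id;
```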

Checking a Node Where a Table Resides

When creating a table, you can specify how the table is distributed or replicated among nodes. For details, see DISTRIBUTE BY. For details about distribution modes, see Selecting a Distribution Mode.

When creating a table, you can also set Node Group to specify the group to which the table belongs. For details, see "TO { GROUP groupname | ...".

To view the instance where a table is located, perform the following steps:

  1. Query the schema to which the table belongs.
    select t1.nspname,t2.relname from pg_namespace t1,pg_class t2 where t1.oid = t2.relnamespace and t2.relname = 'table1';

    In the preceding command, nspname indicates the name of a schema, relname indicates the name of a table, an index, or a view, oid indicates the row identifier, relnamespace indicates the OID of the namespace that contains the relation, and table1 indicates a table name.

  2. Check relname and nodeoids of the table.
    select t1.relname,t2.nodeoids from pg_class t1, pgxc_class t2, pg_namespace t3  where t1.oid = t2.pcrelid and t1.relnamespace=t3.oid and t1.relname = 'table1' and t3.nspname ='schema1';

    In the preceding command, nodeoids indicates the OID list of the nodes where the table is distributed, pcrelid indicates the OID of the table (so it is matched against the oid column of pg_class), and schema1 indicates the schema of the table queried in step 1.

  3. Query the instance where the table is located based on the queried node where the table is distributed.
    select * from pgxc_node where oid in (nodeoids1, nodeoids2, nodeoids3);

    In the preceding command, nodeoids1, nodeoids2, and nodeoids3 indicate the three nodeoids queried in step 2. Use the actual nodeoids and separate them with commas (,).
