TABLE Object Design (Prioritized)

Updated on 2024-12-19 GMT+08:00

Rule 2.9: Selecting the Optimal Distribution Method and Columns During Table Creation

NOTE:

Impact of rule violation:

  • Choosing the wrong distribution method or distribution columns can cause storage skew, degrade access performance, and even overload storage and computing resources.

Solution:

  • When creating a table, use the DISTRIBUTE BY clause to explicitly specify the distribution method and distribution columns. The table below summarizes the principles for selecting them; a DDL sketch follows the table.
Table 1 Distribution column selection

Distribution method: Hash

Description: Table data is distributed to each DN based on the mapping between the hash values of the distribution columns and the DNs.

  • Advantage: Each DN stores only part of the data, saving space.
  • Disadvantage: Even data distribution depends heavily on the choice of distribution columns. If a join condition does not include the distribution columns, data must be exchanged between nodes.

Scenario: Large tables and fact tables

Distribution method: RoundRobin

Description: Table data is distributed to the DNs in a round-robin (polling) manner.

  • Advantage: Each DN stores only part of the data, saving space. Data is distributed evenly without relying on distribution columns, which eliminates data skew.
  • Disadvantage: Filter conditions on distribution columns cannot eliminate or reduce inter-node communication, so performance is inferior to hash distribution in such scenarios.

Scenario: Large tables, fact tables, and tables without suitable distribution columns

Distribution method: Replication

Description: The full table data is copied to every DN in the cluster.

  • Advantage: Each DN holds a complete copy of the table, so JOIN operations avoid inter-node data communication, reducing network overhead and the cost of starting and stopping STREAM threads.
  • Disadvantage: Each DN stores the complete table data, which is redundant and occupies more storage space.

Scenario: Small tables and dimension tables
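
As a rough illustration of the three methods, the hedged DDL sketch below shows how each might be declared with DISTRIBUTE BY. The table and column names are hypothetical, and ROUNDROBIN assumes a cluster version that supports it.

  -- Hypothetical fact table: hash-distributed on a high-cardinality join key.
  CREATE TABLE sales_fact (
      order_id  BIGINT NOT NULL,
      region_id INTEGER,
      sale_date DATE,
      amount    NUMERIC(12,2)
  )
  DISTRIBUTE BY HASH (order_id);

  -- Hypothetical small dimension table: replicated to every DN so joins stay local.
  CREATE TABLE dim_region (
      region_id   INTEGER NOT NULL,
      region_name VARCHAR(64)
  )
  DISTRIBUTE BY REPLICATION;

  -- Hypothetical staging table with no suitable distribution column: round-robin.
  CREATE TABLE stage_events (
      payload TEXT
  )
  DISTRIBUTE BY ROUNDROBIN;

A good hash distribution column is typically a high-cardinality column that also appears in join conditions, which keeps data evenly distributed and avoids inter-node data exchange during joins.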

Rule 2.10: Selecting an Optimal Storage Type During Table Creation

NOTE:

Impact of rule violation:

  • Improper use of row-store tables leads to poor query performance and resource overload.

  • Improper use of column-store tables causes CU expansion, poor performance, and resource overload.

Solution:

  • When creating a table, use the orientation option to explicitly specify the storage type. The following table describes the rules for selecting a storage type; a DDL sketch follows the table.
Table 2 Storage type selection

Storage type: Row storage

Applicable scenarios:
  • DML addition, deletion, and modification: scenarios with many UPDATE and DELETE operations
  • DML query: point queries (simple index-based queries that return only a few records)

Inapplicable scenario:
  • DML query: statistical analysis queries (large amounts of data involved in GROUP and JOIN processing)

CAUTION: When creating a row-store table (orientation set to row), do not specify the compress attribute or use a row-store compressed table.

Storage type: Column storage

Applicable scenarios:
  • DML addition, deletion, and modification: batch INSERT import (the number of records imported into a single partition at a time is roughly 60,000 times the number of DNs, or more)
  • DML query: statistical analysis queries (large amounts of data involved in GROUP and JOIN processing)

Inapplicable scenarios:
  • DML addition, deletion, and modification: scenarios with many UPDATE/DELETE operations or only a small number of INSERT operations
  • DML query: high-concurrency point queries
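
As a minimal sketch of how the storage type might be specified, the hedged DDL below sets the orientation option for two hypothetical tables.

  -- Hypothetical row-store table: frequent point queries and UPDATE/DELETE operations.
  -- Per the CAUTION above, no compress attribute is specified for the row-store table.
  CREATE TABLE order_status (
      order_id   BIGINT NOT NULL,
      status     SMALLINT,
      updated_at TIMESTAMP
  )
  WITH (ORIENTATION = ROW)
  DISTRIBUTE BY HASH (order_id);

  -- Hypothetical column-store table: batch-loaded and queried for statistical analysis.
  CREATE TABLE clickstream (
      event_time TIMESTAMP,
      user_id    BIGINT,
      url        VARCHAR(512)
  )
  WITH (ORIENTATION = COLUMN)
  DISTRIBUTE BY HASH (user_id);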

Rule 2.11: Selecting an Optimal Partitioning Policy During Table Creation

NOTE:

Impact of rule violation:

Without partitioning, query performance and data governance efficiency deteriorate, and the larger the data volume, the greater the deterioration. The advantages of partitioning include:

  • High query performance: The system queries only the concerned partitions rather than the whole table, improving the query efficiency.
  • Improved data governance efficiency: In the data lifecycle management scenario, performing TRUNCATE or DROP PARTITION on historical partitions is much more efficient and effective than using DELETE.

Solution:

  • Design partitions for tables that contain time-type columns. The table below describes the partitioning policies; a DDL sketch follows the table.
Table 3 Partitioning policy selection

Partitioning policy: Range partitioning

Description: Data is stored in different partitions based on ranges of the partition key values. The ranges are consecutive and do not overlap.

Scenarios:
  1. A date or time column is used as the partition key.
  2. Most queries filter on the partition key.
  3. Data is periodically deleted based on the partition key.

Partitioning policy: List partitioning

Description: Partitioning is based on a list of distinct partition key values.

Scenarios:
  1. A limited set of enumerated values is used as the partition key values.
  2. Most queries filter on the partition key.
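
The hedged DDL sketch below shows range and list partitioning on hypothetical tables; list partitioning assumes a cluster version that supports it.

  -- Hypothetical range-partitioned table keyed on a date column.
  CREATE TABLE sales_by_day (
      sale_date DATE NOT NULL,
      store_id  INTEGER,
      amount    NUMERIC(12,2)
  )
  WITH (ORIENTATION = COLUMN)
  DISTRIBUTE BY HASH (store_id)
  PARTITION BY RANGE (sale_date)
  (
      PARTITION p2024q1 VALUES LESS THAN ('2024-04-01'),
      PARTITION p2024q2 VALUES LESS THAN ('2024-07-01'),
      PARTITION pmax    VALUES LESS THAN (MAXVALUE)
  );

  -- Dropping an aged-out partition is far cheaper than DELETE for lifecycle management.
  ALTER TABLE sales_by_day DROP PARTITION p2024q1;

  -- Hypothetical list-partitioned table keyed on an enumerated column.
  CREATE TABLE orders_by_channel (
      channel  VARCHAR(16) NOT NULL,
      order_id BIGINT
  )
  DISTRIBUTE BY HASH (order_id)
  PARTITION BY LIST (channel)
  (
      PARTITION p_web   VALUES ('web'),
      PARTITION p_store VALUES ('store'),
      PARTITION p_other VALUES ('phone', 'mail')
  );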

Suggestion 2.12: Designing Table Columns for Fast and Accurate Queries

NOTE:

Impact of rule violation:

  • Storage space may be wasted and query efficiency may be reduced.

Solution:

  1. Design table columns for fast queries.
    • Where possible, use integer types instead of floating-point or character types.
    • When using a variable-length character type, specify a maximum length that matches the actual data.
  2. Design table columns for accurate queries.
    • Use a time type rather than a character type to store time data.
    • Use the smallest numeric type that meets the requirements; avoid bigint if int or smallint is sufficient, to save space.
  3. Use constraints correctly.
    • Add NOT NULL constraints to columns that never contain NULL values; the optimizer can take advantage of them in certain scenarios.
    • Do not use DEFAULT constraints for columns that can be filled in at the service layer; otherwise, unexpected results may be produced during data loading.
  4. Avoid unnecessary data type conversion.
    • In logically related tables, columns with the same meaning should use the same data type.
    • Comparisons between different types trigger data type conversion, which can prevent index use and partition pruning and degrade query performance. A DDL sketch illustrating these suggestions follows this list.
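
The hedged sketch below applies these suggestions to a hypothetical table; the column names and lengths are illustrative only.

  CREATE TABLE user_login (
      user_id     INTEGER   NOT NULL,   -- smallest integer type that fits; NOT NULL helps the optimizer
      login_time  TIMESTAMP NOT NULL,   -- a time type, not a character type, for time data
      device_type SMALLINT,             -- integer code instead of a descriptive string
      session_id  VARCHAR(32)           -- variable-length type capped at the actual maximum length
  )
  DISTRIBUTE BY HASH (user_id);

  -- Columns joined against user_login.user_id in related tables should also be INTEGER,
  -- so that comparisons do not trigger implicit type conversion.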

Suggestion 2.13: Avoiding the Usage of Auto-increment Columns or Data Types

NOTE:

Impact of rule violation:

  • When auto-increment sequences or auto-increment data types are heavily used, the GTM (Global Transaction Manager) may become overloaded, slowing down sequence generation.

Solution:

  • Use a UUID to generate unique IDs.
  • If an auto-increment sequence must be used and strictly increasing order is not required, set a sequence cache (for example, 1000) to reduce the pressure on the GTM. A sketch of both options follows.
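
The hedged sketch below shows both options; the sequence, table, and column names are hypothetical, and UUID generation is assumed to happen at the service layer or through a UUID function available in your cluster version.

  -- Option 1: if a sequence is required, a large CACHE reduces pressure on the GTM.
  -- Cached values are not guaranteed to be strictly consecutive.
  CREATE SEQUENCE order_id_seq CACHE 1000;

  CREATE TABLE orders (
      order_id   BIGINT NOT NULL DEFAULT nextval('order_id_seq'),
      created_at TIMESTAMP
  )
  DISTRIBUTE BY HASH (order_id);

  -- Option 2: store a UUID generated outside the database (or by a UUID function,
  -- if available) as the unique identifier instead of a sequence value.
  CREATE TABLE orders_uuid (
      order_uuid VARCHAR(36) NOT NULL,
      created_at TIMESTAMP
  )
  DISTRIBUTE BY HASH (order_uuid);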
