Formulating Sharding Rules

Updated on 2022-08-01 GMT+08:00

If entities in different tables are related, use the same sharding rule for those tables and select the associated fields as their sharding keys. Associated data in different tables is then stored in the same shard, which avoids cross-shard JOIN operations. For example, use the customer ID as the sharding key when creating sharded tables that store customer information, orders, and order details.
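The co-location idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shard count, the `route` helper, the sample rows, and the use of SHA-256 as the hash are all hypothetical and do not represent DDM's actual routing function.

```python
import hashlib

SHARD_COUNT = 8  # assumed number of shards, for illustration only


def route(sharding_key_value: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a sharding key value to a shard index.

    SHA-256 is a stand-in hash; DDM's real hash function is not shown here.
    """
    digest = hashlib.sha256(sharding_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Three hypothetical tables that all use the customer ID as the sharding key.
customer_row = {"customer_id": "C1001", "name": "Alice"}
order_row = {"order_id": "O-77", "customer_id": "C1001"}
order_detail_row = {"detail_id": "D-5", "order_id": "O-77", "customer_id": "C1001"}

shards = {
    "customer": route(customer_row["customer_id"]),
    "orders": route(order_row["customer_id"]),
    "order_detail": route(order_detail_row["customer_id"]),
}

# Every row that carries the same customer ID resolves to the same shard,
# so a JOIN on customer_id never has to cross shards.
colocated = len(set(shards.values())) == 1
print(shards, colocated)
```

If the orders table were sharded by order ID instead, the same customer's rows could land on different shards and the JOIN would have to gather data from several of them.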

Table 1 Sharding keys and algorithms

Sharding Algorithm: Hash
Sharding Key: Table field
Description: Data is evenly distributed to shards by table field.
Application Scenarios: Scenarios requiring even data distribution, for example, banking applications where logical entities are customers. In this case, use the table field corresponding to customers (for example, customer account numbers) as the sharding key.

Sharding Algorithm: Hash
Sharding Key: Table field+date function
Description: Data is evenly distributed to shards by table field and date function. The table field must be date, datetime, or timestamp.
Application Scenarios: Scenarios requiring data to be split by time (year, month, day, week, or their combinations), for example, gaming applications. For these applications, use the table field (for instance, player registration time) corresponding to players as the sharding key. Sharding by day, month, or year helps you easily collect and query operation statistics of players for a specified day or month, and helps game vendors conduct big data analysis.

Sharding Algorithm: Range
Sharding Key: Table field
Description: Data is distributed to a specific shard based on the rules defined in algorithm metadata.
Application Scenarios: Scenarios with a large number of range operations, for instance, e-commerce applications. If a service scenario is focused on promotional activities and logical entities are activity dates, use the table field corresponding to activity dates (for example, activity name and date range) as the sharding key. This helps you collect statistics about the sales volume for a specified cycle.

Sharding Algorithm: Range
Sharding Key: Table field+date function
Description: Data is distributed to shards by table field and date function based on the rules defined in algorithm metadata. The table field must be date, datetime, or timestamp.
Application Scenarios: Scenarios involving many different types of complicated information. For example, for log analysis, you can select the time field as the sharding key and then shard data using the date function. To make it easier to clear and dump logs, select the range algorithm and convert the time field value into "year" using the date function so that logs are stored in shards by year. For details, see examples in the following passages.

Selecting a Sharding Algorithm

A sharding algorithm partitions data from logical tables to multiple shards. DDM supports hash and range algorithms.

  • Hash

    Hash evenly distributes data across shards.

    Select this algorithm if SQL queries frequently use the = and IN operators.

  • Range

    Range stores records in tables based on the range specified in algorithm metadata.

    Select this algorithm if SQL queries frequently use the greater-than (>), less-than (<), and BETWEEN ... AND ... operators.

    CAUTION:

    If you use the range algorithm with a date function and the sharding key field indicates the creation time, hotspot issues may occur when data is imported to the database, because records created around the same time are all routed to the same shard. As a result, the advantages of multiple MySQL databases cannot be fully utilized.

Select an appropriate algorithm based on your service requirements to improve efficiency.
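To make the trade-off concrete, the following Python sketch compares how many shards a query must visit under each algorithm. It is a simplified model, not DDM's implementation: the shard count, the range metadata in `RANGE_RULES`, the helper names, and the SHA-256 stand-in hash are all assumptions made for the example.

```python
import hashlib

SHARD_COUNT = 4  # assumed shard count

# Hypothetical range metadata: (low, high, shard), bounds inclusive.
RANGE_RULES = [(0, 249, 0), (250, 499, 1), (500, 749, 2), (750, 999, 3)]


def hash_shard(key: int) -> int:
    """Stand-in hash routing (SHA-256 modulo the shard count)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SHARD_COUNT


def range_shard(key: int) -> int:
    """Route by the first range rule that contains the key."""
    for low, high, shard in RANGE_RULES:
        if low <= key <= high:
            return shard
    raise ValueError(f"no range rule covers {key}")


def shards_for_equality(key: int, algorithm: str) -> set:
    """An equality predicate resolves to exactly one shard either way."""
    return {hash_shard(key)} if algorithm == "hash" else {range_shard(key)}


def shards_for_between(low: int, high: int, algorithm: str) -> set:
    """Shards that a BETWEEN low AND high predicate must visit."""
    if algorithm == "hash":
        # Hash scatters neighbouring keys, so any shard may hold matches.
        return set(range(SHARD_COUNT))
    # Range keeps neighbouring keys together: visit only overlapping rules.
    return {shard for r_low, r_high, shard in RANGE_RULES
            if r_low <= high and low <= r_high}


print(shards_for_equality(42, "hash"))        # a single shard
print(shards_for_between(100, 300, "hash"))   # all shards
print(shards_for_between(100, 300, "range"))  # only the overlapping shards
```

In this model, equality and IN lookups touch one shard per value under either algorithm, while range predicates stay narrow only under the range algorithm.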

Selecting a Sharding Key

A sharding key is a table field used to generate a route during horizontal partitioning of logical tables. After specifying a table field, you can select a date function or manually enter one in the format date function(field name), for example, month(LOGTIME). To use a date function, the table field must be of the date, datetime, or timestamp type. Select a date function if data needs to be redistributed by year, month, day, week, or a combination of these.

DDM calculates routes based on the sharding key and sharding algorithm, horizontally partitions data in sharded tables, and then redistributes it to shards.

When you select a sharding key and a sharding algorithm, note the following:

  • Ensure that data is distributed across shards as evenly as possible.
  • Select the most frequently used field or the most important query condition as the sharding key.
  • Prefer the primary key as the sharding key to keep queries as fast as possible.
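One way to check the first guideline before committing to a key is to route a sample of rows and compare the shard sizes. The sketch below does this for two candidate fields. The sample data, the shard count, the `skew` metric, and the SHA-256 stand-in hash are assumptions for illustration and are not part of DDM.

```python
import hashlib
from collections import Counter

SHARD_COUNT = 4  # assumed shard count


def route(value: str) -> int:
    """Stand-in hash routing; DDM's real hash function is not shown here."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SHARD_COUNT


def rows_per_shard(rows, key):
    """Count how many sample rows each shard would receive for a given key."""
    counts = Counter(route(str(row[key])) for row in rows)
    return [counts.get(shard, 0) for shard in range(SHARD_COUNT)]


def skew(counts):
    """Ratio of the fullest shard to the ideal even share (1.0 is perfect)."""
    total = sum(counts)
    return max(counts) / (total / len(counts)) if total else 0.0


# Hypothetical sample: unique account numbers, but only two status values.
rows = [
    {"account": f"ACC{i:06d}", "status": "A" if i % 10 else "F"}
    for i in range(10_000)
]

account_skew = skew(rows_per_shard(rows, "account"))
status_skew = skew(rows_per_shard(rows, "status"))
print(f"account skew: {account_skew:.2f}, status skew: {status_skew:.2f}")
```

A high-cardinality field such as the account number spreads rows almost evenly, whereas a field with only a few distinct values can fill at most that many shards, however many shards exist.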

Service Scenarios with a Clear Entity

A sharded table generally contains tens of millions of data records. It is extremely important to select an appropriate sharding key and a sharding algorithm. If a logical entity is identified and most database operations are performed on data of that entity, select the table field corresponding to the entity as a sharding key for horizontal partitioning.

Logical entities depend on actual applications. The following scenarios each include a clear logical entity.

  1. For customer-related applications of banks, the service logical entities are customers. In this case, use the table field corresponding to customers (for example, customer numbers) as the sharding key. Service scenarios of some systems are based on bank cards or accounts. In such cases, select the bank card or account as the sharding key.
  2. For e-commerce applications, if service scenarios are based on products, the service logical entities are products. In this case, use the table field corresponding to products (for example, product code) as the sharding key.
  3. Game applications mainly focus on player data, and the service logical entities are players. In this case, use the table field corresponding to players (for example, player ID) as the sharding key.

The following is an example SQL statement for creating tables for bank services:

CREATE TABLE PERSONALACCOUNT(
	ACCOUNT VARCHAR(20) NOT NULL PRIMARY KEY,
	NAME VARCHAR(60) NOT NULL,
	TYPE VARCHAR(10) NOT NULL,
	AVAILABLEBALANCE DECIMAL(18, 2) NOT NULL,
	STATUS CHAR(1) NOT NULL,
	CARDNO VARCHAR(24) NOT NULL,
	CUSTOMID VARCHAR(15) NOT NULL
) ENGINE = INNODB DEFAULT CHARSET = UTF8
dbpartition by hash(ACCOUNT);

Service Scenarios Without a Clear Entity

If you cannot identify a suitable entity for your service scenario, select the table field that can provide even data distribution as the sharding key.

For example, the log system may contain a wide range of data records. In this case, you can select the time field as the sharding key.

When the time field is selected as the sharding key, you can specify a date function to partition data.

To make it easier to clear and dump logs, select the range algorithm and convert the time field value into "month" using the date function so that logs are stored in shards by month.

Example SQL statement for creating a table:

CREATE TABLE LOG(
	LOGTIME DATETIME NOT NULL,
	LOGSOURCESYSTEM VARCHAR(100),
	LOGDETAIL VARCHAR(10000)
)
dbpartition by range(month(LOGTIME)) {
	1 - 2 = 0,
	3 - 4 = 1,
	5 - 6 = 2,
	7 - 8 = 3,
	9 - 10 = 4,
	11 - 12 = 5,
	default = 0
};
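The range metadata in this statement can be read as a lookup table from month to shard. The following Python sketch mirrors that lookup so the routing is easy to trace. The function and constant names are hypothetical, and the code only models the rule shown above rather than DDM's internal implementation.

```python
from datetime import datetime

# Range metadata copied from the LOG example: inclusive month range -> shard.
MONTH_RANGES = [
    ((1, 2), 0),
    ((3, 4), 1),
    ((5, 6), 2),
    ((7, 8), 3),
    ((9, 10), 4),
    ((11, 12), 5),
]
DEFAULT_SHARD = 0  # corresponds to "default = 0" in the example


def shard_for_logtime(logtime: datetime) -> int:
    """Return the shard index for a log record based on its month."""
    month = logtime.month  # plays the role of month(LOGTIME)
    for (low, high), shard in MONTH_RANGES:
        if low <= month <= high:
            return shard
    return DEFAULT_SHARD


print(shard_for_logtime(datetime(2022, 8, 1)))
```

Because the route depends only on the month, all logs written within the same two-month window go to a single shard. This makes it easy to clear or dump them together, and it is also the reason for the hotspot caution about using creation time with the range algorithm.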
