Formulating Sharding Rules

Updated on 2022-08-01 GMT+08:00

If entities in different tables are related, apply the same sharding rule to those tables and use the associated fields as their sharding keys. Related data in different tables is then stored in the same shard, which avoids cross-shard JOIN operations. For example, use the customer ID as the sharding key when creating sharded tables that store customer information, orders, and order details.
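
As a sketch of this rule, the following statements create three sharded tables that all use the customer ID as the sharding key with the same hash algorithm. The table and column names (CUSTOMER, ORDERS, ORDER_DETAIL, CUSTOMER_ID, and so on) are hypothetical and only illustrate the pattern. The dbpartition clause follows the syntax used in the examples later in this topic.

```sql
-- Hypothetical tables: all three are sharded by hash on CUSTOMER_ID,
-- so the rows that belong to one customer are stored in the same shard.
CREATE TABLE CUSTOMER(
	CUSTOMER_ID VARCHAR(15) NOT NULL PRIMARY KEY,
	NAME VARCHAR(60) NOT NULL
) ENGINE = INNODB DEFAULT CHARSET = UTF8
dbpartition by hash(CUSTOMER_ID);

CREATE TABLE ORDERS(
	CUSTOMER_ID VARCHAR(15) NOT NULL,
	ORDER_ID VARCHAR(20) NOT NULL,
	ORDER_TIME DATETIME NOT NULL,
	PRIMARY KEY (CUSTOMER_ID, ORDER_ID)
) ENGINE = INNODB DEFAULT CHARSET = UTF8
dbpartition by hash(CUSTOMER_ID);

CREATE TABLE ORDER_DETAIL(
	CUSTOMER_ID VARCHAR(15) NOT NULL,
	ORDER_ID VARCHAR(20) NOT NULL,
	PRODUCT_CODE VARCHAR(20) NOT NULL,
	QUANTITY INT NOT NULL,
	PRIMARY KEY (CUSTOMER_ID, ORDER_ID, PRODUCT_CODE)
) ENGINE = INNODB DEFAULT CHARSET = UTF8
dbpartition by hash(CUSTOMER_ID);

-- The join conditions include the sharding key, so the rows being
-- joined for a given customer are located in the same shard.
SELECT c.NAME, o.ORDER_ID, d.PRODUCT_CODE, d.QUANTITY
FROM CUSTOMER c
JOIN ORDERS o ON o.CUSTOMER_ID = c.CUSTOMER_ID
JOIN ORDER_DETAIL d ON d.CUSTOMER_ID = o.CUSTOMER_ID AND d.ORDER_ID = o.ORDER_ID
WHERE c.CUSTOMER_ID = 'C000000000001';
```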

**Table 1** Sharding keys and algorithms
Sharding Algorithm	Hash	Hash	Range	Range
Sharding Key	Table field	Table field+date function	Table field	Table field+date function
Description	Data is evenly distributed to shards by table field.	Data is evenly distributed to shards by table field and date function. The table field must be date, datetime, or timestamp.	Data is distributed to a specific shard based on the rules defined in algorithm metadata.	Data is distributed to shards by table field and date function based on the rules defined in algorithm metadata. The table field must be date, datetime, or timestamp.
Application Scenarios	Scenarios requiring even data distribution, for example, banking applications where logical entities are customers. In this case, use the table field corresponding to customers (for example, customer account numbers) as the sharding key.	Scenarios requiring data to be split by time (year, month, day, week, or their combinations), for example, gaming applications. For these applications, use the table field (for instance, player registration time) corresponding to players as the sharding key. Sharding by day, month, or year helps you easily collect and query operation statistics of players for a specified day or month, and helps game vendors conduct big data analysis.	Scenarios with a large number of range operations, for instance, e-commerce applications. If a service scenario is focused on promotional activities and logical entities are activity dates, use the table field corresponding to activity dates (for example, activity name and date range) as the sharding key. This helps you collect statistics about the sales volume for a specified cycle.	Scenarios involving many different types of complicated information. For example, for log analysis, you can select the time field as the sharding key and then shard data using the date function. To make it easier to clear and dump logs, select the range algorithm and convert the time field value into "year" using the date function so that logs are stored in shards by year. For details, see examples in the following passages.

Selecting a Sharding Algorithm

A sharding algorithm partitions data from logical tables to multiple shards. DDM supports hash and range algorithms.

Hash
Hash evenly distributes data across shards.

Select this algorithm if the = and IN operators are used frequently in SQL queries.
Range
Range stores records in shards based on the ranges specified in algorithm metadata.

Select this algorithm if the greater than (>), less than (<), and BETWEEN ... AND ... operators are used frequently in SQL queries.
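
The query shapes that suit each algorithm can be sketched as follows. The statements assume the PERSONALACCOUNT table (hash on ACCOUNT) and the LOG table (range on month(LOGTIME)) that are created in the examples later in this topic. The literal values are made up.

```sql
-- Hash on ACCOUNT: an IN condition on the sharding key.
SELECT ACCOUNT, STATUS
FROM PERSONALACCOUNT
WHERE ACCOUNT IN ('6222000000000001', '6222000000000002');

-- Range on month(LOGTIME): a range condition on the sharding key.
-- Every row matched here has a month value of 3 or 4, which the
-- example metadata maps to the same shard.
SELECT LOGSOURCESYSTEM, LOGDETAIL
FROM LOG
WHERE LOGTIME BETWEEN '2022-03-01 00:00:00' AND '2022-04-30 23:59:59';
```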

CAUTION:

If you use the range algorithm with a date function and the sharding key field records the creation time, hotspot issues may occur when data is imported to the database, because records created in the same period are all routed to the same shard. As a result, the advantages of multiple MySQL databases cannot be fully utilized.
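
To illustrate the hotspot, consider the LOG table created at the end of this topic, which uses range(month(LOGTIME)) and maps months 3 and 4 to shard 1. The rows below are hypothetical. Because LOGTIME is the creation time, everything written during March and April goes to that one shard while the other shards receive no new data.

```sql
-- month(LOGTIME) = 3, mapped to shard 1 by the "3 - 4 = 1" rule.
INSERT INTO LOG (LOGTIME, LOGSOURCESYSTEM, LOGDETAIL)
VALUES ('2022-03-15 09:30:00', 'billing', 'job started');

-- month(LOGTIME) = 4, also mapped to shard 1.
INSERT INTO LOG (LOGTIME, LOGSOURCESYSTEM, LOGDETAIL)
VALUES ('2022-04-02 18:05:12', 'billing', 'job finished');
```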

Select an appropriate algorithm based on your service requirements to improve efficiency.

Selecting a Sharding Key

A sharding key is a table field used to generate a route during horizontal partitioning of logical tables. After specifying a table field, you can optionally select a date function or manually enter one in the format date function(field name). To use a date function, the table field must be of the date, datetime, or timestamp type. Select a date function if data needs to be distributed by year, month, day, week, or a combination of these.

DDM calculates routes based on the sharding key and sharding algorithm, horizontally partitions data in sharded tables, and then redistributes it to shards.

When you select a sharding key and a sharding algorithm:

Ensure that data is distributed as evenly as possible across shards.
Select the most frequently used field or the most important query condition as the sharding key.
Prefer the primary key as the sharding key to keep queries as fast as possible.
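
One way to check the first point before creating a sharded table is to look at how the values of a candidate key are distributed in the existing data. The queries below are a sketch in standard MySQL syntax against a hypothetical source table ORDERS_SRC with a candidate key CUSTOMER_ID. If a few values hold a large share of the rows, sharding by this field is likely to be uneven.

```sql
-- Rows per candidate key value, largest first (hypothetical table and column).
SELECT CUSTOMER_ID, COUNT(*) AS ROWS_PER_KEY
FROM ORDERS_SRC
GROUP BY CUSTOMER_ID
ORDER BY ROWS_PER_KEY DESC
LIMIT 10;

-- Number of distinct key values compared with the total number of rows.
SELECT COUNT(DISTINCT CUSTOMER_ID) AS DISTINCT_KEYS, COUNT(*) AS TOTAL_ROWS
FROM ORDERS_SRC;
```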

Service Scenarios with a Clear Entity

A sharded table generally contains tens of millions of data records, so selecting an appropriate sharding key and sharding algorithm is extremely important. If a logical entity is identified and most database operations are performed on data of that entity, select the table field corresponding to the entity as the sharding key for horizontal partitioning.

Logical entities depend on actual applications. The following scenarios each include a clear logical entity.

For customer-related applications of banks, the service logical entities are customers. In this case, use the table field corresponding to customers (for example, customer numbers) as the sharding key. Service scenarios of some systems are based on bank cards or accounts. In such cases, select the bank card or account as the sharding key.
For e-commerce applications, if service scenarios are based on products, the service logical entities are products. In this case, use the table field corresponding to products (for example, product code) as the sharding key.
Game applications mainly focus on player data, so the service logical entities are players. In this case, use the table field corresponding to players (for example, player ID) as the sharding key.

The following is an example SQL statement for creating a table for bank services, with the account number as the sharding key:

CREATE TABLE PERSONALACCOUNT(
	ACCOUNT VARCHAR(20) NOT NULL PRIMARY KEY,
	NAME VARCHAR(60) NOT NULL,
	TYPE VARCHAR(10) NOT NULL,
	AVAILABLEBALANCE DECIMAL(18, 2) NOT NULL,
	STATUS CHAR(1) NOT NULL,
	CARDNO VARCHAR(24) NOT NULL,
	CUSTOMID VARCHAR(15) NOT NULL
) ENGINE = INNODB DEFAULT CHARSET = UTF8
dbpartition by hash(ACCOUNT);
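
With this definition, the value of ACCOUNT determines which shard a row belongs to. The sketch below contrasts a statement that filters on the sharding key with one that does not. The literal values are made up, and the comments describe the general behavior of hash sharding rather than a documented DDM execution plan.

```sql
-- Filters on the sharding key (ACCOUNT): the hash of the value
-- identifies the one shard that can hold the row.
UPDATE PERSONALACCOUNT
SET AVAILABLEBALANCE = AVAILABLEBALANCE - 100.00
WHERE ACCOUNT = '6222000000000001';

-- Filters only on CUSTOMID, which is not the sharding key: the value
-- does not identify a shard, so matching rows may be in any shard.
SELECT ACCOUNT, CARDNO
FROM PERSONALACCOUNT
WHERE CUSTOMID = 'C000000000001';
```

If most statements filtered on CUSTOMID instead, CUSTOMID would be the more suitable sharding key under the guidelines above.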

Service Scenarios Without a Clear Entity

If you cannot identify a suitable entity for your service scenario, select the table field that can provide even data distribution as the sharding key.

For example, a log system may contain many different types of records that share no common entity. In this case, you can select the time field as the sharding key.

When the time field is selected as the sharding key, you can specify a date function to partition data.

To make it easier to clear and dump logs, select the range algorithm and convert the time field value into "month" using the date function so that logs are stored in shards by month.

Example SQL statement for creating a table:

CREATE TABLE LOG(
	LOGTIME DATETIME NOT NULL,
	LOGSOURCESYSTEM VARCHAR(100),
	LOGDETAIL VARCHAR(10000)
)
dbpartition by range(month(LOGTIME)) {
	1 - 2 = 0,
	3 - 4 = 1,
	5 - 6 = 2,
	7 - 8 = 3,
	9 - 10 = 4,
	11 - 12 = 5,
	default = 0
};
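
With this mapping, the shard that holds the logs for a given period is predictable. As a sketch with made-up dates, the statements below read and then delete the logs for January and February 2022. Rows with these timestamps have month values of 1 or 2 and are therefore stored in shard 0. Because month() ignores the year, shard 0 also holds January and February logs from other years, which is why the conditions use full timestamps.

```sql
-- Dump the January and February 2022 logs (rule "1 - 2 = 0").
SELECT LOGTIME, LOGSOURCESYSTEM, LOGDETAIL
FROM LOG
WHERE LOGTIME >= '2022-01-01 00:00:00'
  AND LOGTIME < '2022-03-01 00:00:00';

-- Clear the same period after the dump has been saved.
DELETE FROM LOG
WHERE LOGTIME >= '2022-01-01 00:00:00'
  AND LOGTIME < '2022-03-01 00:00:00';
```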
