Updated on 2025-08-20 GMT+08:00

Selecting a Sharding Key and a Sharding Algorithm

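Table 1 summarizes the supported combinations of sharding algorithm and sharding key and the scenarios that each combination suits.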

Table 1 Sharding keys and algorithms

| Sharding Algorithm | Sharding Key | Description | Scenario |
|---|---|---|---|
| Hash | Table field | Data is evenly distributed to shards by table field. | Scenarios requiring even data distribution, for example, banking applications where logical entities are customers. In this case, use the table field corresponding to customers (for example, customer account numbers) as the sharding key. |
| Hash | Table field+date function | Data is evenly distributed to shards by table field and date function. The table field must be date, datetime, or timestamp. | Scenarios requiring data to be split by time (year, month, day, week, or their combinations), for example, gaming applications. For these applications, use the table field (for instance, player registration time) corresponding to players as the sharding key. Sharding by day, month, or year helps you easily collect and query operation statistics of players for a specified day or month, and helps game vendors conduct big data analysis. |
| Range | Table field | Data is distributed to specific shards based on the rules defined in algorithm metadata. | Scenarios with a large number of range operations, for instance, e-commerce applications. If a service scenario is focused on promotional activities and logical entities are activity dates, use the table field corresponding to activity dates (for example, activity name and date range) as the sharding key. This helps you collect statistics about the sales volume for a specified cycle. |
| Range | Table field+date function | Data is distributed to shards by table field and date function based on the rules defined in algorithm metadata. The table field must be date, datetime, or timestamp. | Scenarios involving many different types of complicated information. For example, for log analysis, you can select the time field as the sharding key and then shard data using the date function. To make it easier to clear and dump logs, select the range algorithm and convert the time field value into "year" using the date function so that logs are stored in shards by year. For details, see the examples in the following sections. |

Selecting a Sharding Algorithm

A sharding algorithm partitions data from logical tables to multiple shards. DDM supports hash and range algorithms.

  • Hash

    Hash evenly distributes data across shards.

    Select this algorithm if your SQL queries frequently use the = and IN operators.

  • Range

    Range stores records in tables based on the range specified in algorithm metadata.

    Select this algorithm if your SQL queries frequently use the greater than (>), less than (<), and BETWEEN ... AND ... operators.

Select an appropriate algorithm based on your service requirements to improve efficiency.
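The difference between the two algorithms can be sketched as follows. This is a minimal illustration under stated assumptions, not DDM's implementation: CRC32 stands in for the hash function (which this page does not specify), and `SHARD_COUNT`, `RANGE_BOUNDS`, and the function names are hypothetical.

```python
import bisect
import zlib

SHARD_COUNT = 4  # hypothetical number of shards

# Hypothetical range metadata: exclusive upper bound of each shard's range.
# shard 0: value < 100, shard 1: 100..199, shard 2: 200..299, shard 3: >= 300
RANGE_BOUNDS = [100, 200, 300]


def hash_route(key: str, shard_count: int = SHARD_COUNT) -> int:
    """Shard index for an equality lookup (= or IN) on the sharding key."""
    # CRC32 is an assumed stand-in for the middleware's hash function.
    return zlib.crc32(key.encode("utf-8")) % shard_count


def range_route(value: int) -> int:
    """Shard index for a single value under the range metadata above."""
    return bisect.bisect_right(RANGE_BOUNDS, value)


def range_scan_shards(low: int, high: int) -> list:
    """Shards that a BETWEEN low AND high query has to visit."""
    return list(range(range_route(low), range_route(high) + 1))


point_shard = hash_route("ACC00000042")   # always the same single shard
scan_shards = range_scan_shards(150, 250)  # only the shards covering 150..250
```

With hash, adjacent key values usually land on unrelated shards, so a range predicate has to visit every shard. With range, the same predicate visits only the shards whose ranges overlap it.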

During the scaling out of a schema:

  • Data of logical tables sharded by the hash algorithm is migrated and redistributed evenly across all shards.
  • Data of logical tables sharded by the range algorithm is not migrated by default. If you modify the sharding rule and redefine the range, connect to the target RDS shards to complete data migration.
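The sketch below illustrates why scaling out affects the two algorithms differently. It assumes a simple modulo-based hash rule and a list of range bounds; both are hypothetical simplifications, and the key format and bound values are made up for the example.

```python
import bisect
import zlib


def hash_route(key, shard_count):
    # CRC32 modulo the shard count is an assumed stand-in for the real hash rule.
    return zlib.crc32(key.encode("utf-8")) % shard_count


def range_route(value, bounds):
    # bounds[i] is the exclusive upper bound of shard i; the last shard is open-ended.
    return bisect.bisect_right(bounds, value)


# Hash: going from 4 to 8 shards changes the target shard of many existing rows,
# so the data has to be redistributed.
keys = [f"ACC{n:08d}" for n in range(10_000)]
hash_moved = [k for k in keys if hash_route(k, 4) != hash_route(k, 8)]

# Range: as long as the bounds stay the same, every row keeps its shard.
# Rows change shards only if the range is redefined, here by adding a bound at 300.
values = list(range(0, 400, 10))
old_bounds = [100, 200]
new_bounds = [100, 200, 300]
range_moved = [v for v in values
               if range_route(v, old_bounds) != range_route(v, new_bounds)]
```

In this sketch, `range_moved` contains only the values that fall into the newly defined range. These are the rows that would have to be migrated on the target shards after the rule is changed.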

Selecting a Sharding Key

A sharding key is a table field used to generate a route during horizontal partitioning of logical tables. After specifying a table field, you can select a date function or manually enter one in the format date function (field name). If you use a date function, the table field must be of the date, datetime, or timestamp type. Select a date function if data needs to be redistributed by year, month, day, week, or a combination of these.

DDM calculates routes based on the sharding key and sharding algorithm, horizontally partitions data in sharded tables, and then redistributes it to shards.
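As a rough illustration of how a date function changes the routing value, the sketch below derives a number from a timestamp and then maps it to a shard. The names in `DATE_FUNCTIONS` and the modulo rule are assumptions made for this example; they are not DDM's function names or its routing formula.

```python
from datetime import datetime

# Hypothetical date functions: each turns a timestamp into a routing value.
DATE_FUNCTIONS = {
    "year": lambda t: t.year,
    "month": lambda t: t.month,
    "day": lambda t: t.day,
    "week": lambda t: t.isocalendar()[1],           # ISO week number
    "year_month": lambda t: t.year * 100 + t.month,  # a combination
}


def route(ts: datetime, date_function: str, shard_count: int) -> int:
    """Apply the date function first, then route on the derived value."""
    value = DATE_FUNCTIONS[date_function](ts)
    return value % shard_count  # assumed modulo rule, for illustration only


registered = datetime(2024, 3, 15, 10, 30)
by_month = route(registered, "month", 4)  # 3 % 4
by_year = route(registered, "year", 4)    # 2024 % 4
```

All rows whose timestamps share the same derived value (for example, the same month) are routed to the same shard, which is what makes per-day or per-month statistics easy to collect.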

When you select a sharding key and a sharding algorithm, note the following:
  • Distribute data across shards as evenly as possible.
  • Select the most frequently used field or the most important query condition as the sharding key.
  • Prefer the primary key as the sharding key, because queries on the primary key are the fastest.
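One way to compare candidate sharding keys before committing to one is to simulate the distribution on sample data. The sketch below is a hypothetical check, assuming a CRC32-modulo hash rule and synthetic rows; the field names and the skew metric are chosen for the example only.

```python
import zlib
from collections import Counter


def shard_counts(values, shard_count):
    """Number of rows each shard would receive for the given key values."""
    counts = Counter(zlib.crc32(str(v).encode("utf-8")) % shard_count
                     for v in values)
    return [counts.get(i, 0) for i in range(shard_count)]


def skew(counts):
    """Largest shard divided by the mean shard size; 1.0 is perfectly even."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


# Synthetic sample: a unique account number and a low-cardinality status flag.
rows = [{"account": f"ACC{n:06d}", "status": "A" if n % 10 else "C"}
        for n in range(10_000)]

by_account = shard_counts([r["account"] for r in rows], 4)
by_status = shard_counts([r["status"] for r in rows], 4)
```

A field with only a few distinct values, such as the status flag here, can populate only as many shards as it has values, so its skew is far higher than that of a unique field such as the account number.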

Service Scenarios with a Clear Entity

A sharded table generally contains tens of millions of data records. It is extremely important to select an appropriate sharding key and a sharding algorithm. If a logical entity is identified and most database operations are performed on data of that entity, select the table field corresponding to the entity as a sharding key for horizontal partitioning.

Logical entities depend on actual applications. The following scenarios each include a clear logical entity.

  1. Banking applications

    The logical entities are customers. In this case, use the table field corresponding to customers (for example, customer account numbers) as the sharding key. Service scenarios of some systems are based on bank cards or accounts. In such cases, select the bank card or account as the sharding key.

  2. E-commerce applications

    If this scenario focuses on products, the logical entities are products. In this case, use the table field corresponding to products (for example, product code) as the sharding key.

  3. Gaming applications

    These applications mainly focus on player data, and the logical entities are players. In this case, use the table field corresponding to players (for example, player ID) as the sharding key.

The following is an example SQL statement for creating a table for banking services.

CREATE TABLE PERSONALACCOUNT (
    ACCOUNT VARCHAR(20) NOT NULL PRIMARY KEY,
    NAME VARCHAR(60) NOT NULL,
    TYPE VARCHAR(10) NOT NULL,
    AVAILABLEBALANCE DECIMAL(18,2) NOT NULL,
    STATUS CHAR(1) NOT NULL,
    CARDNO VARCHAR(24) NOT NULL,
    CUSTOMID VARCHAR(15) NOT NULL
) ENGINE=INNODB DEFAULT CHARSET=UTF8;
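To see why the entity field makes a good sharding key, consider which shards a query has to visit. The sketch below assumes that ACCOUNT is chosen as the sharding key of PERSONALACCOUNT and that routing uses a CRC32-modulo hash; the shard count, the function name, and the sample values are hypothetical.

```python
import zlib

SHARD_COUNT = 8           # hypothetical number of shards
SHARDING_KEY = "ACCOUNT"  # assumed sharding key for PERSONALACCOUNT


def shards_to_query(equality_predicates: dict) -> list:
    """Shards a query must visit, given its column = value predicates."""
    if SHARDING_KEY in equality_predicates:
        # The sharding key is known, so the row can live on one shard only.
        value = equality_predicates[SHARDING_KEY]
        return [zlib.crc32(value.encode("utf-8")) % SHARD_COUNT]
    # No sharding key in the predicate: every shard has to be checked.
    return list(range(SHARD_COUNT))


by_account = shards_to_query({"ACCOUNT": "6222020000000001"})
by_customer = shards_to_query({"CUSTOMID": "C000000000001"})
```

In this sketch, a lookup by ACCOUNT touches a single shard, while a lookup by CUSTOMID fans out to all shards. If most operations filtered on CUSTOMID instead, that field would be the better sharding key.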

Service Scenarios Without a Clear Entity

If you cannot identify a suitable entity for your service scenario, select a table field that distributes data evenly as the sharding key.

For example, a log system may contain a wide range of data records. In this case, you can select the time field as the sharding key.

When the time field is selected as the sharding key, you can specify a date function to partition data.

To make it easier to clear and dump logs, use the range algorithm and convert the time field value into "year" using the date function so that logs are stored in shards by year.

Example SQL statement for creating a table:

CREATE TABLE LOG (
    LOGTIME DATETIME NOT NULL,
    LOGSOURCESYSTEM VARCHAR(100),
    LOGDETAIL VARCHAR(10000)
);
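Storing logs by year with the range algorithm can be sketched as follows. The year bounds, shard numbering, and function names are hypothetical and only illustrate the idea of range metadata defined on the year of LOGTIME.

```python
import bisect
from datetime import datetime

# Hypothetical range metadata on the year of LOGTIME (exclusive upper bounds).
# shard 0: before 2023, shard 1: 2023, shard 2: 2024, shard 3: 2025 and later
YEAR_BOUNDS = [2023, 2024, 2025]


def shard_for_log(logtime: datetime) -> int:
    """Convert LOGTIME to its year, then look the year up in the range metadata."""
    return bisect.bisect_right(YEAR_BOUNDS, logtime.year)


def shards_for_period(start: datetime, end: datetime) -> list:
    """Shards that hold logs written between start and end, inclusive."""
    return list(range(shard_for_log(start), shard_for_log(end) + 1))


logs_2024 = shards_for_period(datetime(2024, 1, 1), datetime(2024, 12, 31))
```

Because all logs of one year sit on one shard in this layout, clearing or dumping the logs of 2024 involves only the shard listed in `logs_2024`.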