Updated on 2024-10-28 GMT+08:00

Suggestions on Using DCS

Service Usage

Principle

Description

Remarks

Deploy services nearby to reduce latency.

If your service and DCS instance are deployed far from each other (not in the same region) or with a high latency (connected through public networks), the read/write performance will be greatly affected by the latency.

If your service is latency-sensitive, do not create cross-AZ DCS Redis instances.

Separate hot data from cold data.

You can store frequently accessed data (hot data) in Redis, and infrequently accessed data (cold data) in databases such as MySQL and Elasticsearch.

Infrequently accessed data stored in the memory occupies Redis space and does not accelerate access.

Differentiate service data.

Store unrelated service data in different Redis instances.

This prevents services from affecting each other and prevents single instances from being too large. This also enables you to quickly restore services in case of faults.

Do not use the SELECT command for multi-DB on a single instance.

Multi-DB on a single Redis instance does not provide good isolation and is no longer in active development by open-source Redis. You are advised not to depend on this feature in the future.

Set a proper eviction policy.

If the eviction policy is set properly, Redis can still function when the memory is used up unexpectedly.

You can select a policy that meets your service requirements. The default eviction policy used by DCS is volatile-lru.
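On self-managed Redis, the eviction policy is the maxmemory-policy setting; on DCS, change this parameter on the console instead of editing redis.conf. A minimal fragment (the choice of volatile-lru here mirrors the DCS default):

```
# redis.conf (self-managed Redis; on DCS, change the maxmemory-policy
# parameter on the console instead)
maxmemory-policy volatile-lru   # evict least-recently-used keys that have an expiry
```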

Use Redis as cache.

Do not over-rely on Redis transactions.

After a transaction is executed, it cannot be rolled back.

If data is abnormal, clear the cache for data restoration.

Redis does not have a mechanism or protocol to ensure strong data consistency. Therefore, services cannot over-rely on the accuracy of Redis data.

When using Redis as cache, set expiration on all keys. Do not use Redis as a database.

Set expiration as required, but a longer expiration is not necessarily better.

Prevent cache breakdown.

Use Redis together with local cache. Store frequently used data in the local cache and regularly update it asynchronously.

-
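A minimal sketch of the local-cache idea in Python (the class and TTL value are illustrative; in production you would refresh hot entries asynchronously, for example from a background thread):

```python
import time

class LocalCache:
    """Tiny in-process cache that absorbs hot-key reads before they reach Redis."""

    def __init__(self, ttl_seconds: float = 5.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: force a refresh from Redis
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)
```

The read path becomes local cache → Redis → database, so most reads for hot data never leave the process.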

Prevent cache penetration.

On non-critical paths, allow requests to pass through to the database, and limit the rate of access to the database.

-

If the requested data is not found in Redis, read-only DB instances are accessed.

The idea is to keep such requests off the primary database.

You can use domain names to connect to multiple read-only DB instances. If a fault occurs, you can add such instances for emergency handling.
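One common way to limit penetration is to briefly cache "not found" results so repeated lookups for missing keys stop reaching the database. A sketch (the sentinel value, TTLs, and the cache/db interfaces are illustrative assumptions):

```python
NULL_SENTINEL = "__NULL__"  # illustrative marker meaning "absent in the database"

def get_with_negative_cache(cache, db, key, hit_ttl=3600, miss_ttl=60):
    """cache must provide get(key) and set(key, value, ttl); db must provide get(key)."""
    cached = cache.get(key)
    if cached == NULL_SENTINEL:
        return None               # known miss: do not touch the database
    if cached is not None:
        return cached             # normal cache hit
    value = db.get(key)           # fall through to the database
    if value is None:
        cache.set(key, NULL_SENTINEL, miss_ttl)   # remember the miss briefly
        return None
    cache.set(key, value, hit_ttl)
    return value
```

Keep the miss TTL short so newly inserted database rows become visible quickly.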

Do not use Redis as a message queue.

In pub/sub scenarios, do not use Redis as a message queue.

  • Unless otherwise required, you are not advised to use Redis as a message queue.
  • Using Redis as a message queue causes capacity, network, performance, and function issues.
  • If message queues are required, use Kafka for throughput and RocketMQ for reliability.

Select proper specifications.

If service growth causes increases in Redis requests, use Proxy Cluster or Redis Cluster instances.

Scaling up single-node and master/standby instances only expands the memory and bandwidth, but cannot enhance the computing capabilities.

In production, do not use single-node instances. Use master/standby or cluster instances.

-

Do not use large specifications for master/standby instances.

Redis forks a process when rewriting the AOF or running the BGSAVE command. If the instance memory is too large, the fork takes a long time and responses become slow.

Prepare for degradation or disaster recovery.

When a cache miss occurs, data is obtained from the database. Alternatively, when a fault occurs, allow another Redis to take over services automatically.

-

Data Design

Category

Principle

Description

Remarks

Keys

Keep the format consistent.

Use the service name or database name as the prefix, and separate segments with colons (:). Ensure that key names have clear meanings.

For example: service name:sub-service name:ID.
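For instance, a small helper can build key names in this format and enforce the 128-byte cap and character rules described below (the allowed-character pattern is an illustrative assumption):

```python
import re

# Allow word characters, colons, braces (hash tags), dots, and hyphens.
_KEY_PATTERN = re.compile(r"^[\w:{}.-]+$")

def make_key(service: str, sub_service: str, obj_id) -> str:
    """Build a key such as 'order:user:42' and reject unsafe names."""
    key = f"{service}:{sub_service}:{obj_id}"
    if len(key.encode()) > 128:
        raise ValueError("key longer than 128 bytes: " + key)
    if not _KEY_PATTERN.match(key):
        raise ValueError("key contains unsupported characters: " + key)
    return key
```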

Minimize the key length.

Minimize the key length without compromising clarity of the meaning. Abbreviate common words. For example, user can be abbreviated to u, and messages can be abbreviated to msg.

Use up to 128 bytes. The shorter the better.

Do not use special characters except braces ({}).

Do not use special characters such as spaces, line breaks, single or double quotation marks, or other escape characters.

Redis uses braces ({}) to signify hash tags. Braces in key names must be used correctly to avoid unbalanced shards.

Values

Use appropriate value sizes.

Keep the value of a key within 10 KB.

Large values may cause unbalanced shards, hot keys, traffic or CPU usage surges, and scaling or migration failures. These problems can be avoided by proper design.

Use an appropriate number of elements in each key.

Do not include too many elements in each Hash, Set, or List. It is recommended that each key contain up to 5000 elements.

Time complexity of some commands, such as HGETALL, is directly related to the quantity of elements in a key. If commands whose time complexity is O(N) or higher are frequently executed and a key has a large number of elements, there may be slow requests, unbalanced shards, or hot keys.

Use appropriate data types.

This saves memory and bandwidth.

For example, to store multiple attributes of a user, you can use multiple keys, such as set u:1:name "X" and set u:1:age 20. To reduce memory usage, you can instead use the HMSET command (or HSET with multiple fields) to store the attributes as fields of a hash at a single key.
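With the redis-py client, the hash variant can be sketched as follows (the u: key prefix and attribute names are illustrative; hset with a mapping is the modern equivalent of HMSET):

```python
def save_user_as_hash(r, user_id, attrs: dict):
    """Store all attributes of one user in a single hash, e.g. HSET u:1 name X age 20."""
    r.hset(f"u:{user_id}", mapping=attrs)

def load_user(r, user_id) -> dict:
    return r.hgetall(f"u:{user_id}")
```

Usage: `save_user_as_hash(r, 1, {"name": "X", "age": 20})`, where `r` is a `redis.Redis` client.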

Set appropriate timeout.

Do not set a large number of keys to expire at the same time.

When setting key expiration, add or subtract a random offset from a base expiry time, to prevent a large number of keys from expiring at the same time. Otherwise, CPU usage will be high at the expiry time.
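The randomized offset can be sketched as follows (the base TTL and spread values are illustrative):

```python
import random

def ttl_with_jitter(base_ttl: int, spread: int = 300) -> int:
    """Return base_ttl plus or minus a random offset so that keys written
    together do not all expire in the same instant."""
    return base_ttl + random.randint(-spread, spread)

# e.g. r.set(key, value, ex=ttl_with_jitter(3600)) with a redis-py client
```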

Command Usage

Principle

Description

Remarks

Exercise caution when using commands with time complexity of O(N).

Pay attention to the value of N for commands whose time complexity is O(N). If the value of N is too large, Redis will be blocked and the CPU usage will be high.

For example, the HGETALL, LRANGE, SMEMBERS, ZRANGE, and SINTER commands consume a large amount of CPU resources when a key has many elements. Instead, you can use their SCAN-family counterparts, such as the HSCAN, SSCAN, and ZSCAN commands.
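A cursor-based iteration with redis-py looks like this (the pattern and batch size are illustrative); it replaces a blocking KEYS call with many small, non-blocking steps:

```python
def scan_keys(r, pattern: str = "*", count: int = 500):
    """Yield keys incrementally using SCAN instead of the blocking KEYS command."""
    cursor = 0
    while True:
        cursor, batch = r.scan(cursor=cursor, match=pattern, count=count)
        for key in batch:
            yield key
        if cursor == 0:  # SCAN returns cursor 0 when the iteration is complete
            break
```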

Do not use high-risk commands.

Do not use high-risk commands such as FLUSHALL, KEYS, and HGETALL, or rename them.

For details, see Renaming Commands.

Exercise caution when using the SELECT command.

Redis does not provide strong support for multi-DB. Redis is single-threaded, so the databases on one instance interfere with each other. You are advised to use multiple Redis instances instead of multi-DB on one instance.

-

Use batch operations to improve efficiency.

For batch operations, use the MGET command, MSET command, or pipelining to improve efficiency, but do not include a large number of elements in one batch operation.

MGET command, MSET command, and pipelining differ in the following ways:

  • MGET and MSET are atomic operations, while pipelining is not.
  • Pipelining can be used to send multiple commands at a time, while MGET and MSET cannot.
  • Pipelining must be supported by both the server and the client.
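For example, a large MSET can be split into bounded batches (the batch size is an illustrative assumption; the same chunking applies to pipelined commands):

```python
def mset_in_batches(r, mapping: dict, batch_size: int = 100):
    """Issue several bounded MSET commands instead of one huge one."""
    items = list(mapping.items())
    for i in range(0, len(items), batch_size):
        r.mset(dict(items[i:i + batch_size]))
```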

Do not use time-consuming code in Lua scripts.

The timeout of Lua scripts is 5s, so avoid using long scripts.

Long scripts: time-consuming sleep statements or long loops.

Do not use random functions in Lua scripts.

When invoking a Lua script, do not use random functions to specify keys. Otherwise, the execution results will be inconsistent between the master and standby nodes, causing data inconsistency.

-

Follow the rules for using Lua on cluster instances.

When running Lua scripts on cluster DCS Redis instances, observe the following rules:

  • When the EVAL or EVALSHA command is run, the command parameter must contain at least one key. Otherwise, the client displays the error message "ERR eval/evalsha numkeys must be bigger than zero in redis cluster mode."
  • When the EVAL or EVALSHA command is run, a cluster DCS Redis instance uses the first key to compute slots. Ensure that the keys to be operated are in the same slot.

Optimize multi-key operation commands such as MGET and HMGET with parallel processing and non-blocking I/O.

Some clients do not treat these commands differently. Keys in such a command are processed sequentially before their values are returned in a batch. This process is slow and can be optimized through pipelining.

For example, running the MGET command on a cluster using Lettuce is dozens of times faster than using Jedis, because Lettuce uses pipelining and non-blocking I/O while Jedis processes the keys serially. To achieve the same with Jedis, you need to implement slot grouping and pipelining yourself.
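The slot grouping that such clients perform can be sketched as follows. Redis Cluster assigns each key to one of 16384 slots using CRC16 (XMODEM variant) of the key, or of its hash tag if the key contains {...}; keys grouped by slot can then be fetched with one MGET or pipeline per group:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_hash_slot(key: str) -> int:
    """Return the cluster slot of a key, honoring {hash tag} semantics."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:   # non-empty tag
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

def group_by_slot(keys):
    groups = {}
    for key in keys:
        groups.setdefault(key_hash_slot(key), []).append(key)
    return groups  # run one MGET (or pipeline) per group
```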

Do not use the DEL command to directly delete big keys.

Deleting big keys, especially Sets, using DEL blocks other requests.

In Redis 4.0 and later, you can use the UNLINK command to delete big keys safely. This command is non-blocking.

In versions earlier than Redis 4.0:

  • To delete big Hashes, use HSCAN + HDEL commands.
  • To delete big Lists, use the LTRIM command.
  • To delete big Sets, use SSCAN + SREM commands.
  • To delete big Sorted Sets, use ZSCAN + ZREM commands.
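The HSCAN + HDEL approach above can be sketched with redis-py (the batch size is illustrative):

```python
def delete_big_hash(r, key: str, batch: int = 500):
    """Remove a large hash piece by piece so no single command blocks Redis."""
    cursor = 0
    while True:
        cursor, fields = r.hscan(key, cursor, count=batch)
        if fields:
            r.hdel(key, *fields)   # delete this batch of fields
        if cursor == 0:            # cursor 0 means the scan is complete
            break
    r.delete(key)                  # finally remove the (now small) key itself
```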

SDK Usage

Principle

Description

Remarks

Use connection pools and persistent connections ("pconnect" in Redis terminology).

The performance of short connections ("connect" in Redis terminology) is poor. Use clients with connection pools.

Frequently connecting to and disconnecting from Redis will unnecessarily consume a lot of system resources and can cause host breakdown in extreme cases. Ensure that the Redis client connection pool is correctly configured.

The client must perform fault tolerance in case of faults or slow requests.

The client should have fault tolerance and retry mechanisms in case of master/standby switchover, command timeout, or slow requests caused by network fluctuation or configuration errors.

See Configuring Redis Client Retry.

Set appropriate interval and number of retries.

Do not set the retry interval too short or too long.

  • If the retry interval is very short, for example, shorter than 200 milliseconds, a retry storm may occur, which can easily cause a service avalanche.
  • If the retry interval is very long or the number of retries is set to a large value, the service recovery may be slow in the case of a master/standby switchover.
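A common compromise is exponential backoff with jitter, which keeps the first retry fast while spreading later retries out (the base delay, cap, and the ConnectionError trigger are illustrative assumptions, not DCS requirements):

```python
import random
import time

def backoff_delay(attempt: int, base: float = 0.2, cap: float = 2.0) -> float:
    """Full-jitter exponential backoff: random delay in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def call_with_retry(fn, retries: int = 3, base: float = 0.2, cap: float = 2.0):
    for attempt in range(retries + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == retries:
                raise              # give up after the last retry
            time.sleep(backoff_delay(attempt, base, cap))
```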

Avoid using Lettuce.

Lettuce is the default client of Spring and stands out in terms of performance. However, Jedis is more stable because it is better at detecting and handling connection errors and network fluctuations. Therefore, Jedis is recommended.

Lettuce's main weakness is connection handling: it does not detect and recover from broken connections promptly, so services can be affected for a long time after a master/standby switchover or network fluctuation.

O&M and Management

Principle

Description

Remarks

Use passwords in production.

In production systems, use passwords to protect Redis.

-

Ensure security on the live network.

Do not allow unauthorized developers to connect to redis-server in the production environment.

-

Verify the fault handling capability or disaster recovery logic of the service.

Organize drills in the test environment or pre-production environment to verify service reliability in Redis master/standby switchover, breakdown, or scaling scenarios.

Master/standby switchover can be triggered manually on the console.

Configure monitoring.

Pay attention to the Redis capacity and expand it before overload.

Configure CPU, memory, and bandwidth alarms based on the alarm thresholds.

Perform routine health checks.

Perform routine checks on the memory usage of each node and whether the memory usage of the master nodes is balanced.

If memory usage is unbalanced, big keys may exist and need to be split and optimized.

Perform routine analysis on hot keys and check whether there are frequently accessed keys.

-

Perform routine diagnosis on Redis commands and check whether O(N) commands have potential risks.

Even if an O(N) command is not time-consuming, it is recommended that R&D engineers analyze whether the value of N will increase with service growth.

Perform routine analysis on slow query logs.

Detect potential risks based on slow query logs and rectify faults as soon as possible.