Implementing Automatic DCS Redis Expansion During Peak Hours

Scenarios and Pain Points

Redis serves as an ultra-low-latency cache in strongly real-time scenarios such as e-commerce, gaming, and social networking. The memory usage and network throughput of a DCS instance surge during peak hours. For example, page views (PV) and unique visitors (UV) can increase tens of times during the Double 11 festival, and Redis must handle 300,000+ QPS from key interfaces such as offering details, shopping cart, flash sale inventory, and sessions.

During peak hours, the key space grows rapidly and memory usage approaches the maximum memory of the DCS Redis instance, which may cause cache eviction or out-of-memory (OOM) errors. In addition, massive concurrency can saturate inbound and outbound bandwidth, so clients receive busy responses or time out. If Redis is not scaled out in time, OOM or a network bottleneck may occur, affecting order placement and page refreshes. Conventional manual Redis scaling requires evaluation, an instance specification change, and failover, which takes at least 30 minutes. Moreover, O&M personnel need expertise in cloud platform specifications, shard migration, service hotspot distribution, and configuration optimization, which makes the process error-prone.

Solution

Use Distributed Cache Service (DCS) for Redis. DCS Redis instances support automatic memory scale-out and automatic bandwidth adjustment. Both features are driven by real-time metrics such as memory usage, network throughput, and the number of concurrent connections, so that traffic bursts are handled effectively.

When memory or bandwidth usage reaches the configured threshold, an instance specification upgrade or a dynamic shard increase is triggered seamlessly, and bandwidth is increased so that cache capacity and network channels remain sufficient. For details, see Configuring Automatic Scaling Policies and Setting Automatic Elastic Bandwidth for a DCS Redis Instance. Automatic memory scale-out keeps cached data complete and data requests served, while automatic bandwidth adjustment avoids timeouts and delayed responses caused by network bottlenecks. With these features, DCS for Redis supports heavy-traffic services such as the Double 11 festival, e-commerce flash sales, and game server provisioning. It maintains high availability and resource utilization at low latency and cost, even under high concurrency.

Notes and Constraints

During scaling, a Redis instance is intermittently disconnected once or twice within 30 seconds, and becomes read-only within 1 minute.
Currently, a Redis instance that has been automatically scaled out by condition cannot be automatically scaled in. To manually scale in the instance, see Modifying DCS Instance Specifications.
Performing instance specification changes (except for replica quantity changes) overrides the applied automatic scaling policies.
Automatic scaling is available only for instances that meet the following requirements. To create an instance, see Buying a DCS Redis Instance.
- The DCS instance is in the Running state.
- The instance is in Pay-per-use billing mode. Yearly/Monthly instances do not support automatic scaling.
- The instance runs Redis 4.0 or later and is a basic edition instance.
- The DCS instance specification must be greater than or equal to 4 GB.
- Automatic scaling and automatic bandwidth adjustment are restricted functions. To enable them, submit a service ticket or contact customer service.
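
The requirements above can be expressed as a simple pre-check. The following is a minimal sketch, not a DCS API: the `InstanceInfo` class and its field names are hypothetical, and the values would have to be copied from the instance details shown on the console.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceInfo:
    """Hypothetical snapshot of a DCS instance; field names are illustrative."""
    status: str         # for example "Running"
    billing_mode: str   # for example "pay-per-use" or "yearly/monthly"
    redis_version: str  # for example "5.0"
    edition: str        # for example "basic"
    capacity_gb: float  # instance specification in GB


def unmet_requirements(inst: InstanceInfo) -> List[str]:
    """Return the listed requirements that the instance does not meet."""
    problems = []
    if inst.status != "Running":
        problems.append("instance is not in the Running state")
    if inst.billing_mode != "pay-per-use":
        problems.append("billing mode is not pay-per-use")
    # "Redis 4.0 or later" is equivalent to a major version of at least 4.
    if int(inst.redis_version.split(".")[0]) < 4:
        problems.append("Redis version is earlier than 4.0")
    if inst.edition != "basic":
        problems.append("instance is not basic edition")
    if inst.capacity_gb < 4:
        problems.append("specification is smaller than 4 GB")
    return problems


if __name__ == "__main__":
    candidate = InstanceInfo("Running", "yearly/monthly", "5.0", "basic", 8)
    for problem in unmet_requirements(candidate):
        print("Not eligible:", problem)
```

An empty list means that the instance meets every listed requirement. The restricted functions must still be enabled through a ticket before a policy can be configured.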

Configuring Automatic Scaling Policies

Log in to the DCS console.
Click the region selector in the upper left corner of the console and select the region where your instance is located.
In the navigation pane, choose Cache Manager.
On the Cache Manager page, click the name of the DCS instance you want to configure.
On the Basic Information page, click Auto Scaling next to Cache Size.

Figure 1 Automatic scaling
In the displayed Auto Scaling dialog box, click Add.
Set the policy name.

The name must contain 4 to 64 characters, start with a letter or digit, and contain only letters, digits, hyphens (-), underscores (_), and periods (.).

Select By Condition and set triggering parameters by referring to Table 1.

When memory usage reaches the threshold, the instance specification will be automatically increased.

The example values in this document are for reference only.

**Table 1** Condition parameters
Parameter	Example Value	Description
Avg. Memory Usage ≥	70%	Average memory usage threshold (%). For example, if this parameter is set to 70%, the instance is automatically scaled out when the average memory usage is greater than or equal to 70%.
Max. Specification	32 GB	Maximum specification (GB) that the instance can be scaled to. Set a value greater than the current specification. For example, if the instance has 4 GB of memory and the maximum specification is 32 GB, the instance is scaled to 8 GB when first triggered and to 16 GB when triggered again. The specification never exceeds 32 GB.
Monitoring Period	5 minutes	Period over which the average memory usage is calculated, in minutes. The default value is 5. For example, if the monitoring period is set to 5 minutes, the average memory usage is calculated from 5 minutes of monitoring data.
Silence	0 seconds	Minimum interval between scaling operations, in seconds. The default value is 0. After an automatic scaling operation, the instance is not scaled again during the silence time, even if the average memory usage exceeds the threshold again. This mechanism prevents consecutive scaling operations.

Click Confirm. The new policy will be displayed on the Auto Scaling page.

To modify or delete a submitted policy, click Edit or Delete on the right of the policy.
Click Apply next to the policy you want to execute. Confirm the operation and click OK. The policy takes effect and is displayed under Applied Policies.
- To cancel a policy, click Cancel next to the policy and click OK. Only one policy can be applied at a time. Applying a new policy replaces the old one.
- When automatic scaling is triggered, a specification change record created by the user auto-system can be viewed on the Background Tasks page of the console, as shown in Figure 2.
  Figure 2 Auto scaling record
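
The interaction of the policy name rule and the Table 1 parameters can be sketched in a few lines of Python. This is an illustrative model only, not the DCS implementation. It assumes one memory usage sample per minute, and it assumes that each trigger doubles the specification, as in the 4 GB, 8 GB, 16 GB example in Table 1. The names `ScalingPolicy` and `next_spec` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Policy name rule from the procedure: 4 to 64 characters, starting with a
# letter or digit, using only letters, digits, hyphens, underscores, periods.
NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{3,63}")


@dataclass
class ScalingPolicy:
    """Hypothetical container for the Table 1 parameters."""
    name: str
    threshold_pct: float = 70.0  # Avg. Memory Usage >=
    max_spec_gb: int = 32        # Max. Specification
    period_min: int = 5          # Monitoring Period
    silence_s: int = 0           # Silence

    def __post_init__(self) -> None:
        if not NAME_RE.fullmatch(self.name):
            raise ValueError(f"invalid policy name: {self.name!r}")


def next_spec(policy: ScalingPolicy, current_gb: int,
              samples_pct: List[float],
              seconds_since_last_scale: Optional[float]) -> Optional[int]:
    """Return the new specification in GB, or None if nothing is triggered."""
    # Silence: no new operation until the interval has elapsed.
    if (seconds_since_last_scale is not None
            and seconds_since_last_scale < policy.silence_s):
        return None
    # Average over the monitoring period (assumed: one sample per minute).
    window = samples_pct[-policy.period_min:]
    if not window or sum(window) / len(window) < policy.threshold_pct:
        return None
    # Assumed step: double the specification, never beyond the maximum.
    target = current_gb * 2
    return target if target <= policy.max_spec_gb else None


if __name__ == "__main__":
    policy = ScalingPolicy(name="peak-hours")
    print(next_spec(policy, 4, [65, 70, 72, 75, 80], None))
```

In this model, a 4 GB instance whose last five samples average 72.4% is scaled to 8 GB, while a 32 GB instance is left unchanged because the next step would exceed the maximum specification.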

Setting Automatic Elastic Bandwidth for a DCS Redis Instance

Log in to the DCS console.
Click the region selector in the upper left corner of the console and select the region where your instance is located.
In the navigation pane, choose Cache Manager.
On the Cache Manager page, click the name of the DCS instance you want to configure.
In the Instance Details area of the DCS instance, click Adjust Bandwidth next to Bandwidth.

Figure 3 Adjusting bandwidth
Select Auto scaling.

Enable Auto Bandwidth Increase and set the policies as required, as shown in Table 2.

Bandwidth increases automatically (up to 2,048 Mbit/s per shard) based on scaling policies. Automatic scaling overrides manual adjustments.

Figure 4 Setting auto bandwidth increase policies

**Table 2** Setting auto bandwidth increase policies
Policy	Example Value	Description
Burst Bandwidth Usage ≥	70%	Burst bandwidth usage threshold that triggers a bandwidth increase. Calculation: Burst bandwidth usage = Burst bandwidth/Shard bandwidth, where the burst bandwidth is the larger of the Output Flow and Input Flow metrics. Target: When the burst bandwidth usage of an instance shard reaches the threshold, the shard bandwidth is automatically scaled up so that the burst bandwidth usage drops to the threshold minus 10 percentage points. For example, if the threshold is 70% and the burst bandwidth usage of a shard reaches 70%, the bandwidth is automatically scaled up until the usage decreases to 60%. Therefore, Shard bandwidth after scale-up = Burst bandwidth/60%.
Monitoring Period	1 minute	Period over which bandwidth usage is monitored for bandwidth increases, in minutes. Default: 1. For example, if the monitoring period is set to 1 minute, the bandwidth usage is evaluated with 1-minute monitoring data.
Silence	0 seconds	Minimum interval between bandwidth increases, in seconds. Default: 0. The silence time prevents consecutive automatic bandwidth increases.

Confirm the bandwidth parameters, select Authorization, and click Submit.

The setting is complete when the message Automatic bandwidth scaling configured is displayed. When the bandwidth is automatically adjusted, a change record created by the user system can be viewed on the Background Tasks page of the console, as shown in Figure 5.

Figure 5 Auto bandwidth adjustment record
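
The calculation described in Table 2 can be worked through with a short Python sketch. It is an illustration of the documented formula under stated assumptions, not the service's implementation. The function names are hypothetical, the flow values in the example are made-up numbers in Mbit/s, and capping the result at the per-shard limit of 2,048 Mbit/s is an assumption based on the limit stated in the procedure.

```python
MAX_SHARD_BANDWIDTH_MBIT = 2048  # per-shard upper limit stated in the procedure


def burst_usage_pct(input_flow: float, output_flow: float,
                    shard_bandwidth: float) -> float:
    """Burst bandwidth usage (%) = larger of the two flows / shard bandwidth."""
    return max(input_flow, output_flow) * 100 / shard_bandwidth


def bandwidth_after_scale_up(input_flow: float, output_flow: float,
                             shard_bandwidth: float,
                             threshold_pct: float = 70) -> float:
    """Return the shard bandwidth (Mbit/s) after an automatic increase.

    If the usage is below the threshold, the bandwidth is unchanged.
    Otherwise the bandwidth is raised so that the usage falls to the
    threshold minus 10 percentage points.
    """
    if burst_usage_pct(input_flow, output_flow, shard_bandwidth) < threshold_pct:
        return shard_bandwidth
    target_pct = threshold_pct - 10
    new_bandwidth = max(input_flow, output_flow) * 100 / target_pct
    # Assumed behavior: the result is capped at the per-shard limit.
    return min(new_bandwidth, MAX_SHARD_BANDWIDTH_MBIT)


if __name__ == "__main__":
    # Illustrative values: 270 Mbit/s outbound on a 384 Mbit/s shard.
    print(burst_usage_pct(120, 270, 384))
    print(bandwidth_after_scale_up(120, 270, 384))
```

With these illustrative values, the burst bandwidth usage is about 70.3%, which reaches the 70% threshold, so the shard bandwidth becomes 270/60% = 450 Mbit/s.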