Updated on 2024-04-11 GMT+08:00

Cluster Scaling

The processing capability of a big data cluster can be horizontally expanded by adding nodes. If the cluster scale does not meet service requirements, you can manually scale out or scale in the cluster. MRS intelligently selects the node with the least load or the minimum amount of data to be migrated for scale-in. The node to be scaled in will not receive new tasks, and continues to execute the existing tasks. At the same time, MRS copies its data to other nodes and the node is decommissioned. If the tasks on the node cannot be completed after a long time, MRS migrates the tasks to other nodes, minimizing the impact on cluster services.

Scaling Out a Cluster

Currently, you can add Core or Task nodes to scale out a cluster to handle peak service loads. Adding MRS cluster nodes does not affect the services of the existing cluster. For details about how to rectify data skew caused by capacity expansion, see Configuring HDFS DataNode Data Balancing.

Scaling Out a Cluster Charged in Yearly/Monthly Mode

If your service growth rate exceeds the expected value after you subscribe to an MRS cluster charged in Yearly/Monthly mode, cluster scale-out beyond your subscription is required. MRS allows you to scale out clusters charged in Yearly/Monthly mode while enjoying the subscription discounts.

You only need to go to the MRS console page and expand the number of nodes you need. The cluster scale-out process does not require manual intervention and takes only a few minutes, which helps ease pressure on growing service data processing needs.

Scaling In a Cluster

You can reduce the number of Core or Task nodes to scale in a cluster so that MRS delivers better storage and computing capabilities at lower O&M costs based on service requirements. After you scale in an MRS cluster, MRS automatically selects nodes that can be scaled in based on the type of services installed on the nodes.

During the scale-in of Core nodes, data on the original nodes is migrated. If the data location is cached, the client automatically updates the location information, which may affect the latency. Node scale-in may affect the response duration of the first access to some HBase on HDFS data. You can restart HBase or disable or enable related tables to avoid this problem.

Task nodes do not store cluster data. They are compute nodes and do not involve migration of data on the nodes.