
Manually Scaling In a Cluster

Updated at: Mar 25, 2021 GMT+08:00
You can scale in a cluster by reducing the number of Core or Task nodes so that MRS provides storage and computing capabilities matching actual service requirements at lower O&M costs.

Only pay-per-use clusters can be scaled in.

Background

An MRS cluster supports a maximum of 502 nodes. By default, there are one or two Master nodes and one Core node. The total number of Core and Task nodes cannot exceed 500. If more than 500 Core/Task nodes are required, contact HUAWEI CLOUD technical support engineers or invoke a background interface to modify the database.

Only Core and Task nodes can be removed; Master nodes cannot be scaled in. To scale in a cluster, you only need to adjust the number of nodes on the MRS console. MRS then automatically selects the nodes to be removed.

The policies for automatically selecting nodes to be removed are as follows:

  • Nodes where basic components (such as ZooKeeper, DBService, KrbServer, and LdapServer) are installed cannot be scaled in, because these components are the basis for cluster running. MRS never selects such nodes for scale-in.
  • Core nodes store cluster service data. When scaling in a cluster, data on the nodes to be deleted must be fully migrated to other nodes. Therefore, perform follow-up operations, such as removing nodes from Manager and deleting the ECSs, only after all services on these nodes have been decommissioned. When selecting Core nodes, MRS preferentially selects healthy nodes with a small volume of stored data so that decommissioning is less likely to fail. For example, if DataNodes are installed on Core nodes in an analysis cluster, the system preferentially removes the DataNodes with a small data volume and good health status during scale-in.

    During Core node scale-in, data on the removed nodes is migrated. If a client has cached the data location, it updates the location information automatically, which may increase latency. As a result, the first access to some HBase on HDFS data after the scale-in may be slower. You can restart HBase, or disable and then enable the affected tables, to avoid this problem.

  • Task nodes are computing nodes that do not store cluster data, so no data migration is involved. When selecting Task nodes, MRS preferentially selects nodes whose health status is Faulty, Unknown, or Subhealthy. You can view the health status of node instances on the Instances page of MRS.
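The selection logic above can be summarized in a short sketch. The following Python snippet is only an illustration of the described ordering, not MRS source code; the Node fields, health-state names, and function name are assumptions made for the example.

```python
# Illustrative sketch of the node-selection policy described above.
# NOT MRS source code; all field and state names are assumed.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    role: str                   # "Core" or "Task"
    has_basic_component: bool   # ZooKeeper, DBService, KrbServer, LdapServer, ...
    health: str                 # "good", "subhealthy", "unknown", or "faulty"
    data_volume_gb: float       # stored service data, relevant for Core nodes

def select_nodes_to_remove(nodes, role, count):
    # Nodes hosting basic components are never candidates for scale-in.
    candidates = [n for n in nodes
                  if n.role == role and not n.has_basic_component]
    if role == "Core":
        # Prefer healthy Core nodes with the least stored data to keep
        # migration small and avoid decommissioning failures.
        candidates = [n for n in candidates if n.health == "good"]
        candidates.sort(key=lambda n: n.data_volume_gb)
    else:
        # Task nodes store no service data: remove unhealthy nodes first.
        priority = {"faulty": 0, "unknown": 1, "subhealthy": 2, "good": 3}
        candidates.sort(key=lambda n: priority[n.health])
    return candidates[:count]
```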

Scale-In Verification Policy

To prevent decommissioning failures, each component enforces its own restriction rules once nodes are selected for scale-in. Scale-in is allowed only when the rules of all installed components are met. Table 1 describes these scale-in verification policies.

Table 1 Decommissioning restriction rules

HDFS/DataNode

Rule: The cluster can be scaled in if the number of remaining nodes is greater than the number of HDFS replicas and the total HDFS data volume does not exceed 80% of the total HDFS cluster capacity.

Reason: This rule ensures that the remaining space is sufficient for storing the existing data after the scale-in and reserves some headroom.

NOTE: To ensure data reliability, each file saved in HDFS automatically has one backup, that is, two replicas in total.

HBase/RegionServer

Rule: The total available memory of RegionServers on all nodes other than those to be removed must be greater than 1.2 times the memory currently used by RegionServers on the nodes to be removed.

Reason: When a node is decommissioned, its regions are migrated to other nodes, which must have sufficient available memory to host them.

Kafka/Broker

Rule: The number of remaining nodes must be greater than or equal to the maximum number of replicas of any topic, and Kafka disk usage after the scale-in must not exceed 80% of the total Kafka disk space of the cluster.

Reason: This rule prevents insufficient disk space after the scale-in.

Storm/Supervisor

Rule: After the scale-in, the number of slots in the cluster must be sufficient for running the submitted tasks.

Reason: This rule ensures that sufficient resources remain for running stream processing tasks after the scale-in.

Flume/FlumeServer

Rule: A node on which FlumeServer is installed and Flume tasks have been configured cannot be deleted.

Reason: This rule prevents deployed service programs from being deleted by mistake.
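The rules in Table 1 are simple threshold checks that MRS evaluates automatically. As an illustration only, the sketch below restates the HDFS and Kafka rules in Python; the parameter names are assumptions, while the 80% thresholds and replica comparisons come from Table 1.

```python
# Illustrative restatement of two Table 1 rules. MRS performs the real
# verification; parameter names here are assumed for the example.

def hdfs_scale_in_allowed(datanodes_after, hdfs_replica_count,
                          hdfs_used_bytes, hdfs_capacity_after_bytes):
    # Remaining DataNodes must outnumber the HDFS replica count, and the
    # stored data must fit within 80% of the post-scale-in capacity.
    return (datanodes_after > hdfs_replica_count
            and hdfs_used_bytes <= 0.8 * hdfs_capacity_after_bytes)

def kafka_scale_in_allowed(brokers_after, max_topic_replicas,
                           kafka_used_bytes, kafka_capacity_after_bytes):
    # Remaining brokers must cover the most-replicated topic, and disk
    # usage must stay within 80% of the remaining Kafka disk space.
    return (brokers_after >= max_topic_replicas
            and kafka_used_bytes <= 0.8 * kafka_capacity_after_bytes)
```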

Scaling In a Cluster

  1. Log in to the MRS management console.
  2. Choose Clusters > Active Clusters, select a running cluster, and click its name to switch to the cluster details page.
  3. Click the Nodes tab. In the Operation column of the node group, click Scale In. The Scale In page is displayed.

    This operation can be performed only when the cluster is running and all nodes in the cluster are running.

  4. Set Scale-In Nodes, and click OK.

    • Before scaling in the cluster, check whether its security group configuration is correct. Ensure that there is an inbound security group rule in which Protocol & Port is set to All and Source is set to a trusted accessible IP address range.
    • If damaged data blocks exist in HDFS, the cluster may fail to be scaled in. You can check HDFS health in advance, as shown in the sketch after these steps. If damaged blocks are found, contact HUAWEI CLOUD technical support.

  5. A dialog box is displayed in the upper right corner of the page, indicating that the scale-in task is submitted successfully.

    The cluster scale-in process is explained as follows:
    • During scale-in: The cluster status is Scaling In. Submitted jobs continue to be executed, and you can submit new jobs. You cannot start another scale-in or terminate the cluster during this time, and you are advised not to restart the cluster or modify its configuration.
    • Successful scale-in: The cluster status is Running, and you are billed for the resources used after the node reduction.
    • Failed scale-in: The cluster status returns to Running. You can still execute jobs and attempt the scale-in again.

    After the cluster is scaled in, you can view the node information of the cluster on the Nodes tab page of the cluster details page.
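If you suspect damaged HDFS blocks (see the note in step 4), you can check HDFS health before submitting the scale-in. The following is a minimal sketch that wraps the standard hdfs fsck command; it assumes shell access to a cluster node with the HDFS client installed and authentication already handled.

```python
# Minimal sketch: run "hdfs fsck /" and report whether HDFS is healthy.
# Assumes the HDFS client is installed and authentication is handled.
import subprocess

def hdfs_is_healthy(path="/"):
    result = subprocess.run(["hdfs", "fsck", path],
                            capture_output=True, text=True)
    # fsck prints a summary stating the filesystem "is HEALTHY" or "is CORRUPT".
    return "is HEALTHY" in result.stdout

if __name__ == "__main__":
    print("HDFS healthy:", hdfs_is_healthy())
```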
