Automatic Scaling of Task Nodes in an MRS Cluster
In big data scenarios, especially real-time data analysis and processing, the number of cluster nodes often needs to be adjusted dynamically as the data volume changes so that the required resources are available. MRS supports auto scaling of Task nodes based on the cluster load. If the data volume changes periodically, you can configure auto scaling rules so that the number of Task nodes is adjusted within a fixed time range before the data volume changes.
- Load-based Task node auto scaling: The number of Task nodes is adjusted based on real-time cluster load metrics or cluster resource pool metrics. When the data volume changes, scaling is triggered, but only after the change is reflected in the metrics, so there is a delay. For details about the metrics that can be configured, see Node Auto Scaling Metrics.
- Time-based Task node auto scaling: The number of Task nodes is adjusted based on configured time ranges. If the data volume changes periodically, you can specify a resource plan that scales the cluster in or out before the data volume changes, avoiding the scaling delay.
- Automation scripts: After a cluster is scaled, you may need to manually reallocate resources or modify service logic based on the new number of nodes. MRS lets you attach customized automation scripts to auto scaling so that the cluster adapts to service load changes without manual intervention. Because the scripts are fully customizable, auto scaling stays flexible across different requirements. For details about how to configure automation scripts, see Configuring Bootstrap Actions for an MRS Cluster Node.
You can configure load-based auto scaling, time-based auto scaling, or both. Using both is recommended to handle unexpected data peaks.
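The rules and plans described above can also be managed programmatically. The following is a minimal sketch of what a combined policy (one load-based rule plus one resource plan) might look like when submitted through the MRS auto scaling REST API; the endpoint path, request fields, and node group name are assumptions drawn from the public MRS API reference and should be verified against the API version you use.

```bash
# Minimal sketch: attach one load-based rule and one resource plan to a Task
# node group via the MRS auto scaling REST API. Endpoint, field names, and
# the node group name are assumptions to verify against the MRS API reference.
curl -X POST \
  "https://mrs.example-region.myhuaweicloud.com/v1.1/${PROJECT_ID}/autoscaling-policy/${CLUSTER_ID}" \
  -H "X-Auth-Token: ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "node_group": "task_node_analysis_group",
    "auto_scaling_policy": {
      "auto_scaling_enable": true,
      "min_capacity": 1,
      "max_capacity": 2,
      "resources_plans": [{
        "period_type": "daily",
        "start_time": "08:00",
        "end_time": "10:00",
        "min_capacity": 4,
        "max_capacity": 5
      }],
      "rules": [{
        "name": "default-expand-1",
        "adjustment_type": "scale_out",
        "cool_down_minutes": 20,
        "scaling_adjustment": 1,
        "trigger": {
          "metric_name": "YARNMemoryAvailablePercentage",
          "metric_value": "25",
          "comparison_operator": "LT",
          "evaluation_periods": 10
        }
      }]
    }
  }'
```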
Node Auto Scaling Metrics
- Node group policy
When adding a rule, you can refer to Table 1 to configure the corresponding metrics.
- When the value type in Table 1 is percentage or ratio, the value can be accurate to two decimal places. Percentage-type metric values omit the percent sign (%); for example, 16.80 indicates 16.80%.
- Hybrid clusters support all metrics of both analysis and streaming clusters.
Table 1 Auto scaling metrics

| Cluster Type | Metric | Value Type | Description |
|---|---|---|---|
| Streaming cluster | StormSlotAvailable | Integer | Number of available Storm slots. Value range: 0 to 2147483646 |
| | StormSlotAvailablePercentage | Percentage | Percentage of available Storm slots, that is, the proportion of available slots to total slots. Value range: 0 to 100 |
| | StormSlotUsed | Integer | Number of used Storm slots. Value range: 0 to 2147483646 |
| | StormSlotUsedPercentage | Percentage | Percentage of used Storm slots, that is, the proportion of used slots to total slots. Value range: 0 to 100 |
| | StormSupervisorMemAverageUsage | Integer | Average memory usage of the Supervisor process of Storm. Value range: 0 to 2147483646 |
| | StormSupervisorMemAverageUsagePercentage | Percentage | Average percentage of memory used by the Supervisor process of Storm to the total system memory. Value range: 0 to 100 |
| | StormSupervisorCPUAverageUsagePercentage | Percentage | Average percentage of CPUs used by the Supervisor process of Storm to the total CPUs. Value range: 0 to 6000 |
| Analysis cluster | YARNAppPending | Integer | Number of pending tasks on YARN. Value range: 0 to 2147483646 |
| | YARNAppPendingRatio | Ratio | Ratio of pending tasks to running tasks on YARN. Value range: 0 to 2147483646 |
| | YARNAppRunning | Integer | Number of running tasks on YARN. Value range: 0 to 2147483646 |
| | YARNContainerAllocated | Integer | Number of containers allocated to YARN. Value range: 0 to 2147483646 |
| | YARNContainerPending | Integer | Number of pending containers on YARN. Value range: 0 to 2147483646 |
| | YARNContainerPendingRatio | Ratio | Ratio of pending containers to running containers on YARN. Value range: 0 to 2147483646 |
| | YARNCPUAllocated | Integer | Number of virtual CPUs (vCPUs) allocated to YARN. Value range: 0 to 2147483646 |
| | YARNCPUAvailable | Integer | Number of available vCPUs on YARN. Value range: 0 to 2147483646 |
| | YARNCPUAvailablePercentage | Percentage | Percentage of available vCPUs on YARN, that is, the proportion of available vCPUs to total vCPUs. Value range: 0 to 100 |
| | YARNCPUPending | Integer | Number of pending vCPUs on YARN. Value range: 0 to 2147483646 |
| | YARNMemoryAllocated | Integer | Memory allocated to YARN, in MB. Value range: 0 to 2147483646 |
| | YARNMemoryAvailable | Integer | Available memory on YARN, in MB. Value range: 0 to 2147483646 |
| | YARNMemoryAvailablePercentage | Percentage | Percentage of available memory on YARN, that is, the proportion of available memory to total memory. Value range: 0 to 100 |
| | YARNMemoryPending | Integer | Pending memory on YARN. Value range: 0 to 2147483646 |
- Resource pool policy
Only the following versions support auto scaling by resource pool:
- Common MRS: MRS 3.1.5 or later
- LTS MRS: MRS 3.3.0-LTS or later
When adding a rule, you can refer to Table 2 to configure the corresponding metrics.
Table 2 Rule configuration description

| Cluster Type | Metric | Value Type | Description |
|---|---|---|---|
| Analysis/Custom cluster | ResourcePoolMemoryAvailable | Integer | Available memory on YARN in the resource pool, in MB. Value range: 0 to 2147483646 |
| | ResourcePoolMemoryAvailablePercentage | Percentage | Percentage of available memory on YARN in the resource pool, that is, the proportion of available memory to total memory. Value range: 0 to 100 |
| | ResourcePoolCPUAvailable | Integer | Number of available vCPUs on YARN in the resource pool. Value range: 0 to 2147483646 |
| | ResourcePoolCPUAvailablePercentage | Percentage | Percentage of available vCPUs on YARN in the resource pool, that is, the proportion of available vCPUs to total vCPUs. Value range: 0 to 100 |
When adding a resource plan, you can set the parameters by referring to Table 3.

Table 3 Configuration items of a resource plan

- Effective On (example value: Monday)
  The effective date of a resource plan. Daily is selected by default. You can also select one or more days from Monday to Sunday.
- Time Range (example value: 08:00-10:00)
  The start time and end time of a resource plan, accurate to minutes, with values ranging from 00:00 to 23:59. For example, if a resource plan starts at 08:00 and ends at 10:00, set this parameter to 08:00-10:00. The end time must be at least 30 minutes later than the start time.
- Node Range (example value: 4-5)
  The number of nodes in a resource plan, ranging from 0 to 500. Within the time range specified in the resource plan, if the number of Task nodes is less than the specified minimum, it is increased to the minimum of the node range in a single adjustment; if it is greater than the specified maximum, auto scaling reduces it to the maximum of the node range in a single adjustment. The minimum number of nodes must be less than or equal to the maximum.
- When a resource plan is enabled, the Default Range value on the auto scaling page is enforced outside the time ranges specified in resource plans. For example, if Default Range is set to 1-2 and a resource plan specifies Time Range 08:00-10:00 with Node Range 4-5, the number of Task nodes during the rest of the day (00:00-08:00 and 10:00-23:59) is forcibly kept within the default range of 1 to 2: if there are more than 2 nodes, auto scale-in is triggered; if there are fewer than 1, auto scale-out is triggered.
- When no resource plan is enabled, the Default Range takes effect at all times. If the number of Task nodes falls outside the default range, it is automatically increased or decreased to bring it back within that range.
- The time ranges of resource plans cannot overlap. An overlap would mean that two resource plans are in effect at the same time. For example, if resource plan 1 takes effect from 08:00 to 10:00 and resource plan 2 from 09:00 to 11:00, the two plans overlap between 09:00 and 10:00.
- The time range of a resource plan must fall within a single day. For example, to cover 23:00 through 01:00 of the next day, configure two resource plans whose time ranges are 23:00-00:00 and 00:00-01:00, respectively. A validation sketch covering these constraints follows below.
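These constraints are easy to get wrong when plans are prepared in bulk, so a quick local pre-check can help. The following is a minimal sketch that validates a single resource plan against the documented rules (same-day time range, end time at least 30 minutes after the start, node counts within 0 to 500, minimum not above maximum); the function name and argument layout are illustrative, not part of MRS.

```bash
#!/bin/bash
# Illustrative pre-check of one resource plan against the documented
# constraints. The function name and argument layout are hypothetical;
# cross-plan overlap checking is omitted for brevity.
validate_resource_plan() {
  local start="$1" end="$2" min_nodes="$3" max_nodes="$4"

  # Convert HH:MM to minutes since midnight (time ranges are same-day only).
  local start_min=$((10#${start%%:*} * 60 + 10#${start##*:}))
  local end_min=$((10#${end%%:*} * 60 + 10#${end##*:}))

  # The end time must be at least 30 minutes later than the start time.
  if (( end_min - start_min < 30 )); then
    echo "invalid: end time must be at least 30 minutes after start"; return 1
  fi

  # Node counts must stay within 0-500, and min must not exceed max.
  if (( min_nodes < 0 || max_nodes > 500 || min_nodes > max_nodes )); then
    echo "invalid: node range must satisfy 0 <= min <= max <= 500"; return 1
  fi
  echo "ok"
}

validate_resource_plan "08:00" "10:00" 4 5   # prints: ok
validate_resource_plan "23:00" "23:15" 4 5   # invalid: range shorter than 30 minutes
```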
- Automation scripts
When adding an automation script, you can set related parameters by referring to Table 4.
Table 4 Configuration items of an automation script

- Name (example value: test)
  Automation script name.
  - The value can contain only digits, letters, spaces, hyphens (-), and underscores (_), and must not start with a space.
  - The value can contain 1 to 64 characters.
  - The name must be unique within a cluster. The same name can be used in different clusters.
- Script Path (example value: obs://mrs-samples/test.sh)
  Script path. The value can be an OBS file system path or a local VM path.
  - An OBS file system path must start with obs:// and end with .sh, for example, obs://mrs-samples/xxx.sh.
  - A local VM path must start with a slash (/) and end with .sh. For example, the path of the example script for installing Zeppelin is /opt/bootstrap/zepelin/zepelin_install.sh.
- Execution Node (example value: Master)
  The type of node on which the automation script is executed.
  If you select Master nodes, you can use the Active Master switch to specify whether to run the script only on the active Master node.
  - If the switch is enabled, the script runs only on the active Master node.
  - If the switch is disabled, the script runs on all Master nodes. The switch is disabled by default.
- Parameter (example value: -)
  Automation script parameters. The following predefined variables can be used to obtain auto scaling information:
  - ${mrs_scale_node_num}: Number of nodes being added or removed. The value is always positive.
  - ${mrs_scale_type}: Scaling type. The value can be scale_out or scale_in.
  - ${mrs_scale_node_hostnames}: Host names of the nodes being added or removed. Multiple host names are separated by commas (,).
  - ${mrs_scale_node_ips}: IP addresses of the nodes being added or removed. Multiple IP addresses are separated by commas (,).
  - ${mrs_scale_rule_name}: Name of the auto scaling rule that was triggered. For a resource plan, this parameter is set to resource_plan.
- Executed (example value: Before scale-out)
  The time at which the automation script is executed. Four options are supported: Before scale-out, After scale-out, Before scale-in, and After scale-in.
  Assuming the execution nodes include Task nodes:
  - A script executed before scale-out cannot run on the Task nodes to be added.
  - A script executed after scale-out can run on the added Task nodes.
  - A script executed before scale-in can run on the Task nodes to be deleted.
  - A script executed after scale-in cannot run on the deleted Task nodes.
- Action upon Failure (example value: Continue)
  Whether to continue executing subsequent scripts and the scale-out/in operation if the script fails.
  - You are advised to set this parameter to Continue during commissioning so that the cluster continues the scale-out/in operation regardless of whether the script succeeds.
  - If a script fails, view the failure logs in the /var/log/Bigdata/Bootstrap directory on the cluster VM.
  - A scale-in operation cannot be rolled back, so for scripts executed after scale-in, Action upon Failure can only be set to Continue.
The automation script is triggered only during auto scaling. It is not triggered when cluster nodes are manually scaled out or in. A sample script is sketched below.
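To make the Parameter variables concrete, here is a hypothetical automation script. It assumes the script is registered with Parameter set to "${mrs_scale_type} ${mrs_scale_node_hostnames}" and Executed set to After scale-out, so MRS substitutes the variables and the values arrive as positional arguments; the service logic and log path are placeholders, not MRS requirements.

```bash
#!/bin/bash
# Hypothetical automation script (for example, obs://mrs-samples/scale_hook.sh).
# Assumed registration:
#   Parameter: ${mrs_scale_type} ${mrs_scale_node_hostnames}
#   Executed:  After scale-out
# MRS substitutes the predefined variables before execution, so the values
# arrive here as positional arguments.
set -euo pipefail

scale_type="$1"    # "scale_out" or "scale_in"
hostnames="$2"     # comma-separated host names of the scaled nodes

log_file="/tmp/scale_hook.log"   # placeholder log location
echo "$(date '+%F %T') type=${scale_type} nodes=${hostnames}" >> "${log_file}"

if [ "${scale_type}" = "scale_out" ]; then
  # Placeholder service logic: iterate over the newly added Task nodes and
  # register each one with a downstream service.
  IFS=',' read -r -a nodes <<< "${hostnames}"
  for node in "${nodes[@]}"; do
    echo "registering ${node}" >> "${log_file}"
  done
fi
```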