Creating a Cluster

Updated at: Sep 29, 2020 GMT+08:00

The first step of using MRS is to buy a cluster. This section describes how to create a cluster on the MRS management console.

After registering a HUAWEI CLOUD account, you can create an IAM user or user group on the IAM management console and grant it specific operation permissions for fine-grained resource management on HUAWEI CLOUD. For details, see Permission Management.

Billing Mode

The commercial version of MRS is charged based on ECSs in the cluster.
  • Yearly/Monthly: You can pay for clusters by year or month. The minimum cluster duration is 1 month and the maximum available cluster duration is 1 year. You get a greater discount if you purchase a longer period.
  • Pay-per-use: Nodes are billed by actual duration of use, with a billing cycle of one hour.
  • The billing modes above cover only the fee for purchasing a cluster. Data storage, bandwidth, and traffic on MRS are billed separately.
  • You will be notified to renew if your balance is insufficient for fee deduction. Cluster resources will be frozen during a retention period and unfrozen after your renewal.
  • A yearly/monthly cluster cannot be restored and its fees cannot be refunded after being deleted. Exercise caution when deleting a yearly/monthly cluster.
  • If your account is in arrears, you can still use the cluster but cannot use the pay-per-use service. That is, you cannot submit jobs through OBS.

Creating an MRS 2.1.0 Cluster

  1. Log in to the MRS management console.
  2. Click Buy Cluster. The page for buying a cluster is displayed.

    When creating a cluster, pay attention to the quota notification. If a resource quota is insufficient, increase the quota as prompted and then create the cluster.

  3. Configure basic cluster information by referring to the following table.

    Table 1 Basic cluster configuration information



    Billing Mode

    MRS provides two billing modes.
    • Yearly/Monthly
    • Pay-per-use


    Region

    Select a region.

    AZ

    An AZ is a physical area that uses independent power and network resources. AZs are physically isolated but interconnected through the internal network. This improves the availability of applications. You are advised to create clusters in different AZs.

    Select an AZ associated with the cluster region. Select a region from the tool menu.

    Cluster Name

    Cluster name, which is globally unique.

    A cluster name can contain only 1 to 64 characters. Only letters, digits, hyphens (-), and underscores (_) are allowed.

    The default name is mrs_xxxx. xxxx is a random collection of letters and digits.
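The naming rule above can be checked before you submit the form. The following is a minimal sketch, not part of MRS; the function name is hypothetical, and it assumes that "letters" means ASCII letters. It does not check global uniqueness, which only the service can verify.

```python
import re

# Rule from the text: 1 to 64 characters; only letters, digits,
# hyphens (-), and underscores (_). ASCII letters are an assumption.
_CLUSTER_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name meets the documented length and character rules."""
    return _CLUSTER_NAME_RE.fullmatch(name) is not None
```

For example, is_valid_cluster_name("mrs_a1b2") is accepted, while a name containing a space is rejected.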

    Cluster Version

    Currently, MRS 1.8.10, MRS 1.9.2, and MRS 2.1.0 are supported. The latest version of MRS is used by default.

    Enterprise Project

    Select the enterprise project to which a cluster belongs. To use an enterprise project, create one on the Enterprise Project Management page of the Enterprise Management console.

    The Enterprise Management console of the enterprise project is designed for resource management. It helps enterprises manage cloud-based personnel, resources, permissions, and finance in a hierarchical manner, such as management of companies, departments, and projects.

    Kerberos Authentication

    Whether to enable Kerberos authentication when logging in to MRS Manager.

    • Disabled: If Kerberos authentication is disabled, you can use all functions of an MRS cluster. You are advised to disable Kerberos authentication in single-user scenarios. In this case, follow the instructions in Security Configuration Suggestions for Clusters with Kerberos Authentication Disabled to perform security configuration.
    • Enabled: If Kerberos authentication is enabled, common users cannot use the file and job management functions of an MRS cluster and cannot view cluster resource usage or the job records for Hadoop and Spark. To use more cluster functions, the users must contact the MRS Manager administrator to be assigned more permissions. You are advised to enable Kerberos authentication in multi-user scenarios.

    Click the switch to disable or enable Kerberos authentication.


    Username

    Name of the administrator of MRS Manager. admin is used by default.

    Password

    Password of the MRS Manager administrator.

    A password must meet the following requirements:

    • Must contain 8 to 32 characters.
    • Must contain at least three of the following:
      • Lowercase letters
      • Uppercase letters
      • Digits
      • Special characters: `~!@#$%^&*()-_=+\|[{}];:'",<.>/?
      • Spaces
    • Must be different from the username.
    • Cannot be the same as the username spelled backwards.
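The password requirements listed above can be expressed as a small local check. This is an illustrative sketch only; the function name and messages are hypothetical, and the console remains the authority on whether a password is accepted.

```python
# Special characters copied from the requirement list above.
SPECIAL_CHARS = set("`~!@#$%^&*()-_=+\\|[{}];:'\",<.>/?")


def check_manager_password(password: str, username: str = "admin") -> list[str]:
    """Return the list of violated rules (an empty list means the password passes)."""
    problems = []
    if not 8 <= len(password) <= 32:
        problems.append("length must be 8 to 32 characters")
    # Count how many of the five character types are present.
    types_present = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in SPECIAL_CHARS for c in password),
        " " in password,
    ]
    if sum(types_present) < 3:
        problems.append("must contain at least three character types")
    if password == username:
        problems.append("must differ from the username")
    if password == username[::-1]:
        problems.append("must not be the username spelled backwards")
    return problems
```

Returning all violations at once, rather than stopping at the first one, lets a caller show every problem in a single pass.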

    Password Strength: The color bar shows red, orange, or green for a weak, medium, or strong password, respectively.

    Confirm Password

    Enter the password of the MRS Manager administrator again.

    Cluster Type

    There are three types of clusters:
    • Analysis cluster: used for offline data analysis; provides Hadoop components.
    • Streaming cluster: used for streaming tasks; provides stream processing components.
    • Hybrid cluster: used for both offline data analysis and stream processing; provides Hadoop components and stream processing components. You are advised to use a hybrid cluster to perform offline data analysis and stream processing tasks at the same time. (MRS 1.8.5 or later supports hybrid clusters.)

    MRS streaming clusters do not support job and file management functions.


    Component

    The components of MRS 2.1.0 are listed below.

    Components of an analysis cluster:
    • Presto 308: open source and distributed SQL query engine
    • Hadoop 3.1.1: distributed system architecture
    • Spark 2.3.2: in-memory distributed computing framework
    • Hive 3.1.0: data warehouse framework built on Hadoop
    • HBase 2.1.1: distributed column-oriented database
    • Tez 0.9.1: an application framework which allows for a complex directed-acyclic-graph of tasks for processing data
    • Hue 3.11.0: provides the Hadoop UI capability, which enables users to analyze and process Hadoop cluster data on browsers
    • Loader 2.0.0: a tool based on open source Sqoop 1.99.7, designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases

      Hadoop is mandatory, and Spark and Hive must be used together. Select components based on service requirements.

    • Flink: a distributed big data processing engine that can perform stateful computations over both finite and infinite data streams
    • Impala: an SQL query engine for processing huge volumes of data
    • Kudu: a column-oriented data store
    Components of a streaming cluster:
    • Kafka 1.1.0: distributed message subscription system
    • Storm 1.2.1: distributed real-time computing system
    • Flume 1.6.0: distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data
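The selection rule stated for analysis clusters (Hadoop is mandatory, and Spark and Hive must be used together) can be sketched as a simple check. The function name is hypothetical and the component names are plain strings chosen for illustration; this is not an MRS API.

```python
def check_analysis_components(selected: set[str]) -> list[str]:
    """Return violated selection rules for an analysis cluster (empty list means OK)."""
    problems = []
    if "Hadoop" not in selected:
        problems.append("Hadoop is mandatory")
    # Spark and Hive must be selected together: both or neither.
    if ("Spark" in selected) != ("Hive" in selected):
        problems.append("Spark and Hive must be selected together")
    return problems
```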

    Use External Data Sources to Store Metadata

    Whether to use external data sources to store metadata. Click the switch to enable this function. If this function is enabled, metadata is not affected when a cluster becomes abnormal or is deleted. This function applies to scenarios where storage and computing are separated.

    MRS 1.9.2 or later supports this function.

    Data Connection Type

    This parameter is valid only when Use External Data Sources to Store Metadata is enabled. It indicates the type of an external data source.

    • Hive supports the following data connection types:
      • RDS PostgreSQL database
      • Local database
    • Ranger supports the following data connection types:
      • RDS MySQL database
      • Local database

    Data Connection Instance

    This parameter is valid only when Data Connection Type is set to RDS PostgreSQL database or RDS MySQL database. This parameter indicates the name of the connection between the MRS cluster and the RDS database. This instance must be created before being referenced here. You can click Create Data Connection to create a data connection. For details, see Managing Data Connections.
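The relationship between components, data connection types, and the need for a data connection instance can be summarized as a lookup. The sketch below only restates the lists above; the names are hypothetical and the strings are the labels used in this section.

```python
# Data connection types per component, as listed above.
ALLOWED_CONNECTION_TYPES = {
    "Hive": {"RDS PostgreSQL database", "Local database"},
    "Ranger": {"RDS MySQL database", "Local database"},
}

# Types for which Data Connection Instance is valid.
RDS_TYPES = {"RDS PostgreSQL database", "RDS MySQL database"}


def is_allowed_connection_type(component: str, connection_type: str) -> bool:
    """Return True if the component supports the given data connection type."""
    return connection_type in ALLOWED_CONNECTION_TYPES.get(component, set())


def needs_connection_instance(connection_type: str) -> bool:
    """Return True if a data connection instance must be selected for this type."""
    return connection_type in RDS_TYPES
```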


    VPC

    A VPC is a secure, isolated, logical network environment.

    Select the VPC for which you want to create a cluster and click View VPC to view the name and ID of the VPC. If no VPC is available, create one.


    Subnet

    A subnet provides dedicated network resources that are isolated from other networks, improving network security.

    Select the subnet for which you want to create a cluster to enter the VPC and view the name and ID of the subnet. If no subnet is created under the VPC, click Create Subnet to create one.


    Do not associate the subnet with the network ACL.

    Security Group

    A security group is a set of ECS access rules. It provides access policies for ECSs that have the same security protection requirements and are mutually trusted in a VPC.

    When you create an MRS cluster, you can select Auto Create from the drop-down list of Security Group to create a security group, or select an existing security group.


    You are advised to select Auto Create. If you select an existing security group, you can use either a security group that was automatically created during cluster creation or one that you created yourself. If you use your own security group, ensure that its inbound rule allows all protocols and all ports, and that the source IP address is the IP address of the specified management plane node. For details, contact HUAWEI CLOUD technical support. Do not use as the source IP address to prevent security risks.


    EIP

    After binding an EIP to an MRS cluster, you can use the EIP to access MRS Manager of the cluster.

    When creating a cluster, you can select an available EIP from the drop-down list and bind it. If no EIP is available in the drop-down list, click Manage EIP to access the EIPs service page to buy one.


    The EIP must be in the same region as the cluster.

    Cluster HA

    Whether to enable high availability for a cluster. This parameter is enabled by default.

    If you enable this option, the management processes of all components will be deployed on both Master nodes to achieve hot standby and prevent single-node failure, improving reliability. If you disable this option, they will be deployed on only one Master node. As a result, if a process of a component becomes abnormal, the component will fail to provide services.

    • Disabled: When Cluster HA is disabled, there is only one Master node and the number of Core nodes is three by default. However, you can decrease the number of Core nodes to 1.
    • Enabled: When Cluster HA is enabled, there are two Master nodes and the number of Core nodes is three by default. However, you can decrease the number of Core nodes to 1.

    Click the switch to disable or enable high availability.

    CPU Architecture

    CPU architectures supported by MRS:

    • x86: The x86-based CPU architecture uses Complex Instruction Set Computing (CISC). Each instruction can execute low-level hardware operations. There are many instructions, and their lengths vary, so executing an instruction is complex and time-consuming.
    • Kunpeng: The Kunpeng-based CPU architecture uses Reduced Instruction Set Computing (RISC). A RISC microprocessor executes fewer types of computer instructions than CISC but at a higher speed. RISC simplifies the computer architecture and improves the running speed. Compared with the x86-based CPU architecture, the Kunpeng-based CPU architecture has a more balanced performance and power consumption ratio. Kunpeng features high density, low power consumption, and high cost-effectiveness.
    Table 2 Cluster node information



    Node Type

    MRS provides three types of nodes:

    • Master: A Master node in an MRS cluster manages the cluster, assigns executable cluster files to Core nodes, traces the execution status of each job, and monitors the DataNode running status.
    • Core: A Core node in a cluster processes data and stores process data in HDFS. Analysis Core nodes are created in an analysis cluster. Streaming Core nodes are created in a streaming cluster. Both analysis and streaming Core nodes are created in a hybrid cluster.
    • Task: A Task node in a cluster is used for computing and does not store persistent data. Yarn and Storm are mainly installed on Task nodes. Task nodes are optional, and the number of Task nodes can be zero. Analysis Task nodes are created in an analysis cluster. Streaming Task nodes are created in a streaming cluster. Both analysis and streaming Task nodes are created in a hybrid cluster.

      When the data volume in a cluster changes little but the cluster's service processing capability needs a significant, temporary boost, add Task nodes. Typical situations are as follows:

      • Service volumes temporarily increase, for example, report processing at the end of the year.
      • Long-term tasks must be completed in a short time, for example, some urgent analysis tasks.

    Disk LVM

    This parameter is valid in the operation column of a streaming Core node only when the streaming Core node is created. Click this parameter to enable or disable the disk LVM function. The function status is displayed in the parentheses next to this parameter.

    If LVM is enabled, all disks on a node are mounted as logical volumes. This allows better disk planning and avoids data skew, thereby improving system stability.

    (Optional) Add Pay-per-use Task Node

    Click Add Pay-per-use Task Node to configure the information about the Task node.

    Click Auto Scaling in the Operation column of the Task node. On the Auto Scaling page that is displayed, configure an auto scaling policy. For details, see Configuring Auto Scaling Rules When Creating a Cluster.

    • The Auto Scaling parameter in the Operation column of the Task node is used to configure an auto scaling policy. The content in the parentheses next to this parameter shows the default node range when auto scaling is enabled, or Disabled when auto scaling is disabled.
    • The price calculator only calculates the price of basic configurations. When Instance Count is set to 0 for Task nodes, the price calculator does not calculate the fee of the Task nodes regardless of whether the number of nodes for auto scaling is configured. The Task nodes added by using the auto scaling function are billed based on the actual usage duration.

    Instance Specifications

    Instance specifications of Master or Core nodes. MRS supports host specifications determined by CPU, memory, and disk space. For details about instance specifications, see ECS Specifications Used by MRS.

    • Higher instance specifications provide better data processing but increase the cluster cost.
    • If you select HDDs for Core nodes, there is no billing information for data disks. The fees are billed with ECSs.
    • If you select HDDs for Core nodes, the system disks (40 GB) of Master nodes and Core nodes, as well as the data disks (200 GB) of Master nodes, are SATA disks.
    • If you select non-HDD disks for Core nodes, the disk types of Master and Core nodes are determined by Data Disk.
    • If Sold out appears next to an instance specification of a node, the node of this specification cannot be purchased. You can only purchase nodes of other specifications.
    • The Master node specification (4 vCPUs and 8 GB memory) is not within the SLA after-sales scope. It is applicable only to the test environment and is not recommended for the production environment.

    Instance Count

    Number of Master and Core nodes.

    For Master nodes:

    • If Cluster HA is enabled, the number of Master nodes is fixed to 2.
    • If Cluster HA is disabled, the number of Master nodes is fixed to 1.

    At least one Core node must exist and the total number of Core and Task nodes cannot exceed 500.

    • A maximum of 500 Core nodes are supported by default. If more than 500 Core nodes are required, contact HUAWEI CLOUD technical support engineers or invoke a background interface to modify the database.
    • A small number of nodes may cause clusters to run slowly while a large number of nodes may be unnecessarily costly. Set an appropriate value based on data to be processed.
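The node count rules above (a fixed number of Master nodes depending on Cluster HA, at least one Core node, and at most 500 Core and Task nodes in total by default) can be sketched as follows. The function names are hypothetical, and the limit of 500 is the default stated above, which support can raise.

```python
MAX_CORE_PLUS_TASK = 500  # default limit stated in this section


def master_node_count(cluster_ha: bool) -> int:
    """Master nodes are fixed: 2 with Cluster HA enabled, 1 otherwise."""
    return 2 if cluster_ha else 1


def check_node_counts(core: int, task: int = 0) -> list[str]:
    """Return violated node count rules (empty list means OK)."""
    problems = []
    if core < 1:
        problems.append("at least one Core node is required")
    if task < 0:
        problems.append("the number of Task nodes cannot be negative")
    if core + task > MAX_CORE_PLUS_TASK:
        problems.append("Core and Task nodes together cannot exceed 500")
    return problems
```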

    Data Disk

    Data disk storage space of the Core node. To increase data storage capacity, you can add disks when creating a cluster. There are two application scenarios:

    • Data storage and computing are separated. Data is stored in OBS, which features low cost and unlimited storage capacity. Because the data resides in OBS, clusters can be terminated at any time. The computing performance is determined by OBS access performance and is lower than that of HDFS. This configuration is recommended if data computing is infrequent.
    • Data storage and computing are not separated. Data is stored in HDFS, which features high cost, high computing performance, and limited storage capacity. Before terminating clusters, you must export and store the data. This configuration is recommended if data computing is frequent.

    Currently, SATA, SAS, and SSD storage types are supported.

    • SATA: Common I/O
    • SAS: High I/O
    • SSD: Ultra-high I/O

    Value range: 100 GB to 32,000 GB

    • More nodes in a cluster require higher disk capacity of Master nodes. To ensure stable cluster running, set the disk capacity of the Master node to over 600 GB if the number of nodes is 300 and increase it to over 1 TB if the number of nodes reaches 500.
    • The Master node increases data disk storage space for MRS Manager. The disk type must be the same as the data disk type of Core nodes. The default disk space is 200 GB and cannot be changed.
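The storage types and the value range for Core node data disks can be captured in a short check. This is a sketch under the assumption that sizes are whole gigabytes; the function name is hypothetical.

```python
# Storage types and their I/O classes, as listed above.
DISK_TYPES = {"SATA": "Common I/O", "SAS": "High I/O", "SSD": "Ultra-high I/O"}

MIN_DISK_GB = 100
MAX_DISK_GB = 32000


def check_data_disk(disk_type: str, size_gb: int) -> list[str]:
    """Return violated data disk rules (empty list means OK)."""
    problems = []
    if disk_type not in DISK_TYPES:
        problems.append("storage type must be SATA, SAS, or SSD")
    if not MIN_DISK_GB <= size_gb <= MAX_DISK_GB:
        problems.append("size must be between 100 GB and 32,000 GB")
    return problems
```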

    Data Disk Encryption

    Whether to encrypt data in the data disk mounted to the cluster. This function is disabled by default. To use this function, you must have the Security Administrator and KMS Administrator permissions.

    Keys used by encrypted data disks are provided by the Key Management Service (KMS) of Data Encryption Workshop (DEW), which is secure and convenient. Therefore, you do not need to build and maintain a key management infrastructure.

    Click the switch to disable or enable data disk encryption. For details, see EVS Disk Encryption.

    Data Disk Key Name

    This parameter is mandatory when the Data Disk Encryption function is enabled. Select the name of the key used to encrypt the data disk. By default, the default master key named evs/default is selected. You can select another master key from the drop-down list.

    If disks are encrypted using a CMK, which is then disabled or scheduled for deletion, the disks can no longer be read from or written to, and data on these disks may never be restored. Exercise caution when performing this operation.

    Click View Key List to enter a page where you can create and manage keys.

    Data Disk Key ID

    This parameter is displayed only when the Data Disk Encryption function is enabled. This parameter indicates the key ID corresponding to the selected key name.

    Table 3 Login information



    Login Mode

    • Password

      You can log in to ECS nodes using a password.

      A password must meet the following requirements:

      1. Must be a string and 8 to 26 characters long.
      2. Must contain at least 3 of the following character types: uppercase letters, lowercase letters, digits, and special characters (!@$%^-_=+[{}]:\,./?), but must not contain spaces.
      3. Cannot be the username or the username spelled backwards.
    • Key Pair

      Key pairs are used to log in to ECS nodes of the cluster. Select a key pair from the drop-down list. Select "I acknowledge that I have obtained private key file SSHkey-xxx and that without this file I will not be able to log in to my ECS." If you have never created a key pair, click View Key Pair to create or import a key pair, and then obtain a private key file.

      A key pair, also called an SSH key, consists of a public key and a private key. You can create an SSH key and download the private key for authenticating remote login. For security, a private key can only be downloaded once. Keep it secure.

      Use an SSH key in either of the following two methods:

      1. Creating an SSH key: After you create an SSH key, a public key and a private key are generated. The public key is stored in the system, and the private key is stored in the local ECS. When you log in to an ECS, the public and private keys are used for authentication.
      2. Importing an SSH key: If you have obtained the public and private keys, import the public key into the system. When you log in to an ECS, the public and private keys are used for authentication.
    Table 4 Required duration configuration



    Required Duration

    Cluster required duration when the billing mode is Yearly/Monthly. The minimum cluster duration is 1 month and the maximum available cluster duration is 1 year.

    Table 5 Metric sharing parameters



    Metric Sharing

    Monitoring metrics of big data components are collected. If a fault occurs when you use a cluster, share the monitoring metrics with HUAWEI CLOUD technical support personnel for troubleshooting.

    Table 6 Advanced settings




    After you click Configure, the page for adding a tag or a bootstrap action is displayed.


    You can set parameters later.

  4. Click Buy Now.

    If you have any question about the price, click Pricing details.

  5. After confirming cluster details, click Submit Order for a yearly/monthly subscribed cluster or Submit Application for a pay-per-use cluster.
  6. Click Back to Cluster List to view the cluster status.

    For details about cluster status during creation, see the description of the status parameters in Table 1.

    It takes some time to create a cluster. The initial status of the cluster is Starting. After the cluster has been created successfully, the cluster status becomes Running.
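If you track cluster creation from a script instead of the console, the Starting to Running transition can be handled with a simple polling loop. The sketch below is generic: get_status is a hypothetical callable that you supply (for example, a wrapper around whatever query mechanism you use), and the interval and timeout values are arbitrary assumptions, not MRS recommendations.

```python
import time
from typing import Callable


def wait_until_started(
    get_status: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the status is no longer "Starting" and return the new status.

    The returned value is whatever status follows "Starting", for example
    "Running" on success. Raises TimeoutError if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status != "Starting":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("cluster is still in the Starting state")
        sleep(interval_s)
```

The caller should compare the returned status with Running, because a cluster that fails to be created also leaves the Starting state.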

    On the MRS management console, a maximum of 10 clusters can be concurrently created, and a maximum of 100 clusters can be managed.

    You can create a cluster that has the same name as an existing cluster in the Failed or Terminated state.

Failed to Create a Cluster

If a cluster fails to be created, the failed task is managed on the Manage Failed Tasks page. Click the icon shown in Figure 1 to go to the Manage Failed Tasks page. In the Task Status column, hover the cursor over the status icon to view the failure cause, as shown in Figure 2. You can delete failed tasks by referring to Deleting a Failed Task.

Figure 1 Failed task management
Figure 2 Failure cause

Table 7 lists the error codes of MRS cluster creation failures.

Table 7 Error codes

Error Code



Insufficient quota to meet your request. Contact customer service to increase the quota.


The token cannot be null or invalid. Try again later or contact customer service.


Invalid request. Try again later or contact customer service.


Insufficient resources. Try again later or contact customer service.


Insufficient IP addresses in the existing subnet. Try again later or contact customer service.


Failed due to an ECS error. Try again later or contact customer service.


Failed due to an IAM error. Try again later or contact customer service.


Failed due to a VPC error. Try again later or contact customer service.


MRS system error. Try again later or contact customer service.
