Updated on 2024-11-29 GMT+08:00

Creating a Custom Cluster

To use MRS, create a cluster on the MRS management console.

After registering an account, you can create IAM users or user groups on the IAM management console and grant them specific operation permissions for fine-grained resource management. For details, see Creating an MRS User.

  1. Click the Custom Config tab.

    When creating a cluster, pay attention to the quota notifications. If a resource quota is insufficient, increase it as prompted and then create the cluster.

  2. Configure cluster information by referring to Software Configurations and click Next.

    Only one billing mode is supported in some regions. For details, see the management console.

  3. Configure cluster information by referring to Hardware Configurations and click Next.
  4. Set advanced options by referring to Advanced Options. Then, click Next.

    If Kerberos authentication is enabled, confirm whether you need this function. If you do, click Continue. If not, click Back to disable it and then proceed with the subsequent steps. This option cannot be changed after the cluster is created.

  5. On the Confirm Configuration page, check the cluster configuration. To adjust it, go back to the corresponding tab page and configure the parameters again.
  6. Select the checkbox to enable secure communications. For details, see Communication Security Authorization.
  7. Click Back to Cluster List to view the cluster status.

    For details about cluster status during creation, see the description of the status parameters in Table 1.

    It takes some time to create a cluster. The initial status of the cluster is Starting. After the cluster has been created successfully, the cluster status becomes Running.

    On the MRS management console, a maximum of 10 clusters can be concurrently created, and a maximum of 100 clusters can be managed.
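If you script cluster creation, the console limits stated in step 7 can be checked before submitting another request. The following is a minimal sketch, not part of any MRS API: the function name is hypothetical, and it assumes the managed-cluster count already includes the clusters that are still being created.

```python
# Limits stated in this guide for the MRS management console.
MAX_CONCURRENT_CREATIONS = 10
MAX_MANAGED_CLUSTERS = 100


def can_create_cluster(creating: int, managed: int) -> bool:
    """Return True if one more cluster can be created under the stated limits.

    creating: number of clusters currently being created.
    managed: number of clusters already managed on the console
             (assumption: this count includes the clusters being created).
    """
    return creating < MAX_CONCURRENT_CREATIONS and managed < MAX_MANAGED_CLUSTERS
```

For example, `can_create_cluster(9, 50)` returns True, while `can_create_cluster(10, 50)` returns False because 10 creations are already in progress.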

Software Configurations

Table 1 MRS cluster software configuration

Parameter

Description

Region

Select a region.

Cloud service products in different regions cannot communicate with each other over an intranet. For low network latency and quick access, select the nearest region.

Cluster Name

The cluster name must be unique.

A cluster name can contain 1 to 64 characters. Only letters, digits, hyphens (-), and underscores (_) are allowed.

The default name is mrs_xxxx, where xxxx is a random combination of letters and digits.

Cluster Type

The cluster types are as follows:
  • Analysis cluster: used for offline data analysis. It provides Hadoop components.
  • Streaming cluster: used for streaming tasks. It provides stream processing components.
  • Hybrid cluster: used for both offline data analysis and stream processing. It provides Hadoop components and stream processing components. If you need to run offline data analysis and stream processing tasks at the same time, a hybrid cluster is recommended.
  • Custom: You can adjust the cluster service deployment mode based on service requirements. For details, see Configuring Custom Topology.
NOTE:
  • MRS streaming clusters do not support job and file management functions.
  • To install all components in a cluster, select Custom.

Version Type

The following version types are available:

  • Normal:
    • Supports basic cluster operations, such as configuration, management, and O&M.
    • Supports components such as Presto, Impala, Kudu, and Sqoop.
  • LTS:
    • In addition to basic cluster operations, the LTS version supports version upgrade.
    • Supports multi-AZ deployment.
    • Supports HetuEngine, IoTDB, and CDL.

The default version type is Normal.

Cluster Version

Currently, MRS 3.3.1-LTS is supported.

Component

MRS cluster components. For details about component versions supported by different versions of MRS clusters, see .

Metadata

Whether to use external data sources to store metadata.

  • Local: Metadata is stored in the local cluster.
  • External data connection: Metadata of external data sources is used. If the cluster is abnormal or deleted, metadata is not affected. This mode applies to scenarios where storage and compute are decoupled.

This function is supported only by clusters that contain the Hive or Ranger component.

Component

This parameter is available only when Metadata is set to External data connection. It indicates the type of an external data source.

  • Hive
  • Ranger

Data Connection Type

This parameter is available only when Metadata is set to External data connection. It indicates the type of an external data source. When you create a cluster, Data Connection Type can only be set to Local database.

Component port (supported only for the LTS version)

Default communication port policy for each component in the MRS cluster.

  • Open source: Use the default ports of the open source components.
  • Custom: Customize a port for the component.

For details about the differences between the default open source ports and the default custom ports, see Web UIs of Open Source Components.
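The Cluster Name format rule in Table 1 can be checked locally before you submit a request. The sketch below assumes that "letters" means ASCII letters only; uniqueness cannot be verified this way and must still be checked against your existing clusters. The function and constant names are illustrative.

```python
import re

# Documented rule: 1 to 64 characters; letters, digits, hyphens (-), and underscores (_).
# Assumption: "letters" means ASCII letters only.
CLUSTER_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name satisfies the documented format rule."""
    # fullmatch() anchors at both ends, so a trailing newline is also rejected.
    return CLUSTER_NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_cluster_name("mrs_ab12")` returns True, and `is_valid_cluster_name("mrs cluster")` returns False because spaces are not allowed.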

Hardware Configurations

Table 2 MRS cluster hardware configuration

Parameter

Description

AZ

Select the AZ associated with the region of the cluster.

An AZ is a physical area that uses independent power and network resources. AZs are physically isolated but interconnected through an internal network, which improves application availability. You are advised to create clusters in different AZs.

Enterprise Project

Select the enterprise project to which the cluster belongs. To use an enterprise project, create one on the Enterprise > Project Management page.

Enterprise Management is designed for resource management. It helps enterprises manage cloud-based personnel, resources, permissions, and finances hierarchically, for example, by company, department, or project.

VPC

A VPC is a secure, isolated, and logical network environment.

Select the VPC in which you want to create the cluster and click View VPC to view the name and ID of the VPC. If no VPC is available, create one.

Subnet

A subnet provides dedicated network resources that are isolated from other networks, improving network security.

Select the subnet in which you want to create the cluster. Click View Subnet to view details about the selected subnet. If no subnet has been created in the VPC, go to the VPC console and choose Subnets > Create Subnet to create one. For details about how to configure network ACL outbound rules, see How Do I Configure a Network ACL Outbound Rule?

NOTE:

The number of IP addresses required to create an MRS cluster depends on the number of cluster nodes and the selected components, not on the cluster type.

In MRS, IP addresses are automatically assigned to a cluster during creation, roughly according to the following formula: Number of IP addresses = Number of cluster nodes + 2 (one for Manager and one for the DB). In addition, if the Hadoop, Hue, Sqoop, and Presto or Solr, GraphBase, Loader and Presto components are selected during cluster deployment, one IP address is added for each component. For a standalone ClickHouse cluster, the formula is: Number of IP addresses = Number of cluster nodes + 1 (for Manager).

Security Group

A security group is a set of ECS access rules. It provides access policies for ECSs that have the same security protection requirements and are mutually trusted in a VPC.

When you create a cluster, you can select Auto create from the drop-down list of Security Group to create a security group or select an existing security group.

NOTE:

When you select a security group created by yourself, ensure that the inbound rule contains a rule in which Protocol is set to All, Port is set to All, and Source is set to a trusted accessible IP address range. Do not use 0.0.0.0/0 as a source address. Otherwise, security risks may occur. If you do not know the trusted accessible IP address range, select Auto create.

EIP

After binding an EIP to an MRS cluster, you can use the EIP to access the Manager web UI of the cluster.

When creating a cluster, you can select an available EIP from the drop-down list and bind it. If no EIP is available in the drop-down list, click Manage EIP to access the EIPs service page to create one.

NOTE:

The EIP must be in the same region as the cluster.
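Two of the network rules in Table 2 lend themselves to a quick pre-check: the IP address formula from the Subnet note and the warning against 0.0.0.0/0 as a security group source. The following is a minimal sketch with hypothetical function names. Because the note's list of components that each add one address is ambiguous, the sketch takes that count as a caller-supplied input (extra_component_ips, a hypothetical parameter). The check also treats ::/0 as equivalent to 0.0.0.0/0 for IPv6, which is an assumption.

```python
import ipaddress


def required_ip_addresses(node_count: int, extra_component_ips: int = 0,
                          clickhouse_only: bool = False) -> int:
    """Estimate the subnet IP addresses an MRS cluster needs, per the Subnet note.

    node_count: total number of cluster nodes.
    extra_component_ips: hypothetical input; one per selected component that
        the note says adds an address. Determine this count yourself.
    clickhouse_only: True for a standalone ClickHouse cluster.
    """
    if node_count < 1:
        raise ValueError("node_count must be positive")
    if clickhouse_only:
        return node_count + 1  # +1 for Manager
    return node_count + 2 + extra_component_ips  # +1 Manager, +1 DB, + components


def is_overly_permissive(source_cidr: str) -> bool:
    """Return True if an inbound rule source covers every address (0.0.0.0/0 or ::/0)."""
    # A prefix length of 0 matches the entire IPv4 or IPv6 address space.
    return ipaddress.ip_network(source_cidr, strict=False).prefixlen == 0
```

For example, a 10-node cluster with no extra components needs `required_ip_addresses(10)`, that is, 12 addresses, and `is_overly_permissive("0.0.0.0/0")` returns True, flagging a source you should replace with a trusted range.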

Table 3 Cluster node information

Parameter

Description

CPU Architecture

CPU architecture supported by MRS.

  • x86: The x86-based CPU architecture uses Complex Instruction Set Computing (CISC). Each instruction can execute low-level hardware operations. CISC has a large number of instructions of varying lengths, so executing an instruction can be complex and time-consuming.
  • Kunpeng: The Kunpeng-based CPU architecture uses Reduced Instruction Set Computing (RISC). A RISC processor supports fewer types of instructions but executes them faster than a CISC processor. RISC simplifies the computer architecture and improves running speed. Compared with the x86-based CPU architecture, the Kunpeng-based CPU architecture offers a more balanced ratio of performance to power consumption. Kunpeng features high density, low power consumption, and high cost-effectiveness.

Common Node Configurations

This parameter is available only when Cluster Type is set to Custom. Value options include Compact, Full-size, and OMS-separate. For details, see Custom Cluster Template Description.

Node Group

Name of a node group

An MRS cluster consists of multiple ECS nodes. The system manages the nodes based on node groups.

Nodes in a cluster are classified into the following types based on the roles of components deployed on the nodes:

  • Master: manages the cluster, distributes cluster executable files to Core nodes, tracks the execution status of each job, and monitors the running status of DataNodes.
  • Core: cluster worker node, which processes and analyzes data and stores processed data.

    The system automatically creates a core node group based on the components contained in the cluster. For example, if you select the ClickHouse component, the system adds the ClickHouse node group and deploys the ClickHouseServer role in the node group by default.

  • Task: provides compute resources, on which Yarn and Storm are installed. Task nodes do not store persistent data. When compute resources in a cluster are insufficient, you can configure auto scaling policies to automatically increase task nodes.

    If the data volume in a cluster changes little but the cluster's service processing capability needs to be significantly improved for a short period, add Task nodes.

    For clusters whose Cluster Type is Analysis cluster, Streaming cluster, or Hybrid cluster, the system automatically adds the corresponding Task node groups. You can delete these Task node groups if they are not required.

Node Type

Type of the nodes in the group. Options include Core and Task.
NOTE:

If the node group type is set to Task, only the NodeManager role (in addition to mandatory roles) can be deployed in the node group.

Node Count

Number of nodes in each node group.

  • Master Node Groups: The number of Master instances ranges from 3 to 9.
  • At least one Core node must exist, and the total number of Core and Task nodes cannot exceed 10,000.

    You can also add node groups, modify node instance specifications, and delete node groups that you added.

NOTE:

Too few nodes may cause the cluster to run slowly, while too many nodes may incur unnecessary costs. Set an appropriate value based on the amount of data to be processed.

Instance Specifications

Instance specifications of Master or Core nodes. MRS supports host specifications determined by CPU, memory, and disk space. You can configure the instance specifications, system disk, and data disk parameters of each cluster node.

NOTE:
  • Higher instance specifications provide better data processing performance.
  • Instance specifications may vary in different AZs. If no instance specifications in the current AZ can meet your requirements, switch to another AZ.
  • If you select non-HDD disks for Core nodes, the disk types of Master and Core nodes are determined by Data Disk.
  • If Sold out appears next to an instance specification of a node, the node of this specification cannot be created. You can only create nodes of other specifications.
  • The memory of the master node must be greater than 64 GB.

System Disk

Storage type and storage space of the system disk on a node.

Storage type can be any of the following:
  • SAS: high I/O
  • SSD: ultra-high I/O
  • GPSSD: general-purpose SSD

Data Disk

Data disk storage space of a node. For more data storage, you can add disks when creating a cluster. A maximum of 10 disks can be added to each Core or Task node.

  • Data storage and computing are separated: Data is stored in OBS, which features low cost and unlimited storage capacity, so clusters can be deleted at any time. Computing performance depends on OBS access performance and is lower than that of HDFS. This configuration is recommended if data is computed infrequently.
  • Data storage and computing are not separated: Data is stored in HDFS, which features high cost, high computing performance, and limited storage capacity. Before deleting a cluster, you must export and store its data. This configuration is recommended if data is computed frequently.

The storage type can be any of the following:

  • SAS: high I/O
  • SSD: ultra-high I/O
  • GPSSD: general-purpose SSD
NOTE:

The more nodes a cluster has, the more disk capacity its Master nodes need. To keep the cluster stable, set the Master node disk capacity to more than 600 GB when the cluster has 300 nodes, and to more than 1 TB when it reaches 500 nodes.

Instance Count

Number of Master and Core nodes.

  • Master Node Groups: The number of Master instances ranges from 3 to 9.
  • At least one Core node must exist, and the total number of Core and Task nodes cannot exceed 10,000.

    You can also add node groups, modify node instance specifications, and delete node groups that you added.

NOTE:

Too few nodes may cause the cluster to run slowly, while too many nodes may incur unnecessary costs. Set an appropriate value based on the amount of data to be processed.

Topology Adjustment

If the deployment mode defined by Common Node Configurations does not meet your requirements, set Topology Adjustment to Enable and adjust the instance deployment mode based on service requirements. For details, see Topology Adjustment for a Custom Cluster. This parameter is valid only when Cluster Type is set to Custom.
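The node limits and the Master disk sizing note in Table 3 can be combined into a simple planning check. This is a minimal sketch with hypothetical function names. It assumes that the 300-node and 500-node guidance applies from those counts upward, that 1 TB means 1024 GB, and that node counts between the documented figures use the lower threshold; the documentation gives no figure for clusters below 300 nodes, so the sketch returns None there.

```python
from typing import List, Optional


def validate_node_plan(master: int, core: int, task: int = 0) -> List[str]:
    """Return violations of the node count limits (an empty list means none)."""
    problems = []
    if not 3 <= master <= 9:
        problems.append("Master instances must range from 3 to 9")
    if core < 1:
        problems.append("at least one Core node is required")
    if task < 0:
        problems.append("Task node count cannot be negative")
    if core + task > 10_000:
        problems.append("Core and Task nodes together cannot exceed 10,000")
    return problems


def min_master_disk_gb(node_count: int) -> Optional[int]:
    """Capacity (GB) the Master node disk should exceed, per the sizing note.

    Assumptions: guidance applies from 300 and 500 nodes upward; 1 TB = 1024 GB.
    Returns None when the note gives no figure.
    """
    if node_count >= 500:
        return 1024
    if node_count >= 300:
        return 600
    return None
```

For example, `validate_node_plan(3, 2)` returns an empty list, and `min_master_disk_gb(350)` returns 600, meaning the Master disk should be larger than 600 GB.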

Advanced Options

Table 4 MRS cluster advanced configuration topology

Parameter

Description

Kerberos Authentication

Whether to enable Kerberos authentication when logging in to Manager. This option cannot be changed after you create a cluster.

  • Disabled: If Kerberos authentication is disabled, common users can use all functions of an MRS cluster. Disabling it is recommended for single-user scenarios. In this case, you can follow the instructions in Security Configuration Suggestions for Clusters with Kerberos Authentication Disabled to perform security configuration.
  • Enabled: If Kerberos authentication is enabled, common users cannot use the file and job management functions of an MRS cluster and cannot view cluster resource usage or the job records of Hadoop and Spark. To use more cluster functions, they must ask the Manager administrator to grant more permissions. Enabling it is recommended for multi-user scenarios.
  • Currently, Presto does not support Kerberos authentication.

Username

Name of the administrator of Manager. admin is used by default.

Password

Password of the Manager administrator

The following requirements must be met:

  • Must contain 8 to 26 characters.
  • Must contain at least four of the following:
    • Lowercase letters
    • Uppercase letters
    • Digits
    • At least one of the following special characters: `~!@#$%^&*()-_=+|[{}];:',<.>/?
  • Cannot be the same as the username or the username spelled backwards.

Password Strength: A red, orange, or green color bar indicates a weak, medium, or strong password, respectively.

Confirm Password

Enter the password of the Manager administrator again.

Login Mode

  • Password

    Log in to the ECS nodes as user root with a password. Enter the password of user root and confirm it.

    A password must meet the following requirements:

    1. Must be a string and 8 to 26 characters long.
    2. Must contain at least four of the following: uppercase letters, lowercase letters, digits, and special characters (`~!@#$%^&*()-_=+|[{}];:',<.>/?).
    3. The password cannot be the username or the reverse username.
  • Key Pair

    Key pairs are used to log in to the ECS nodes of the cluster. Select a key pair from the drop-down list and select "I acknowledge that I have obtained private key file SSHkey-xxx and that without this file I will not be able to log in to my ECS." If you have never created a key pair, click View Key Pair to create or import one, and then obtain the private key file.

    A key pair, also called an SSH key, consists of a public key and a private key. You can create an SSH key and download the private key for authenticating remote login. For security, a private key can only be downloaded once. Keep it secure.

    Use an SSH key in either of the following two methods:

    1. Creating an SSH key: After you create an SSH key, a public key and a private key are generated. The public key is stored in the system, and the private key is stored locally on your computer. When you log in to an ECS, the public and private keys are used for authentication.
    2. Importing an SSH key: If you have obtained the public and private keys, import the public key into the system. When you log in to an ECS, the public and private keys are used for authentication.

Hostname Prefix

Enter the prefix for the computer hostname of an ECS in the cluster.

Setting Advanced Options

Advanced function parameters of an MRS cluster. Select Configure. For details, see Table 5.
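The Manager administrator password rules in Table 4 can be checked before submission. The following is a minimal sketch with a hypothetical function name. It assumes that "at least four of the following", with exactly four character types listed, means all four types are required, and that letters are ASCII only. The root password rules under Login Mode read the same, so the same function could be called with `username="root"`.

```python
import string
from typing import List

# Special characters listed in the password requirements.
SPECIAL_CHARACTERS = set("`~!@#$%^&*()-_=+|[{}];:',<.>/?")


def check_manager_password(password: str, username: str = "admin") -> List[str]:
    """Return unmet password requirements (an empty list means the password passes)."""
    problems = []
    if not 8 <= len(password) <= 26:
        problems.append("must contain 8 to 26 characters")
    character_types = [
        any(c in string.ascii_lowercase for c in password),
        any(c in string.ascii_uppercase for c in password),
        any(c in string.digits for c in password),
        any(c in SPECIAL_CHARACTERS for c in password),
    ]
    # Assumption: all four character types are required.
    if sum(character_types) < 4:
        problems.append("must contain lowercase letters, uppercase letters, "
                        "digits, and special characters")
    if password in (username, username[::-1]):
        problems.append("cannot be the username or the username spelled backwards")
    return problems
```

For example, `check_manager_password("Admin@123")` returns an empty list. The function only checks the documented rules; the console's strength meter may still rate a passing password as weak.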

Table 5 (Optional) Advanced configuration information of the MRS cluster

Parameter

Description

Tag

For details, see Adding a Tag to a Cluster/Node.

Auto Scaling

Auto scaling can be configured only after you specify Task node specifications in the hardware configuration step. For details, see Configuring Auto Scaling Rules.

Bootstrap Action

For details, see Adding a Bootstrap Action.

Agency

Binding an agency allows ECSs or BMSs to manage some of your resources. Determine whether to configure an agency based on your service scenario.

For example, you can configure an agency of the ECS type to automatically obtain the AK/SK to access OBS. For details, see Configuring a Storage-Compute Decoupled Cluster (Agency).

The MRS_ECS_DEFAULT_AGENCY agency has the OBSOperateAccess permission of OBS and the CESFullAccess (for users who have enabled fine-grained policies), CES Administrator, and KMS Administrator permissions in the region where the cluster is located.

Data Disk Encryption

Whether to encrypt data in the data disk mounted to the cluster. This function is disabled by default. To use this function, you must have the Security Administrator and KMS Administrator permissions.

Keys used by encrypted data disks are provided by the Key Management Service (KMS) of the Data Encryption Workshop (DEW), which is secure and convenient, so you do not need to build and maintain your own key management infrastructure.

Click Data Disk Encryption to enable or disable the data disk encryption function.

Data Disk Key ID

This parameter is displayed only when the Data Disk Encryption function is enabled. This parameter indicates the key ID corresponding to the selected key name.

Data Disk Key Name

This parameter is mandatory when the Data Disk Encryption function is enabled. Select the name of the key used to encrypt the data disk. By default, the default master key named evs/default is selected. You can select another master key from the drop-down list.

If disks are encrypted using a CMK, which is then disabled or scheduled for deletion, the disks can no longer be read from or written to, and data on these disks may never be restored. Exercise caution when performing this operation.

Click View Key List to enter a page where you can create and manage keys.

Alarm

If the alarm function is enabled, the cluster maintenance personnel can be notified in a timely manner to locate faults when the cluster runs abnormally or the system is faulty.

Rule Name

Name of the rule for sending alarm messages. The value can contain only digits, letters, hyphens (-), and underscores (_).

Topic Name

Select an existing topic or click Create Topic to create a topic. To deliver messages published to a topic, you need to add a subscriber to the topic. For details, see Adding Subscriptions to a Topic.

A topic serves as a message sending channel, where publishers and subscribers can interact with each other.

Logging

Whether to collect logs when cluster creation fails.

After the logging function is enabled, system logs and component run logs are automatically collected and saved to the OBS file system in scenarios such as cluster creation failures and scale-out or scale-in failures for O&M personnel to quickly locate faults. The log information is retained for a maximum of seven days.

Failed to Create a Cluster

If a cluster fails to be created, the failed task is recorded on the Manage Failed Tasks page. To view it, choose Clusters > Active Clusters and open the Manage Failed Tasks page. In the Task Status column, hover the cursor over the status to view the failure cause. You can delete failed tasks by referring to Viewing Failed MRS Tasks.

Table 6 lists the error codes of MRS cluster creation failures.

Table 6 Error codes

Error Code

Description

MRS.101

Insufficient quota to meet your request. Contact customer service to increase the quota.

MRS.102

The token cannot be null or invalid. Try again later or contact the administrator.

MRS.103

Invalid request. Try again later or contact the administrator.

MRS.104

Insufficient resources. Try again later or contact the administrator.

MRS.105

Insufficient IP addresses in the existing subnet. Try again later or contact the administrator.

MRS.201

Failed due to an ECS error. Try again later or contact the administrator.

MRS.202

Failed due to an IAM error. Try again later or contact the administrator.

MRS.203

Failed due to a VPC error. Try again later or contact the administrator.

MRS.400

MRS system error. Try again later or contact the administrator.
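If you record creation failures in your own tooling, the error codes in Table 6 can be kept as a lookup table that maps each code to the recommended action. The following is a minimal sketch: the codes and descriptions come from Table 6, the names are hypothetical, the fallback message for codes not listed in Table 6 is an assumption, and how you obtain the code (for example, from the failure cause shown on the Manage Failed Tasks page) is outside its scope.

```python
# Error codes and descriptions from Table 6.
MRS_CREATION_ERRORS = {
    "MRS.101": "Insufficient quota to meet your request.",
    "MRS.102": "The token cannot be null or invalid.",
    "MRS.103": "Invalid request.",
    "MRS.104": "Insufficient resources.",
    "MRS.105": "Insufficient IP addresses in the existing subnet.",
    "MRS.201": "Failed due to an ECS error.",
    "MRS.202": "Failed due to an IAM error.",
    "MRS.203": "Failed due to a VPC error.",
    "MRS.400": "MRS system error.",
}


def suggested_action(code: str) -> str:
    """Return the action Table 6 recommends for a cluster creation error code."""
    if code == "MRS.101":
        return "Contact customer service to increase the quota."
    if code in MRS_CREATION_ERRORS:
        return "Try again later or contact the administrator."
    # Assumption: codes not listed in Table 6 are escalated.
    return "Unknown error code; contact the administrator."
```

For MRS.105 in particular, retrying only helps once enough IP addresses are free in the subnet, so checking the subnet size against the cluster's IP address requirement first may save a failed attempt.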