Updated on 2024-08-12 GMT+08:00

Submitting Spark Tasks to New Task Nodes

You can add task nodes to an MRS cluster to increase its compute capability. Task nodes are mainly used to process data; they do not store data persistently.

This section describes how to bind a new task node using tenant resources and submit Spark tasks to the new task node. You can get started by reading the following topics:

  1. Adding Task Nodes
  2. Creating a Resource Pool
  3. Creating a Tenant
  4. Configuring Queues
  5. Configuring Resource Distribution Policies
  6. Creating a User
  7. Using spark-submit to Submit a Task
  8. Deleting Task Nodes

Adding Task Nodes

  1. On the cluster details page, click Nodes and then click Add Node Group.
  2. On the Add Node Group page that is displayed, set parameters as needed.
    Table 1 Parameters for adding a node group

    Parameter                  Description
    Instance Specifications    Select the flavor type of the hosts in the node group.
    Nodes                      Configure the number of nodes in the node group.
    System Disk                Configure the specifications and capacity of the system disks on the new nodes.
    Data Disk (GB)/Disks       Set the specifications, capacity, and number of data disks of the new nodes.
    Deploy Roles               Select NM to add a NodeManager role.

  3. Click OK.

Creating a Resource Pool

  1. On the cluster details page, click Tenants.
  2. Click Resource Pools.
  3. Click Create Resource Pool.
  4. On the Create Resource Pool page, set the properties of the resource pool.

    • Name: Enter the name of the resource pool, for example, test1.
    • Resource Label: Enter the resource pool label, for example, 1.
    • Available Hosts: Enter the node added in Adding Task Nodes.

  5. Click OK.

Creating a Tenant

  1. On the cluster details page, click Tenants.
  2. Click Create Tenant. On the page that is displayed, configure tenant properties. The following table takes MRS 3.x versions as an example.

    Table 2 Tenant parameters

    Parameter

    Description

    Name

    Set the tenant name, for example, tenant_spark.

    Tenant Type

    Select Leaf. If Leaf is selected, the current tenant is a leaf tenant and no sub-tenant can be added. If Non-leaf is selected, sub-tenants can be added to the current tenant.

    Compute Resource

    If Yarn is selected, the system automatically creates a task queue using the tenant name in Yarn. If Yarn is not selected, the system does not automatically create a task queue.

    Configuration Mode

    If Yarn is selected for Compute Resource, this parameter can be set to Basic or Advanced.

    • Basic: Configure the percentage of compute resources used by the tenant in the default resource pool by specifying Default Resource Pool Capacity (%).
    • Advanced: Configure the following parameters for advanced settings:
      • Weight: Tenant resource weight. The value ranges from 0 to 100. Tenant resource weight = Tenant weight/Total weight of tenants at the same level.
      • Minimum Resources: resources guaranteed to the tenant, with preemption supported. The value is a percentage or absolute value of the parent tenant's resources. When the tenant's workload is light, its idle resources are automatically lent to other tenants. When the tenant's available resources fall below Minimum Resources, the tenant can preempt the resources that were lent out.
      • Maximum Resources: maximum resources that can be used by a tenant. The value is a percentage or absolute value of the parent tenant's resources.
      • Reserved Resources: resources reserved for the tenant. The value is a percentage or absolute value of the parent tenant's resources.

    Default Resource Pool Capacity (%)

    Set the percentage of computing resources used by the current tenant in the default resource pool, for example, 20%.

    Storage Resource

    If HDFS is selected, the system automatically creates the /tenant directory under the HDFS root directory when a tenant is created for the first time. If HDFS is not selected, the system does not create a storage directory under the HDFS root directory.

    Maximum Number of Files/Directories

    Set the maximum number of files or directories, for example, 100000000000.

    Storage Space Quota

    Set the quota for the HDFS storage space that the current tenant can use, for example, 50000 MB. The minimum value is 1, and the maximum value is the total storage quota of the parent tenant. The unit is MB or GB. This parameter limits the maximum HDFS storage space available to the tenant; it does not indicate the space actually used. If the value is greater than the size of the HDFS physical disk, the maximum space available is the full space of the HDFS physical disk.

    NOTE:

    To ensure data reliability, the system automatically generates one backup copy when a file is stored in HDFS. That is, two replicas of the same file are stored by default. The HDFS storage space is the total disk space occupied by all replicas. For example, if the quota is set to 500 MB, the actual space for storing files is about 250 MB (500/2 = 250).

    Storage Path

    Set the storage path, for example, tenant/spark_test. The system automatically creates a folder named after the tenant under the /tenant directory by default, for example, spark_test. The default HDFS storage directory for tenant spark_test is tenant/spark_test. When a tenant is created for the first time, the system creates the /tenant directory in the HDFS root directory. The storage path is customizable.

    Services

    Set other service resources associated with the current tenant. HBase is supported. To configure this parameter, click Associate Services. In the displayed dialog box, set Service to HBase. If Association Mode is set to Exclusive, service resources are occupied exclusively. If Association Mode is set to Share, service resources are shared.

    Description

    Enter the description of the current tenant.

  3. Click OK to save the settings.

    It takes a few minutes to save the settings. If the message "Tenant created successfully" is displayed in the upper-right corner, the tenant has been created.

    • Roles, computing resources, and storage resources are automatically created when tenants are created.
    • The new role has permissions on the computing and storage resources. The role and its permissions are controlled by the system automatically and cannot be controlled manually under Manage Role.
    • If you want to use the tenant, create a system user and assign the Manager_tenant role and the role corresponding to the tenant to the user.
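
The Storage Space Quota note in Table 2 implies that the space usable for files is the quota divided by the number of replicas. The following is a minimal shell sketch of that arithmetic, assuming the quota is charged against every replica as the note states. The function name usable_mb is hypothetical and is not part of MRS or HDFS.

```shell
# Hypothetical helper: estimate the file capacity available under a tenant quota.
# Assumption: the quota counts every replica, as described in the Storage Space Quota note.
usable_mb() {
  local quota_mb="$1"   # Storage Space Quota, in MB
  local replicas="$2"   # replicas stored per file (2 in the note)
  echo $(( quota_mb / replicas ))
}

usable_mb 500 2     # the example from the note, prints 250
usable_mb 50000 2   # the 50000 MB example quota, prints 25000
```

A quota sized this way leaves room for the replicas, so the amount of data that fits is roughly half of the configured value when two replicas are kept.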

Configuring Queues

  1. On the cluster details page, click Tenants.
  2. Click the Queue Configuration tab.
  3. In the tenant queue table, click Modify in the Operation column of the specified tenant queue.

    • For versions earlier than MRS 3.x: In the tenant list on the left of the Tenant Management page, click the target tenant. In the displayed window, choose Resource, and then open the queue modification page.
    • A queue can be bound to only one non-default resource pool.

    By default, the resource label is the one specified in Creating a Resource Pool. Set other parameters based on site requirements.

  4. Click OK.

Configuring Resource Distribution Policies

  1. On the cluster details page, click Tenants.
  2. Click Resource Distribution Policies and select the resource pool created in Creating a Resource Pool.
  3. Locate the row that contains tenant_spark, click Modify in the Operation column, and set the following parameters:

    • Weight: 20
    • Minimum Resource: 20
    • Maximum Resource: 80
    • Reserved Resource: 10

  4. Click OK.
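
Weight determines a tenant's share relative to the other tenants at the same level (tenant weight divided by the total weight at that level). The following is a minimal shell sketch of that formula, not an MRS tool. The helper name weight_share is hypothetical, the sibling weights 30 and 50 are made-up values, and the sketch assumes that the total includes the tenant's own weight.

```shell
# Hypothetical helper: percentage share implied by a tenant's weight.
# Usage: weight_share <tenant_weight> <weight_of_every_tenant_at_this_level...>
# Assumption: the list of weights includes the tenant's own weight.
weight_share() {
  local own="$1"
  shift
  local total=0
  local w
  for w in "$@"; do
    total=$(( total + w ))
  done
  if [ "$total" -eq 0 ]; then
    echo "total weight must be greater than 0" >&2
    return 1
  fi
  # awk handles the fractional division that shell arithmetic cannot.
  awk -v own="$own" -v total="$total" 'BEGIN { printf "%.1f\n", 100 * own / total }'
}

# tenant_spark has weight 20; two made-up sibling tenants have weights 30 and 50.
weight_share 20 20 30 50   # prints 20.0
```

With no sibling tenants in the pool, the same weight of 20 would yield a 100% share, which is why the weight only has meaning relative to the other tenants.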

Creating a User

  1. Log in to FusionInsight Manager. For details, see Accessing FusionInsight Manager.
  2. Choose System > Permission > User. On the displayed page, click Create User and set the following parameters:

    • Username: spark_test
    • User Type: Human-Machine
    • User Group: hadoop and hive
    • Primary Group: hadoop
    • Role: tenant_spark

  3. Click OK to add the user.

Using spark-submit to Submit a Task

  1. Log in to the client node as user root and run the following commands:

    cd Client installation directory

    source bigdata_env

    source Spark2x/component_env

    For a cluster with Kerberos authentication enabled, run the kinit spark_test command and enter the password when prompted. If this is the user's first login, change the password as prompted. For a cluster with Kerberos authentication disabled, skip this step.

    cd Spark2x/spark/bin

    sh spark-submit --queue tenant_spark --class org.apache.spark.examples.SparkPi --master yarn-client ../examples/jars/spark-examples_*.jar
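
The submission command above can be wrapped in a small function so that the queue and JAR are easy to change. The following is a sketch under stated assumptions, not the official MRS procedure: build_submit_cmd is a hypothetical helper, and it uses --master yarn with --deploy-mode client, the form Spark has recommended since version 2.0 in place of the deprecated yarn-client master. The function only prints the command so that it can be reviewed before it is run.

```shell
# Hypothetical helper: print the spark-submit command for a tenant queue.
# It prints the command instead of running it, so it can be reviewed first.
build_submit_cmd() {
  local queue="$1"   # Yarn queue created for the tenant, for example tenant_spark
  local jar="$2"     # path to the application JAR
  printf 'spark-submit --queue %s --class %s --master yarn --deploy-mode client %s\n' \
    "$queue" "org.apache.spark.examples.SparkPi" "$jar"
}

# The JAR pattern is quoted so that it is printed as written.
build_submit_cmd tenant_spark '../examples/jars/spark-examples_*.jar'
```

Run the printed command from the Spark2x/spark/bin directory after the environment variables have been loaded and, if Kerberos authentication is enabled, after kinit has succeeded.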

Deleting Task Nodes

  1. On the cluster details page, click Nodes.
  2. Locate the row that contains the target task node group, and click Scale In in the Operation column.
  3. Set the Scale-In Type to Specific node and select the target nodes.

    Only nodes in the stopped, lost, unknown, isolated, or faulty state can be selected for scale-in.

  4. Select I understand the consequences of performing the scale-in operation, and click OK.
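
The state restriction in step 3 can be expressed as a simple check, for example when preparing a list of nodes to remove. The following is a minimal shell sketch. The function name can_scale_in is hypothetical, the state names are written in lowercase as they appear in this section, and "running" is an assumed name for a healthy node state that is used only for illustration.

```shell
# Hypothetical helper: succeed if a node in the given state can be selected for scale-in.
can_scale_in() {
  case "$1" in
    stopped|lost|unknown|isolated|faulty) return 0 ;;
    *) return 1 ;;
  esac
}

# "running" is an assumed state name used only for illustration.
for state in running lost faulty; do
  if can_scale_in "$state"; then
    echo "$state: selectable"
  else
    echo "$state: not selectable"
  fi
done
```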