
Submitting Spark Tasks to New Task Nodes

Add task nodes to a custom MRS cluster to increase its compute capacity. Task nodes are mainly used to process data rather than to store data permanently.

Currently, task nodes can only be added to custom MRS clusters.

This section describes how to bind new task nodes to a tenant through a resource pool and submit Spark tasks to those nodes. The procedure consists of the following steps:

  1. Adding Task Nodes
  2. Creating a Resource Pool
  3. Creating a Tenant
  4. Configuring Queues
  5. Configuring Resource Distribution Policies
  6. Creating a User
  7. Using spark-submit to Submit a Task
  8. Deleting Task Nodes

Adding Task Nodes

  1. On the details page of a custom MRS cluster, click the Nodes tab. On this tab page, click Add Node Group.
  2. On the Add Node Group page that is displayed, set parameters as needed.
    Table 1 Parameters for adding a node group

    • Instance Specifications: Select the flavor of the hosts in the node group.
    • Nodes: Set the number of nodes in the node group.
    • System Disk: Set the specifications and capacity of the system disks on the new nodes.
    • Data Disk (GB)/Disks: Set the specifications, capacity, and number of data disks on the new nodes.
    • Deploy Roles: Select NM to add a NodeManager role.

  3. Click OK.
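
After the new node group is created, you can optionally verify that the NodeManager instances on the new task nodes have registered with Yarn. The following is a minimal check, assuming a cluster client is already installed (the same client environment used in Using spark-submit to Submit a Task; run kinit first if Kerberos authentication is enabled):

    cd Client installation directory

    source bigdata_env

    yarn node -list

The new task nodes should be listed in the RUNNING state.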

Creating a Resource Pool

  1. On the cluster details page, click Tenants.
  2. Click Resource Pools.
  3. Click Create Resource Pool.
  4. On the Create Resource Pool page, set the properties of the resource pool.

    • Name: Enter a name for the resource pool, for example, test1.
    • Resource Label: Enter a label for the resource pool, for example, 1.
    • Available Hosts: Select the nodes added in Adding Task Nodes and add them to the resource pool.

  5. Click OK.
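
In MRS, resource pools are typically implemented on top of Yarn node labels, with the resource label used as the label name; treat this mapping as an assumption that may vary by cluster version. If it holds for your cluster, the new label should appear when you list node labels from a client node (same client environment as in Using spark-submit to Submit a Task):

    yarn cluster -lnl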

Creating a Tenant

  1. On the cluster details page, click Tenants.
  2. Click Create Tenant. On the displayed page, configure tenant properties.

    Table 2 Tenant parameters

    • Name: Set the tenant name, for example, tenant_spark.
    • Tenant Type: Select Leaf. A leaf tenant cannot have sub-tenants; sub-tenants can be added only to a non-leaf tenant.
    • Dynamic Resource: If Yarn is selected, the system automatically creates a task queue named after the tenant in Yarn. If Yarn is not selected, no task queue is created automatically.
    • Default Resource Pool Capacity (%): Set the percentage of computing resources in the default resource pool used by the current tenant, for example, 20%.
    • Default Resource Pool Max. Capacity (%): Set the maximum percentage of computing resources in the default resource pool used by the current tenant, for example, 80%.
    • Storage Resource: If HDFS is selected, the system automatically creates the /tenant directory in the HDFS root directory when a tenant is created for the first time. If HDFS is not selected, no storage directory is created.
    • Maximum Number of Files/Directories: Set the maximum number of files or directories, for example, 100000000000.
    • Storage Space Quota (MB): Set the storage space quota, for example, 50000 MB. This parameter is the maximum HDFS storage space the tenant can use, not the space actually used. If the value exceeds the size of the HDFS physical disk, the full physical disk space is the effective limit.

      NOTE: To ensure data reliability, the system automatically creates one backup when a file is stored in HDFS, so two replicas of each file are kept by default. The HDFS storage space is the total disk space occupied by all replicas. For example, if Storage Space Quota is set to 500, the actual space available for files is about 250 MB (500/2 = 250).

    • Storage Path: Set the storage path, for example, tenant/spark_test. By default, the system creates a folder named after the tenant under the /tenant directory, and the /tenant directory itself is created in the HDFS root directory when a tenant is created for the first time. The storage path is customizable.
    • Services: Set other service resources associated with the current tenant. Only HBase is supported. Click Associate Services and, in the displayed dialog box, set Service to HBase. If Association Mode is set to Exclusive, service resources are used exclusively; if it is set to Share, they are shared.
    • Description: Enter a description of the current tenant.

  3. Click OK to save the settings.

    It takes a few minutes to save the settings. If the message "Tenant created successfully" is displayed in the upper-right corner, the tenant has been added successfully.

    • Roles, computing resources, and storage resources are automatically created when the tenant is created.
    • The new role has permissions on the computing and storage resources. The role and its permissions are managed automatically by the system and cannot be modified manually under Manage Role.
    • To use the tenant, create a system user and assign both the Manager_tenant role and the role corresponding to the tenant to that user.
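
If Yarn and HDFS were selected for the tenant, you can optionally confirm the new task queue and storage directory from a client node. This is a minimal check, assuming the same client environment as in Using spark-submit to Submit a Task:

    yarn queue -status tenant_spark

    hdfs dfs -ls /tenant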

Configuring Queues

  1. On the cluster details page, click Tenants.
  2. Click the Queue Configuration tab.
  3. In the tenant queue table, click Modify in the Operation column of the specified tenant queue.

    • In the tenant list on the left of the Tenant Management page, click the target tenant. In the displayed window, choose Resource, and then click the edit icon to open the queue modification page.
    • A queue can be bound to only one non-default resource pool.

    By default, the resource label is the one specified in Creating a Resource Pool. Set other parameters as required.

  4. Click OK.

Configuring Resource Distribution Policies

  1. On the cluster details page, click Tenants.
  2. Click Resource Distribution Policies and select the resource pool created in Creating a Resource Pool.
  3. Locate the row that contains tenant_spark, click Modify in the Operation column, and set the following parameters:

    • Weight: 20
    • Minimum Resource: 20
    • Maximum Resource: 80
    • Reserved Resource: 10

    Broadly, these example values guarantee tenant_spark about 20% of the resource pool, allow it to use up to 80% when spare capacity is available, and keep 10% reserved for it.

  4. Click OK.

Creating a User

  1. Log in to FusionInsight Manager. For details, see Accessing FusionInsight Manager.
  2. Choose System > Permission > User. On the displayed page, click Create User.

    • Username: spark_test
    • User Type: Human-Machine
    • User Group: hadoop and hive
    • Primary Group: hadoop
    • Role: tenant_spark

  3. Click OK to add the user.

Using spark-submit to Submit a Task

  1. Log in to the client node as user root and run the following commands:

    cd Client installation directory

    source bigdata_env

    source Spark2x/component_env

    For a cluster with Kerberos authentication enabled, run the kinit spark_test command. For a cluster with Kerberos authentication disabled, skip this step.

    Enter the password for authentication. You are required to change the password upon the first login.

    cd Spark2x/spark/bin

    sh spark-submit --queue tenant_spark --class org.apache.spark.examples.SparkPi --master yarn-client ../examples/jars/spark-examples_*.jar
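
    After the job completes, you can optionally confirm that it ran in the tenant's queue; the Queue column in the output of the following command should show tenant_spark for the Spark Pi application:

    yarn application -list -appStates FINISHED

    Note that Spark 2.x deprecates the yarn-client master used above; spark-submit still accepts it, but the equivalent modern form is --master yarn --deploy-mode client.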

Deleting Task Nodes

  1. On the cluster details page, click Nodes.
  2. Locate the row that contains the target task node group, and click Scale In in the Operation column.
  3. Set the Scale-In Type to Specific node and select the target nodes.

    The target nodes must be shut down before they can be removed.

  4. Select I understand the consequences of performing the scale-in operation, and click OK.