Submitting Spark Tasks to New Task Nodes

Updated on 2024-08-12 GMT+08:00

You can add task nodes to an MRS cluster to increase its compute capacity. Task nodes are mainly used to process data rather than to store it permanently.

This section describes how to bind new task nodes to a tenant's resources and submit Spark tasks to them. The procedure covers the following topics:

  1. Adding Task Nodes
  2. Creating a Resource Pool
  3. Creating a Tenant
  4. Configuring Queues
  5. Configuring Resource Distribution Policies
  6. Creating a User
  7. Using spark-submit to Submit a Task
  8. Deleting Task Nodes

Adding Task Nodes

  1. On the cluster details page, click Nodes and then click Add Node Group.
  2. On the Add Node Group page, set the following parameters as needed.
    Table 1 Parameters for adding a node group

    • Instance Specifications: Select the flavor of the hosts in the node group.
    • Nodes: Set the number of nodes in the node group.
    • System Disk: Set the specifications and capacity of the system disks on the new nodes.
    • Data Disk (GB)/Disks: Set the specifications, capacity, and number of data disks on the new nodes.
    • Deploy Roles: Select NM to deploy the NodeManager role on the new nodes.

  3. Click OK.
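
After the node group is created, you can optionally confirm that the new NodeManagers have registered with YARN. A minimal check from a cluster client (assuming the client environment has been sourced, as described in Using spark-submit to Submit a Task):

    # List all NodeManagers known to the ResourceManager, including ones
    # in non-RUNNING states; the new task nodes should appear here.
    yarn node -list -all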

Creating a Resource Pool

  1. On the cluster details page, click Tenants.
  2. Click Resource Pools.
  3. Click Create Resource Pool.
  4. On the Create Resource Pool page, set the properties of the resource pool.

    • Name: Enter the name of the resource pool, for example, test1.
    • Resource Label: Enter the resource pool label, for example, 1.
    • Available Hosts: Select the nodes added in Adding Task Nodes.

  5. Click OK.
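
MRS resource pools are implemented on top of YARN node labels, which is why each pool carries a Resource Label. As an optional check from a cluster client, you can list the labels known to the ResourceManager and confirm that the label of the new pool (1 in this example) is present:

    # List the node labels registered in YARN; the value entered as
    # Resource Label should appear in the output.
    yarn cluster --list-node-labels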

Creating a Tenant

  1. On the cluster details page, click Tenants.
  2. Click Create Tenant. On the page that is displayed, configure the tenant properties. The following parameters take MRS 3.x as an example.

    Table 2 Tenant parameters

    • Name: Set the tenant name, for example, tenant_spark.
    • Tenant Type: Select Leaf. If Leaf is selected, the current tenant is a leaf tenant and no sub-tenants can be added. If Non-leaf is selected, sub-tenants can be added to the current tenant.
    • Compute Resource: If Yarn is selected, the system automatically creates a task queue named after the tenant in Yarn. If Yarn is not selected, no task queue is created automatically.
    • Configuration Mode: If Yarn is selected for Compute Resource, this parameter can be set to Basic or Advanced.
      • Basic: Set the percentage of compute resources used by the tenant in the default resource pool by specifying Default Resource Pool Capacity (%).
      • Advanced: Configure the following parameters:
        • Weight: Tenant resource weight, in the range 0 to 100. A tenant's resource share equals its weight divided by the total weight of all tenants at the same level.
        • Minimum Resources: Resources guaranteed to the tenant, as a percentage or absolute value of the parent tenant's resources. When a tenant's workload is light, its resources are automatically lent to other tenants; when its available resources fall below Minimum Resources, it can preempt the resources that were lent out.
        • Maximum Resources: Maximum resources the tenant can use, as a percentage or absolute value of the parent tenant's resources.
        • Reserved Resources: Resources reserved for the tenant, as a percentage or absolute value of the parent tenant's resources.
    • Default Resource Pool Capacity (%): Set the percentage of compute resources used by the current tenant in the default resource pool, for example, 20%.
    • Storage Resource: If HDFS is selected, the system automatically creates the /tenant directory under the HDFS root directory when a tenant is created for the first time. If HDFS is not selected, no storage directory is created under the HDFS root directory.
    • Maximum Number of Files/Directories: Set the maximum number of files and directories, for example, 100000000000.
    • Storage Space Quota: Quota for the HDFS storage space used by the current tenant. The minimum value is 1, and the maximum value is the total storage quota of the parent tenant. The unit is MB or GB. Set the quota, for example, to 50000 MB. This parameter indicates the maximum HDFS storage space the tenant may use, not the space actually used. If the value is greater than the size of the HDFS physical disks, the usable maximum is the full space of the HDFS physical disks.

      NOTE: To ensure data reliability, the system automatically generates one backup when a file is stored in HDFS; that is, two replicas of each file are stored by default. The HDFS storage space is the total disk space occupied by all replicas. For example, if the quota is set to 500 MB, the actual space for storing files is about 250 MB (500/2 = 250).

    • Storage Path: Set the storage path, for example, tenant/spark_test. By default, the system creates a folder named after the tenant under the /tenant directory (created in the HDFS root directory when the first tenant is created), so the default storage directory for tenant spark_test is tenant/spark_test. The storage path is customizable.
    • Services: Set other service resources associated with the current tenant. HBase is supported. To configure this parameter, click Associate Services. In the displayed dialog box, set Service to HBase. If Association Mode is set to Exclusive, service resources are used exclusively; if set to Shared, they are shared.
    • Description: Enter a description of the current tenant.

  3. Click OK to save the settings.

    It takes a few minutes to save the settings. If "Tenant created successfully" is displayed in the upper-right corner, the tenant has been added successfully.

    NOTE:
    • Roles, computing resources, and storage resources are automatically created when tenants are created.
    • The new role has permissions on the compute and storage resources. The role and its permissions are managed automatically by the system and cannot be modified manually under Manage Role.
    • If you want to use the tenant, create a system user and assign the Manager_tenant role and the role corresponding to the tenant to the user.
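
Once the tenant exists, you can optionally verify its HDFS directory and quotas from a cluster client. The sketch below assumes the default storage path from this example (/tenant/spark_test):

    # Show the name quota (maximum files/directories) and space quota of
    # the tenant directory. Note that the space quota counts all replicas,
    # so with two replicas a 500 MB quota holds about 250 MB of data.
    hdfs dfs -count -q -h /tenant/spark_test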

Configuring Queues

  1. On the cluster details page, click Tenants.
  2. Click the Queue Configuration tab.
  3. In the tenant queue table, click Modify in the Operation column of the specified tenant queue.

    NOTE:
    • For versions earlier than MRS 3.x: in the tenant list on the left of the Tenant Management page, click the target tenant, choose Resource in the displayed window, and open the queue modification page from there.
    • A queue can be bound to only one non-default resource pool.

    By default, the resource tag is the one specified in Creating a Resource Pool. Set other parameters based on site requirements.

  4. Click OK.
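
To confirm that the queue configuration took effect, you can query the queue from a cluster client (tenant_spark is the queue created with the tenant in this example):

    # Print the state, capacities, and accessible node labels of the
    # tenant queue; the resource pool's label should be listed.
    yarn queue -status tenant_spark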

Configuring Resource Distribution Policies

  1. On the cluster details page, click Tenants.
  2. Click Resource Distribution Policies and select the resource pool created in Creating a Resource Pool.
  3. Locate the row that contains tenant_spark, and click Modify in the Operation column.

    • Weight: 20
    • Minimum Resources: 20
    • Maximum Resources: 80
    • Reserved Resources: 10

  4. Click OK.
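
As an illustration of the weight formula in Table 2 (the sibling tenant and its weight here are hypothetical): if tenant_spark has Weight 20 and a single sibling tenant in the same resource pool has Weight 30, tenant_spark's share of the pool is 20 / (20 + 30) = 40% of its resources, bounded below by Minimum Resources and above by Maximum Resources.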

Creating a User

  1. Log in to FusionInsight Manager. For details, see Accessing FusionInsight Manager.
  2. Choose System > Permission > User. On the displayed page, click Create User.

    • Username: spark_test
    • User Type: Human-Machine
    • User Group: hadoop and hive
    • Primary Group: hadoop
    • Role: tenant_spark

  3. Click OK to add the user.
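
For clusters with Kerberos authentication enabled, you can confirm that the new user authenticates correctly before submitting any job. A minimal check from a client node (the realm shown by klist depends on your cluster):

    # Obtain a Kerberos ticket for the new user; you are prompted for the
    # password and must change it upon the first login.
    kinit spark_test
    # Inspect the ticket cache; the default principal should be spark_test
    # at your cluster's realm.
    klist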

Using spark-submit to Submit a Task

  1. Log in to the client node as user root and run the following commands:

    cd Client installation directory

    source bigdata_env

    source Spark2x/component_env

    For a cluster with Kerberos authentication enabled, run kinit spark_test and enter the password of user spark_test when prompted (you must change the password upon the first login). For a cluster with Kerberos authentication disabled, skip this step.

    cd Spark2x/spark/bin

    sh spark-submit --queue tenant_spark --class org.apache.spark.examples.SparkPi --master yarn-client ../examples/jars/spark-examples_*.jar
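
The yarn-client master syntax used above is deprecated on Spark 2.x clients in favor of --master yarn. The following variant is a sketch of the same submission in cluster mode with explicit resource sizing (the executor counts and sizes are illustrative, not values from this guide):

    # Submit SparkPi to the tenant queue in cluster mode; adjust the
    # executor sizing to match your task node flavors. The trailing
    # argument is SparkPi's optional number of partitions.
    sh spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --queue tenant_spark \
      --num-executors 2 \
      --executor-cores 2 \
      --executor-memory 2g \
      --class org.apache.spark.examples.SparkPi \
      ../examples/jars/spark-examples_*.jar 1000

You can then confirm the application ran in the intended queue by running yarn application -list -appStates ALL, which prints the queue of each application.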

Deleting Task Nodes

  1. On the cluster details page, click Nodes.
  2. Locate the row that contains the target task node group, and click Scale In in the Operation column.
  3. Set the Scale-In Type to Specific node and select the target nodes.
    NOTE:

    Only nodes in the stopped, lost, unknown, isolated, or faulty state can be selected for scale-in.

  4. Select I understand the consequences of performing the scale-in operation, and click OK.
