Submitting a DLI Spark Job

Updated on 2023-11-21 GMT+08:00

Run the ma-cli dli-job submit command to submit a DLI Spark job.

Before running this command, you can set YAML_FILE to the path of the target job's configuration file. If this parameter is not specified, the configuration file is considered empty. The configuration file is in YAML format, and its keys correspond to the command's options. If you specify both the YAML_FILE configuration file and options on the command line, the command-line option values overwrite those in the configuration file.
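
For example, with a minimal configuration file such as the following (the file name job.yaml, the queue names, and the script path are placeholders for illustration), passing --queue on the command line makes the job run on that queue instead of the one defined in the file:

# job.yaml (placeholder values)
name: test-spark-from-sdk
file: test/sub_dli_task.py
queue: dli_test

# --queue on the command line overrides the "queue" value in job.yaml,
# so the job is submitted to dli_notebook instead of dli_test.
ma-cli dli-job submit --queue dli_notebook job.yaml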

CLI Parameters

ma-cli dli-job submit -h
Usage: ma-cli dli-job submit [OPTIONS] [YAML_FILE]...

  Submit DLI Spark job.

  Example:

  ma-cli dli-job submit  --name test-spark-from-sdk
                          --file test/sub_dli_task.py
                          --obs-bucket dli-bucket
                          --queue dli_test
                          --spark-version 2.4.5
                          --driver-cores 1
                          --driver-memory 1G
                          --executor-cores 1
                          --executor-memory 1G
                          --num-executors 1

Options:
  --file TEXT                    Python file or app jar.
  -cn, --class-name TEXT         Your application's main class (for Java / Scala apps).
  --name TEXT                    Job name.
  --image TEXT                   Full swr custom image path.
  --queue TEXT                   Execute queue name.
  -obs, --obs-bucket TEXT        DLI obs bucket to save logs.
  -sv, --spark-version TEXT      Spark version.
  -st, --sc-type [A|B|C]         Compute resource type.
  --feature [basic|custom|ai]    Type of the Spark image used by a job (default: basic).
  -ec, --executor-cores INTEGER  Executor cores.
  -em, --executor-memory TEXT    Executor memory (eg. 2G/2048MB).
  -ne, --num-executors INTEGER   Executor number.
  -dc, --driver-cores INTEGER    Driver cores.
  -dm, --driver-memory TEXT      Driver memory (eg. 2G/2048MB).
  --conf TEXT                    Arbitrary Spark configuration property (eg. <PROP=VALUE>).
  --resources TEXT               Resources package path.
  --files TEXT                   Files to be placed in the working directory of each executor.
  --jars TEXT                    Jars to include on the driver and executor class paths.
  -pf, --py-files TEXT           Python files to place on the PYTHONPATH for Python apps.
  --groups TEXT                  User group resources.
  --args TEXT                    Spark batch job parameter args.
  -q, --quiet                    Exit without waiting after submit successfully.
  -C, --config-file PATH         Configure file path for authorization.
  -D, --debug                    Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT             CLI connection profile to use. The default profile is "DEFAULT".
  -H, -h, --help                 Show this message and exit.

YAML File Preview

# dli-demo.yaml
name: test-spark-from-sdk
file: test/sub_dli_task.py
obs-bucket: ${your_bucket}
queue: dli_notebook 
spark-version: 2.4.5
driver-cores: 1
driver-memory: 1G
executor-cores: 1
executor-memory: 1G
num-executors: 1

## [Optional] 
jars:
  - ./test.jar
  - obs://your-bucket/jars/test.jar
  - your_group/test.jar

## [Optional] 
files:
  - ./test.csv
  - obs://your-bucket/files/test.csv
  - your_group/test.csv

## [Optional] 
python-files:
  - ./test.py
  - obs://your-bucket/files/test.py
  - your_group/test.py

## [Optional] 
resources:
  - name: your_group/test.py
    type: pyFile
  - name: your_group/test.csv
    type: file
  - name: your_group/test.jar
    type: jar
  - name: ./test.py
    type: pyFile
  - name: obs://your-bucket/files/test.py
    type: pyFile

## [Optional]
groups:
  - group1
  - group2

Example of submitting a DLI Spark job with options specified:

$ ma-cli dli-job submit --name test-spark-from-sdk \
                        --file test/sub_dli_task.py \
                        --obs-bucket ${your_bucket} \
                        --queue dli_test \
                        --spark-version 2.4.5 \
                        --driver-cores 1 \
                        --driver-memory 1G \
                        --executor-cores 1 \
                        --executor-memory 1G \
                        --num-executors 1 
Table 1 Description

Parameter: YAML_FILE
Type: String (local file path)
Mandatory: No
Description: Configuration file of a DLI Spark job. If this parameter is not specified, the configuration file is empty.

Parameter: --file
Type: String
Mandatory: Yes
Description: Entry file for running the program. The value can be a local file path, an OBS path, or the name of a JAR or PyFile package that has been uploaded to the DLI resource management system.

Parameter: -cn / --class-name
Type: String
Mandatory: Yes
Description: Main class of the batch processing job (for Java/Scala applications).

Parameter: --name
Type: String
Mandatory: No
Description: Job name, containing a maximum of 128 characters.

Parameter: --image
Type: String
Mandatory: No
Description: Path to a custom image in the format "Organization name/Image name:Image version". This parameter is valid only when feature is set to custom. Use it together with the feature parameter to run the job with a custom Spark image.

Parameter: -obs / --obs-bucket
Type: String
Mandatory: No
Description: OBS bucket for storing Spark job logs. Configure this parameter if you need to save job logs. The bucket also serves as a transfer station when local files are submitted as resources.

Parameter: -sv / --spark-version
Type: String
Mandatory: No
Description: Version of the Spark component used by the job.

Parameter: -st / --sc-type
Type: String
Mandatory: No
Description: Compute resource type. If the current Spark component version is 2.3.2, leave this parameter blank. If the current Spark component version is 2.3.3, configure this parameter when feature is set to basic or ai. If this parameter is not specified, the default Spark component version 2.3.2 is used.

Parameter: --feature
Type: String
Mandatory: No
Description: Job feature, indicating the type of the Spark image used by the job. The default value is basic.
  • basic: A base Spark image provided by DLI is used.
  • custom: A custom Spark image is used.
  • ai: An AI image provided by DLI is used.

Parameter: --queue
Type: String
Mandatory: No
Description: Queue name. Set this parameter to the name of a created DLI queue. The queue must be of the common type. For details about how to obtain a queue name, see Table 1.

Parameter: -ec / --executor-cores
Type: Integer
Mandatory: No
Description: Number of CPU cores of each executor in the Spark application. This configuration replaces the default value in sc_type.

Parameter: -em / --executor-memory
Type: String
Mandatory: No
Description: Executor memory of the Spark application, for example, 2G or 2048MB. This configuration replaces the default value in sc_type. The unit must be provided; otherwise, the startup fails.

Parameter: -ne / --num-executors
Type: Integer
Mandatory: No
Description: Number of executors in the Spark application. This configuration replaces the default value in sc_type.

Parameter: -dc / --driver-cores
Type: Integer
Mandatory: No
Description: Number of CPU cores of the Spark application driver. This configuration replaces the default value in sc_type.

Parameter: -dm / --driver-memory
Type: String
Mandatory: No
Description: Driver memory of the Spark application, for example, 2G or 2048MB. This configuration replaces the default value in sc_type. The unit must be provided; otherwise, the startup fails.

Parameter: --conf
Type: Array of strings
Mandatory: No
Description: Batch configuration. For details, see Spark Configuration. To specify multiple properties, use --conf conf1 --conf conf2.

Parameter: --resources
Type: Array of strings
Mandatory: No
Description: Name of a resource package, which can be a local file path, an OBS path, or a file that has been uploaded to the DLI resource management system. To specify multiple packages, use --resources resource1 --resources resource2.

Parameter: --files
Type: Array of strings
Mandatory: No
Description: Name of a file package that has been uploaded to the DLI resource management system. An OBS path (for example, obs://Bucket name/Package name) or a local file path is also supported. To specify multiple packages, use --files file1 --files file2.

Parameter: --jars
Type: Array of strings
Mandatory: No
Description: Name of a JAR package that has been uploaded to the DLI resource management system. An OBS path (for example, obs://Bucket name/Package name) or a local file path is also supported. To specify multiple packages, use --jars jar1 --jars jar2.

Parameter: -pf / --py-files
Type: Array of strings
Mandatory: No
Description: Name of a PyFile package that has been uploaded to the DLI resource management system. An OBS path (for example, obs://Bucket name/Package name) or a local file path is also supported. To specify multiple packages, use --py-files py1 --py-files py2.

Parameter: --groups
Type: Array of strings
Mandatory: No
Description: Resource group name. To specify multiple groups, use --groups group1 --groups group2.

Parameter: --args
Type: Array of strings
Mandatory: No
Description: Input parameters of the main class, that is, application parameters. To specify multiple parameters, use --args arg1 --args arg2.

Parameter: -q / --quiet
Type: Bool
Mandatory: No
Description: Exit immediately after the DLI Spark job is submitted, without synchronously printing the job status.
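
Options of the "Array of strings" type are repeated once per value. The following command is a sketch only; the Spark property values, file path, and application arguments are hypothetical:

ma-cli dli-job submit --name test-multi-value-options \
                      --file test/sub_dli_task.py \
                      --queue dli_test \
                      --conf spark.sql.shuffle.partitions=200 \
                      --conf spark.network.timeout=300s \
                      --files obs://your-bucket/files/lookup.csv \
                      --args arg1 \
                      --args arg2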

Examples

  • Submit a DLI Spark job using a YAML_FILE configuration file.
    $ ma-cli dli-job submit dli_job.yaml

  • Submit a DLI Spark job by specifying options in the CLI.
    $ma-cli dli-job submit --name test-spark-from-sdk \
    >                         --file test/jumpstart-trainingjob-gallery-pytorch-sample.ipynb \
    >                         --queue dli_ma_notebook \
    >                         --spark-version 2.4.5 \
    >                         --driver-cores 1 \
    >                         --driver-memory 1G \
    >                         --executor-cores 1 \
    >                         --executor-memory 1G \
    >                         --num-executors 1 
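
  • Submit a DLI Spark job that runs with a custom Spark image (a sketch based on the --feature and --image parameters in Table 1; the organization, image name, and version below are placeholders for an image in your SWR repository).
    $ ma-cli dli-job submit --name test-spark-custom-image \
    >                         --file test/sub_dli_task.py \
    >                         --queue dli_test \
    >                         --feature custom \
    >                         --image your-org/your-spark-image:1.0.0 \
    >                         --driver-cores 1 \
    >                         --driver-memory 1G \
    >                         --executor-cores 1 \
    >                         --executor-memory 1G \
    >                         --num-executors 1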
