
Submitting a DLI Spark Job

Run the ma-cli dli-job submit command to submit a DLI Spark job.

Before running this command, set YAML_FILE to the path of the configuration file of the target job. If this argument is omitted, the configuration file is empty. The configuration file is in YAML format, and its fields correspond to the option parameters of the command. If an option is specified both in the YAML_FILE configuration file and on the CLI, the value specified on the CLI overwrites the value in the configuration file.
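
For example, the following minimal sketch (the file name job.yaml and the queue names are illustrative only, not values required by this guide) shows the override behavior:

# job.yaml (illustrative)
name: test-spark-from-sdk
file: test/sub_dli_task.py
queue: dli_test

# --queue on the CLI overrides the "queue" field in job.yaml,
# so the job is submitted to the dli_notebook queue.
$ ma-cli dli-job submit --queue dli_notebook job.yaml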

CLI Parameters

ma-cli dli-job submit -h
Usage: ma-cli dli-job submit [OPTIONS] [YAML_FILE]...

  Submit DLI Spark job.

  Example:

  ma-cli dli-job submit  --name test-spark-from-sdk
                          --file test/sub_dli_task.py
                          --obs-bucket dli-bucket
                          --queue dli_test
                          --spark-version 2.4.5
                          --driver-cores 1
                          --driver-memory 1G
                          --executor-cores 1
                          --executor-memory 1G
                          --num-executors 1

Options:
  --file TEXT                    Python file or app jar.
  -cn, --class-name TEXT         Your application's main class (for Java / Scala apps).
  --name TEXT                    Job name.
  --image TEXT                   Full swr custom image path.
  --queue TEXT                   Execute queue name.
  -obs, --obs-bucket TEXT        DLI obs bucket to save logs.
  -sv, --spark-version TEXT      Spark version.
  -st, --sc-type [A|B|C]         Compute resource type.
  --feature [basic|custom|ai]    Type of the Spark image used by a job (default: basic).
  -ec, --executor-cores INTEGER  Executor cores.
  -em, --executor-memory TEXT    Executor memory (eg. 2G/2048MB).
  -ne, --num-executors INTEGER   Executor number.
  -dc, --driver-cores INTEGER    Driver cores.
  -dm, --driver-memory TEXT      Driver memory (eg. 2G/2048MB).
  --conf TEXT                    Arbitrary Spark configuration property (eg. <PROP=VALUE>).
  --resources TEXT               Resources package path.
  --files TEXT                   Files to be placed in the working directory of each executor.
  --jars TEXT                    Jars to include on the driver and executor class paths.
  -pf, --py-files TEXT           Python files to place on the PYTHONPATH for Python apps.
  --groups TEXT                  User group resources.
  --args TEXT                    Spark batch job parameter args.
  -q, --quiet                    Exit without waiting after submit successfully.
  -C, --config-file PATH         Configure file path for authorization.
  -D, --debug                    Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT             CLI connection profile to use. The default profile is "DEFAULT".
  -H, -h, --help                 Show this message and exit.

YAML File Preview

# dli-demo.yaml
name: test-spark-from-sdk
file: test/sub_dli_task.py
obs-bucket: ${your_bucket}
queue: dli_notebook 
spark-version: 2.4.5
driver-cores: 1
driver-memory: 1G
executor-cores: 1
executor-memory: 1G
num-executors: 1

## [Optional] 
jars:
  - ./test.jar
  - obs://your-bucket/jars/test.jar
  - your_group/test.jar

## [Optional] 
files:
  - ./test.csv
  - obs://your-bucket/files/test.csv
  - your_group/test.csv

## [Optional] 
python-files:
  - ./test.py
  - obs://your-bucket/files/test.py
  - your_group/test.py

## [Optional] 
resources:
  - name: your_group/test.py
    type: pyFile
  - name: your_group/test.csv
    type: file
  - name: your_group/test.jar
    type: jar
  - name: ./test.py
    type: pyFile
  - name: obs://your-bucket/files/test.py
    type: pyFile

## [Optional]
groups:
  - group1
  - group2
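
Assuming the preview above is saved as dli-demo.yaml, the job can be submitted by passing the file path as the YAML_FILE argument:

$ ma-cli dli-job submit dli-demo.yaml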

Example of submitting a DLI Spark job with options specified:

$ ma-cli dli-job submit --name test-spark-from-sdk \
                        --file test/sub_dli_task.py \
                        --obs-bucket ${your_bucket} \
                        --queue dli_test \
                        --spark-version 2.4.5 \
                        --driver-cores 1 \
                        --driver-memory 1G \
                        --executor-cores 1 \
                        --executor-memory 1G \
                        --num-executors 1 

Table 1 Parameter description

YAML_FILE
  Type: String (local file path)
  Mandatory: No
  Description: Configuration file of a DLI Spark job. If this parameter is not specified, the configuration file is empty.

--file
  Type: String
  Mandatory: Yes
  Description: Entry file for program running. The value can be a local file path, an OBS path, or the name of a JAR or PyFile package that has been uploaded to the DLI resource management system.

-cn / --class-name
  Type: String
  Mandatory: Yes
  Description: Java/Scala main class of the batch processing job.

--name
  Type: String
  Mandatory: No
  Description: Job name, containing a maximum of 128 characters.

--image
  Type: String
  Mandatory: No
  Description: Path to a custom image in the format "Organization name/Image name:Image version". This parameter is valid only when feature is set to custom. Use it together with --feature to run the job on a custom Spark image (see the custom-image example under Examples below).

-obs / --obs-bucket
  Type: String
  Mandatory: No
  Description: OBS bucket for storing the Spark job. Configure this parameter if the job needs to be saved. The bucket also serves as a staging area when local files are submitted as resources.

-sv / --spark-version
  Type: String
  Mandatory: No
  Description: Spark component version used by the job.

-st / --sc-type
  Type: String
  Mandatory: No
  Description: Compute resource type (A, B, or C). If the current Spark component version is 2.3.2, leave this parameter blank. If the current Spark component version is 2.3.3, configure this parameter when feature is set to basic or ai. If this parameter is not specified, the default Spark component version 2.3.2 will be used.

--feature
  Type: String
  Mandatory: No
  Description: Job feature, indicating the type of the Spark image used by the job. The default value is basic.
    • basic: A base Spark image provided by DLI is used.
    • custom: A custom Spark image is used.
    • ai: An AI image provided by DLI is used.

--queue
  Type: String
  Mandatory: No
  Description: Queue name. Set this parameter to the name of a created DLI queue. The queue must be of the common type. For details about how to obtain a queue name, see Table 1.

-ec / --executor-cores
  Type: Integer
  Mandatory: No
  Description: Number of CPU cores of each Executor in the Spark application. This configuration replaces the default setting in sc_type.

-em / --executor-memory
  Type: String
  Mandatory: No
  Description: Executor memory of the Spark application, for example, 2G or 2048MB. This configuration replaces the default setting in sc_type. The unit must be provided; otherwise, the job fails to start.

-ne / --num-executors
  Type: Integer
  Mandatory: No
  Description: Number of Executors in the Spark application. This configuration replaces the default setting in sc_type.

-dc / --driver-cores
  Type: Integer
  Mandatory: No
  Description: Number of CPU cores of the Spark application driver. This configuration replaces the default setting in sc_type.

-dm / --driver-memory
  Type: String
  Mandatory: No
  Description: Driver memory of the Spark application, for example, 2G or 2048MB. This configuration replaces the default setting in sc_type. The unit must be provided; otherwise, the job fails to start.

--conf
  Type: Array of strings
  Mandatory: No
  Description: Batch configuration. For details, see Spark Configuration. To specify multiple properties, use --conf conf1 --conf conf2.

--resources
  Type: Array of strings
  Mandatory: No
  Description: Name of a resource package, which can be a local file, an OBS path, or a file that has been uploaded to the DLI resource management system. To specify multiple packages, use --resources resource1 --resources resource2.

--files
  Type: Array of strings
  Mandatory: No
  Description: Name of a file package that has been uploaded to the DLI resource management system. An OBS path (for example, obs://Bucket name/Package name) or a local file path can also be used. To specify multiple files, use --files file1 --files file2.

--jars
  Type: Array of strings
  Mandatory: No
  Description: Name of a JAR package that has been uploaded to the DLI resource management system. An OBS path (for example, obs://Bucket name/Package name) or a local file path can also be used. To specify multiple packages, use --jars jar1 --jars jar2.

-pf / --py-files
  Type: Array of strings
  Mandatory: No
  Description: Name of a PyFile package that has been uploaded to the DLI resource management system. An OBS path (for example, obs://Bucket name/Package name) or a local file path can also be used. To specify multiple packages, use --py-files py1 --py-files py2.

--groups
  Type: Array of strings
  Mandatory: No
  Description: Resource group name. To specify multiple groups, use --groups group1 --groups group2.

--args
  Type: Array of strings
  Mandatory: No
  Description: Input parameters of the main class, that is, the application parameters. To specify multiple arguments, use --args arg1 --args arg2.

-q / --quiet
  Type: Bool
  Mandatory: No
  Description: After the DLI Spark job is submitted, the CLI exits immediately and does not synchronously print the job status.

Examples

  • Submit a DLI Spark job using the YAML_FILE file.
    $ ma-cli dli-job submit dli_job.yaml

  • Submit a DLI Spark job by specifying the options parameter in the CLI.
    $ma-cli dli-job submit --name test-spark-from-sdk \
    >                         --file test/jumpstart-trainingjob-gallery-pytorch-sample.ipynb \
    >                         --queue dli_ma_notebook \
    >                         --spark-version 2.4.5 \
    >                         --driver-cores 1 \
    >                         --driver-memory 1G \
    >                         --executor-cores 1 \
    >                         --executor-memory 1G \
    >                         --num-executors 1
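
  • Submit a DLI Spark job that runs on a custom Spark image by combining --feature custom with --image. This is a hypothetical sketch: the SWR image path, OBS file path, and queue name below are placeholders rather than values taken from this guide.
    $ ma-cli dli-job submit --name test-spark-custom-image \
    >                         --file obs://your-bucket/app/main.py \
    >                         --queue dli_test \
    >                         --feature custom \
    >                         --image your-organization/your-spark-image:1.0.0 \
    >                         --driver-cores 1 \
    >                         --driver-memory 1G \
    >                         --executor-cores 1 \
    >                         --executor-memory 1G \
    >                         --num-executors 1

  • Submit a DLI Spark job with repeatable options. --conf, --jars, and --args can each be specified multiple times; the Spark properties, JAR paths, and arguments below are placeholders used only to illustrate the syntax.
    $ ma-cli dli-job submit --name test-spark-multi-value-options \
    >                         --file test/sub_dli_task.py \
    >                         --queue dli_test \
    >                         --spark-version 2.4.5 \
    >                         --conf spark.executor.memoryOverhead=1G \
    >                         --conf spark.sql.shuffle.partitions=200 \
    >                         --jars obs://your-bucket/jars/dep1.jar \
    >                         --jars obs://your-bucket/jars/dep2.jar \
    >                         --args arg1 \
    >                         --args arg2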