Submitting a DLI Spark Job
Run the ma-cli dli-job submit command to submit a DLI Spark job.
Before running this command, set YAML_FILE to the path of the target job's configuration file. If this argument is not specified, the configuration file is empty. The configuration file is in YAML format, and its fields correspond to the command's option parameters. If you specify both the YAML_FILE configuration file and an option parameter on the command line, the command-line value overwrites the one in the configuration file.
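For example, assuming the dli-demo.yaml file previewed in the YAML File Preview section below (which sets queue: dli_notebook), the following illustrative command submits the job with the values from that file but overrides the queue on the command line:

$ ma-cli dli-job submit --queue dli_test dli-demo.yaml
# --queue dli_test overwrites queue: dli_notebook from the YAML file; all other values come from the file.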
CLI Parameters
ma-cli dli-job submit -h
Usage: ma-cli dli-job submit [OPTIONS] [YAML_FILE]...

  Submit DLI Spark job.

  Example:

  ma-cli dli-job submit --name test-spark-from-sdk
                        --file test/sub_dli_task.py
                        --obs-bucket dli-bucket
                        --queue dli_test
                        --spark-version 2.4.5
                        --driver-cores 1
                        --driver-memory 1G
                        --executor-cores 1
                        --executor-memory 1G
                        --num-executors 1

Options:
  --file TEXT                     Python file or app jar.
  -cn, --class-name TEXT          Your application's main class (for Java / Scala apps).
  --name TEXT                     Job name.
  --image TEXT                    Full swr custom image path.
  --queue TEXT                    Execute queue name.
  -obs, --obs-bucket TEXT         DLI obs bucket to save logs.
  -sv, --spark-version TEXT       Spark version.
  -st, --sc-type [A|B|C]          Compute resource type.
  --feature [basic|custom|ai]     Type of the Spark image used by a job (default: basic).
  -ec, --executor-cores INTEGER   Executor cores.
  -em, --executor-memory TEXT     Executor memory (eg. 2G/2048MB).
  -ne, --num-executors INTEGER    Executor number.
  -dc, --driver-cores INTEGER     Driver cores.
  -dm, --driver-memory TEXT       Driver memory (eg. 2G/2048MB).
  --conf TEXT                     Arbitrary Spark configuration property (eg. <PROP=VALUE>).
  --resources TEXT                Resources package path.
  --files TEXT                    Files to be placed in the working directory of each executor.
  --jars TEXT                     Jars to include on the driver and executor class paths.
  -pf, --py-files TEXT            Python files to place on the PYTHONPATH for Python apps.
  --groups TEXT                   User group resources.
  --args TEXT                     Spark batch job parameter args.
  -q, --quiet                     Exit without waiting after submit successfully.
  -C, --config-file PATH          Configure file path for authorization.
  -D, --debug                     Debug Mode. Shows full stack trace when error occurs.
  -P, --profile TEXT              CLI connection profile to use. The default profile is "DEFAULT".
  -H, -h, --help                  Show this message and exit.
YAML File Preview
# dli-demo.yaml
name: test-spark-from-sdk
file: test/sub_dli_task.py
obs-bucket: ${your_bucket}
queue: dli_notebook
spark-version: 2.4.5
driver-cores: 1
driver-memory: 1G
executor-cores: 1
executor-memory: 1G
num-executors: 1

## [Optional]
jars:
  - ./test.jar
  - obs://your-bucket/jars/test.jar
  - your_group/test.jar

## [Optional]
files:
  - ./test.csv
  - obs://your-bucket/files/test.csv
  - your_group/test.csv

## [Optional]
python-files:
  - ./test.py
  - obs://your-bucket/files/test.py
  - your_group/test.py

## [Optional]
resources:
  - name: your_group/test.py
    type: pyFile
  - name: your_group/test.csv
    type: file
  - name: your_group/test.jar
    type: jar
  - name: ./test.py
    type: pyFile
  - name: obs://your-bucket/files/test.py
    type: pyFile

## [Optional]
groups:
  - group1
  - group2
Example of submitting a DLI Spark job with options specified:
$ ma-cli dli-job submit --name test-spark-from-sdk \
    --file test/sub_dli_task.py \
    --obs-bucket ${your_bucket} \
    --queue dli_test \
    --spark-version 2.4.5 \
    --driver-cores 1 \
    --driver-memory 1G \
    --executor-cores 1 \
    --executor-memory 1G \
    --num-executors 1
| Parameter | Type | Mandatory | Description |
|---|---|---|---|
| YAML_FILE | String, a local file path | No | Configuration file of a DLI Spark job. If this parameter is not specified, the configuration file is empty. |
| --file | String | Yes | Entry file for running the program. The value can be a local file path, an OBS path, or the name of a JAR or PyFile package that has been uploaded to the DLI resource management system. |
| -cn / --class-name | String | Yes | Java/Spark main class of the batch processing job. |
| --name | String | No | Job name, containing a maximum of 128 characters. |
| --image | String | No | Path to a custom image, in the format "Organization name/Image name:Image version". This parameter is valid only when feature is set to custom. Use it together with the feature parameter to run the job with a custom Spark image. |
| -obs / --obs-bucket | String | No | OBS bucket for storing Spark job logs. Configure this parameter when you need to save job logs. The bucket can also be used as a transit point for submitting local files as resources. |
| -sv / --spark-version | String | No | Version of the Spark component used by the job. |
| -st / --sc-type | String | No | Compute resource type. If the current Spark component version is 2.3.2, leave this parameter blank. If the current Spark component version is 2.3.3, configure this parameter when feature is set to basic or ai. If this parameter is not specified, the default Spark component version 2.3.2 is used. |
| --feature | String | No | Job feature, indicating the type of the Spark image used by the job. The default value is basic. |
| --queue | String | No | Queue name. Set this parameter to the name of a created DLI queue. The queue must be of the common type. For details about how to obtain a queue name, see Table 1. |
| -ec / --executor-cores | Integer | No | Number of CPU cores of each Executor in the Spark application. This configuration replaces the default setting in sc_type. |
| -em / --executor-memory | String | No | Executor memory of the Spark application, for example, 2G or 2048MB. This configuration replaces the default setting in sc_type. The unit must be provided; otherwise, the startup fails. |
| -ne / --num-executors | Integer | No | Number of Executors in the Spark application. This configuration replaces the default setting in sc_type. |
| -dc / --driver-cores | Integer | No | Number of CPU cores of the Spark application driver. This configuration replaces the default setting in sc_type. |
| -dm / --driver-memory | String | No | Driver memory of the Spark application, for example, 2G or 2048MB. This configuration replaces the default setting in sc_type. The unit must be provided; otherwise, the startup fails. |
| --conf | Array of strings | No | Batch configuration. For details, see Spark Configuration. To specify multiple properties, use --conf conf1 --conf conf2 (see the example after this table). |
| --resources | Array of strings | No | Name of a resource package, which can be a local file, an OBS path, or a file that has been uploaded to the DLI resource management system. To specify multiple packages, use --resources resource1 --resources resource2. |
| --files | Array of strings | No | Name of a file package that has been uploaded to the DLI resource management system. An OBS path (for example, obs://Bucket name/Package name) or a local file is also supported. To specify multiple files, use --files file1 --files file2. |
| --jars | Array of strings | No | Name of a JAR package that has been uploaded to the DLI resource management system. An OBS path (for example, obs://Bucket name/Package name) or a local file is also supported. To specify multiple packages, use --jars jar1 --jars jar2. |
| -pf / --python-files | Array of strings | No | Name of a PyFile package that has been uploaded to the DLI resource management system. An OBS path (for example, obs://Bucket name/Package name) or a local file is also supported. To specify multiple files, use --python-files py1 --python-files py2. |
| --groups | Array of strings | No | Resource group name. To specify multiple groups, use --groups group1 --groups group2. |
| --args | Array of strings | No | Input parameters of the main class, that is, application parameters. To specify multiple parameters, use --args arg1 --args arg2. |
| -q / --quiet | Bool | No | Exit immediately after the DLI Spark job is submitted, without printing the job status synchronously. |
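Options of the "Array of strings" type accept multiple values by repeating the option once per value. The following is a minimal sketch; the job name, queue, file paths, and Spark properties are illustrative placeholders rather than values required by this guide:

$ ma-cli dli-job submit --name test-multi-value-options \
    --file test/sub_dli_task.py \
    --queue dli_test \
    --conf spark.executor.memoryOverhead=1G \
    --conf spark.sql.shuffle.partitions=200 \
    --files ./test.csv \
    --files obs://your-bucket/files/test2.csv \
    --args arg1 \
    --args arg2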
Examples
- Submit a DLI Spark job using the YAML_FILE file.
$ ma-cli dli-job submit dli_job.yaml
- Submit a DLI Spark job by specifying options in the CLI.
$ ma-cli dli-job submit --name test-spark-from-sdk \
> --file test/jumpstart-trainingjob-gallery-pytorch-sample.ipynb \
> --queue dli_ma_notebook \
> --spark-version 2.4.5 \
> --driver-cores 1 \
> --driver-memory 1G \
> --executor-cores 1 \
> --executor-memory 1G \
> --num-executors 1
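- Submit a DLI Spark job that runs in a custom Spark image by combining --feature custom with --image. This is a sketch only; the SWR image path, job name, and queue name below are placeholders, not values defined in this guide.
$ ma-cli dli-job submit --name test-spark-custom-image \
> --file test/sub_dli_task.py \
> --queue dli_test \
> --feature custom \
> --image your_organization/your_image:1.0.0 \
> --driver-cores 1 \
> --driver-memory 1G \
> --executor-cores 1 \
> --executor-memory 1G \
> --num-executors 1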