Creating a Batch Processing Job
Function
This API is used to create a batch processing job in a queue.

During Spark job submission, if the job cannot acquire resources for an extended period, its status changes to dead after approximately 3 hours of waiting. For details about Spark job statuses, see Table 7.
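In practice, a client that submits a batch job polls its state until the job leaves the transient states listed in Table 7. The sketch below is illustrative only: get_job_state() is a hypothetical helper standing in for a call to the corresponding job-query API, which this section does not describe.

```python
import time

def get_job_state(job_id: str) -> str:
    """Hypothetical helper: query the batch job and return its 'state' field."""
    raise NotImplementedError  # replace with a real call to the job-query API

def wait_for_completion(job_id: str, poll_seconds: int = 30) -> str:
    """Poll until the job leaves the transient states from Table 7."""
    transient = {"starting", "running", "recovering"}
    while True:
        state = get_job_state(job_id)
        if state not in transient:
            # "success": the job finished; "dead": the job exited, for example
            # after failing to acquire resources for roughly 3 hours.
            return state
        time.sleep(poll_seconds)
```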
URI
- URI format
- Parameter description
Table 1 URI parameter

Parameter | Mandatory | Type | Description
---|---|---|---
project_id | Yes | String | Project ID, which is used for resource isolation. For how to obtain a project ID, see Obtaining a Project ID. The value can contain up to 64 characters; only letters and digits are allowed. Example: 48cc2c48765f481480c7db940d6409d1
Request Parameters
Table 2 Request parameters

Parameter | Mandatory | Type | Description
---|---|---|---
file | Yes | String | Name of a package of the JAR or pyFile type that has been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name. Constraints: Spark 3.3.x or later supports only packages in OBS paths.
className | Yes | String | Java/Spark main class of the batch processing job.
queue | No | String | Queue name. Set this parameter to the name of the created DLI queue. The queue must be of the general-purpose type.
cluster_name | No | String | Queue name. Set this parameter to the name of the created DLI queue. Constraints: you are advised to use the queue parameter instead; queue and cluster_name cannot coexist.
args | No | Array of strings | Input parameters of the main class, that is, application parameters.
sc_type | No | String | Compute resource type. Currently, resource types A, B, and C are available. If this parameter is not specified, the minimum configuration (type A) is used. For details about resource types, see Table 3.
jars | No | Array of strings | Names of packages of the JAR type that have been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name.
pyFiles | No | Array of strings | Names of packages of the pyFile type that have been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name.
files | No | Array of strings | Names of packages of the file type that have been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name.
modules | No | Array of strings | Names of dependency system resource modules. You can check module names using the Querying Resource Packages in a Group (Deprecated) API. DLI provides the dependencies needed for executing datasource jobs; each service has its own dependency module.
resources | No | Array of objects | JSON object list, including the name and type of each package that has been uploaded to the queue. For details, see Table 4. Constraints: Spark 3.3.x or later does not support this parameter; configure resource package information in jars, pyFiles, and files instead.
groups | No | Array of objects | JSON object list, including package group resources. For details about the format, see the request example. If a name in resources does not carry type information, the package with that name must exist in the group. For details, see Table 5. Constraints: Spark 3.3.x or later does not support group configuration.
conf | No | Object | Batch configuration item. For details, see Spark Configuration.
name | No | String | Batch processing task name. The value contains a maximum of 128 characters.
driverMemory | No | String | Driver memory of the Spark application, for example, 2 GB or 2048 MB. This setting overrides the default in sc_type. The unit must be included; otherwise, the job fails to start.
driverCores | No | Integer | Number of CPU cores of the Spark application driver. This setting overrides the default in sc_type.
executorMemory | No | String | Executor memory of the Spark application, for example, 2 GB or 2048 MB. This setting overrides the default in sc_type. The unit must be included; otherwise, the job fails to start.
executorCores | No | Integer | Number of CPU cores of each Executor in the Spark application. This setting overrides the default in sc_type.
numExecutors | No | Integer | Number of Executors in the Spark application. This setting overrides the default in sc_type.
obs_bucket | No | String | OBS bucket for storing Spark job files. Set this parameter when you need to save jobs.
auto_recovery | No | Boolean | Whether to enable the retry function. If enabled, Spark jobs are automatically retried after an exception occurs. Default value: false.
max_retry_times | No | Integer | Maximum number of retries. The maximum value is 100. Default value: 20.
feature | No | String | Job feature, that is, the type of the Spark image used by the job (for example, basic or custom).
spark_version | No | String | Version of the Spark component.
execution_agency_urn | No | String | Name of the agency authorized to DLI. This parameter is configurable in Spark 3.3.1.
image | No | String | Custom image, in the format Organization name/Image name:Image version. Constraints: valid only when feature is set to custom; use it together with feature to run the job on a user-defined Spark image.
catalog_name | No | String | To access DLI metadata, set this parameter to dli.
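Only file and className are mandatory; every other parameter is optional. A minimal call might look like the sketch below, where the endpoint, the URI path, and the token-based authentication header are all assumptions for illustration rather than values defined in this section.

```python
import requests

endpoint = "https://dli.example.com"             # hypothetical endpoint
project_id = "48cc2c48765f481480c7db940d6409d1"  # see Table 1
url = f"{endpoint}/v2.0/{project_id}/batches"    # assumed URI pattern
headers = {"X-Auth-Token": "<token>", "Content-Type": "application/json"}

body = {
    # Mandatory parameters: package location and main class.
    "file": "obs://mybucket/spark-examples_2.11-2.1.0.luxor.jar",
    "className": "org.apache.spark.examples.SparkPi",
    # Optional: name of an existing general-purpose DLI queue.
    "queue": "test",
}

resp = requests.post(url, headers=headers, json=body)
resp.raise_for_status()
job = resp.json()
print(job["id"], job["state"])
```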
Table 3 Compute resource types

Resource Type | Physical Resource | driverCores | executorCores | driverMemory | executorMemory | numExecutors
---|---|---|---|---|---|---
A | 8 vCPUs, 32 GB memory | 2 | 1 | 7 GB | 4 GB | 6
B | 16 vCPUs, 64 GB memory | 2 | 2 | 7 GB | 8 GB | 7
C | 32 vCPUs, 128 GB memory | 4 | 2 | 15 GB | 8 GB | 14
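The sizing parameters compose with sc_type: pick a type as the baseline and override individual values. For example, a sketch of a request body (file and bucket names are placeholders) that starts from type A but enlarges the executors; note that memory values must include the unit:

```python
body = {
    "file": "obs://mybucket/spark-examples_2.11-2.1.0.luxor.jar",
    "className": "org.apache.spark.examples.SparkPi",
    "sc_type": "A",            # baseline: 1 core / 4 GB per executor, 6 executors
    "executorMemory": "8 GB",  # overrides type A's 4 GB; the unit is required
    "numExecutors": 10,        # overrides type A's default of 6
}
```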
Table 4 resources parameters

Parameter | Mandatory | Type | Description
---|---|---|---
name | No | String | Resource name. You can also specify an OBS path, for example, obs://Bucket name/Package name.
type | No | String | Resource type.
Table 5 groups parameters

Parameter | Mandatory | Type | Description
---|---|---|---
name | No | String | User group name.
resources | No | Array of objects | User group resources. For details, see Table 4.
Response Parameters
Table 6 Response parameters

Parameter | Mandatory | Type | Description
---|---|---|---
id | No | String | ID of the batch processing job.
appId | No | String | Back-end application ID of the batch processing job.
name | No | String | Batch processing task name. The value contains a maximum of 128 characters.
owner | No | String | Owner of the batch processing job.
proxyUser | No | String | Proxy user (resource tenant) to which the batch processing job belongs.
state | No | String | Status of the batch processing job. For details, see Table 7.
kind | No | String | Type of the batch processing job. Only Spark parameters are supported.
log | No | Array of strings | Last 10 log records of the current batch processing job.
sc_type | No | String | Type of the compute resource. If the compute resource type is customized, CUSTOMIZED is returned.
cluster_name | No | String | Queue where the batch processing job is located.
queue | Yes | String | Name of the DLI queue where the batch processing job runs.
image | No | String | Custom image, in the format Organization name/Image name:Image version. Valid only when feature is set to custom; used together with feature to run the job on a user-defined Spark image.
create_time | No | Long | Time when the batch processing job was created. The timestamp is in milliseconds.
update_time | No | Long | Time when the batch processing job was updated. The timestamp is in milliseconds.
duration | No | Long | Job running duration, in milliseconds.
Table 7 Batch processing job statuses

Status | Type | Description
---|---|---
starting | String | The batch processing job is being started.
running | String | The batch processing job is executing a task.
dead | String | The batch processing job has exited.
success | String | The batch processing job was executed successfully.
recovering | String | The batch processing job is being restored.
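The create_time and update_time fields in the response are millisecond timestamps, so divide by 1000 before converting. A small sketch, reusing the resp object from the earlier request example:

```python
from datetime import datetime, timezone

job = resp.json()  # response body of the create call
created = datetime.fromtimestamp(job["create_time"] / 1000, tz=timezone.utc)
print(job["id"], job["state"], created.isoformat())
```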
Example Request
Create a Spark job. Set the Spark main class to org.apache.spark.examples.SparkPi, set the program package to batchTest/spark-examples_2.11-2.1.0.luxor.jar, and load a program package of the jar type and a resource package of the files type.
{ "file": "batchTest/spark-examples_2.11-2.1.0.luxor.jar", "className": "org.apache.spark.examples.SparkPi", "sc_type": "A", "jars": ["demo-1.0.0.jar"], "files": ["count.txt"], "resources":[ {"name": "groupTest/testJar.jar", "type": "jar"}, {"name": "kafka-clients-0.10.0.0.jar", "type": "jar"}], "groups": [ {"name": "groupTestJar", "resources": [{"name": "testJar.jar", "type": "jar"}, {"name": "testJar1.jar", "type": "jar"}]}, {"name": "batchTest", "resources": [{"name": "luxor.jar", "type": "jar"}]}], "queue": " test", "name": "TestDemo4", "feature": "basic", "execution_agency_urn": "myAgencyName", "spark_version": "2.3.2" }

The batchTest/spark-examples_2.11-2.1.0.luxor.jar file must have been uploaded in advance through the API described in Uploading a Package Group (Deprecated).
Example Response
{ "id": "07a3e4e6-9a28-4e92-8d3f-9c538621a166", "appId": "", "name": "", "owner": "test1", "proxyUser": "", "state": "starting", "kind": "", "log": [], "sc_type": "CUSTOMIZED", "cluster_name": "aaa", "queue": "aaa", "create_time": 1607589874156, "update_time": 1607589874156 }
Status Codes
Table 8 describes the status codes.
Error Codes
If an error occurs when this API is invoked, the system does not return a result similar to the preceding example. Instead, it returns an error code and an error message. For details, see Error Codes.