Updated on 2025-08-06 GMT+08:00

Creating a Batch Processing Job

Function

This API is used to create a batch processing job in a queue.

If a Spark job cannot acquire resources for an extended period after submission, its status changes to dead after waiting for approximately 3 hours. For details about Spark job statuses, see Table 7.

URI

  • URI format

    POST /v2.0/{project_id}/batches

  • Parameter description
    Table 1 URI parameter

    | Parameter | Mandatory | Type | Description |
    |---|---|---|---|
    | project_id | Yes | String | Project ID, which is used for resource isolation. For how to obtain a project ID, see Obtaining a Project ID. The value can contain up to 64 characters; only letters and digits are allowed. Example: 48cc2c48765f481480c7db940d6409d1 |
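
For reference, the following is a minimal sketch of invoking this URI with Python and the requests library. The endpoint host, token, and request body shown are placeholders rather than values defined by this API; substitute the DLI endpoint of your region, your project ID, and a valid IAM token.

    # Minimal sketch of submitting a batch processing job.
    # Endpoint, token, and body values are placeholders.
    import requests

    endpoint = "https://dli.example-region.myhuaweicloud.com"  # your regional DLI endpoint
    project_id = "48cc2c48765f481480c7db940d6409d1"
    url = f"{endpoint}/v2.0/{project_id}/batches"

    headers = {
        "Content-Type": "application/json",
        "X-Auth-Token": "<IAM token>",  # obtained from the IAM token API
    }
    body = {
        "file": "batchTest/spark-examples_2.11-2.1.0.luxor.jar",
        "className": "org.apache.spark.examples.SparkPi",
        "queue": "test",
    }

    response = requests.post(url, json=body, headers=headers)
    response.raise_for_status()
    job = response.json()
    print(job["id"], job["state"])  # see Table 6 for the full response schema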

Request Parameters

Table 2 Request parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| file | Yes | String | Name of the JAR or pyFile package that has been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name. Spark 3.3.x or later supports only packages in OBS paths. |
| className | Yes | String | Java/Spark main class of the batch processing job. |
| queue | No | String | Queue name. Set this parameter to the name of the created DLI queue. The queue must be of the general-purpose type. This parameter is compatible with cluster_name; a queue specified through cluster_name remains valid. The queue parameter is recommended. If both queue and cluster_name are set, the value of queue is used. |
| cluster_name | No | String | Queue name. Set this parameter to the name of the created DLI queue. You are advised to use the queue parameter instead; the queue and cluster_name parameters cannot coexist. |
| args | No | Array of strings | Input parameters of the main class, that is, application parameters. |
| sc_type | No | String | Compute resource type. Currently, resource types A, B, and C are available. If this parameter is not specified, the minimum configuration (type A) is used. For details about resource types, see Table 3. |
| jars | No | Array of strings | Names of JAR packages that have been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name. |
| pyFiles | No | Array of strings | Names of pyFile packages that have been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name. |
| files | No | Array of strings | Names of file packages that have been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name. |
| modules | No | Array of strings | Names of the dependency system resource modules. You can check module names using the Querying Resource Packages in a Group (Deprecated) API. DLI provides dependencies for executing datasource jobs; the modules corresponding to different services are: CloudTable/MRS HBase: sys.datasource.hbase; CloudTable/MRS OpenTSDB: sys.datasource.opentsdb; RDS MySQL: sys.datasource.rds; RDS PostgreSQL: preset; DWS: preset; CSS: sys.datasource.css. |
| resources | No | Array of objects | JSON object list, including the name and type of packages that have been uploaded to the queue. For details, see Table 4. Spark 3.3.x or later does not support this parameter; configure resource package information in jars, pyFiles, and files instead. |
| groups | No | Array of objects | JSON object list of package group resources. For details about the format, see the request example. If the name in resources is not verified by type, a package with that name must exist in the group. For details, see Table 5. Spark 3.3.x or later does not support group configuration. |
| conf | No | Object | Batch configuration items. For details, see Spark Configuration. |
| name | No | String | Batch processing job name. The value contains a maximum of 128 characters. |
| driverMemory | No | String | Driver memory of the Spark application, for example, 2 GB or 2048 MB. This setting overrides the default in sc_type. The unit must be included; otherwise, the job fails to start. |
| driverCores | No | Integer | Number of CPU cores of the Spark application driver. This setting overrides the default in sc_type. |
| executorMemory | No | String | Executor memory of the Spark application, for example, 2 GB or 2048 MB. This setting overrides the default in sc_type. The unit must be included; otherwise, the job fails to start. |
| executorCores | No | Integer | Number of CPU cores of each executor in the Spark application. This setting overrides the default in sc_type. |
| numExecutors | No | Integer | Number of executors in the Spark application. This setting overrides the default in sc_type. |
| obs_bucket | No | String | OBS bucket for storing Spark job files. Set this parameter when you need to save jobs. |
| auto_recovery | No | Boolean | Whether to enable the retry function. If enabled, Spark jobs are automatically retried after an exception occurs. Default value: false. |
| max_retry_times | No | Integer | Maximum number of retries. The maximum value is 100; the default value is 20. |
| feature | No | String | Job feature, that is, the type of the Spark image used by the job. custom: a user-defined Spark image is used. |
| spark_version | No | String | Version of the Spark component. If the Spark version in use is 2.3.2, this parameter is not required. |
| execution_agency_urn | No | String | Name of the agency authorized to DLI. This parameter is configurable in Spark 3.3.1. |
| image | No | String | Custom image, in the format Organization name/Image name:Image version. Valid only when feature is set to custom; use it together with feature to run the job with a user-defined Spark image. For details about how to use custom images, see the Data Lake Insight User Guide. |
| catalog_name | No | String | To access metadata, set this parameter to dli. |
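
The conf object passes Spark configuration key-value pairs through to the job. The following is a hedged illustration; the keys shown are standard Spark properties chosen only as examples, not settings this API requires:

{
    "file": "batchTest/spark-examples_2.11-2.1.0.luxor.jar",
    "className": "org.apache.spark.examples.SparkPi",
    "queue": "test",
    "conf": {
        "spark.sql.shuffle.partitions": "200",
        "spark.network.timeout": "300s"
    }
}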

Table 3 Resource types

| Resource Type | Physical Resource | driverCores | executorCores | driverMemory | executorMemory | numExecutors |
|---|---|---|---|---|---|---|
| A | 8 vCPUs, 32 GB memory | 2 | 1 | 7 GB | 4 GB | 6 |
| B | 16 vCPUs, 64 GB memory | 2 | 2 | 7 GB | 8 GB | 7 |
| C | 32 vCPUs, 128 GB memory | 4 | 2 | 15 GB | 8 GB | 14 |
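
A job starts from the defaults of the selected resource type, and any of driverMemory, driverCores, executorMemory, executorCores, or numExecutors in the request replaces the corresponding default. The following request body is an illustration only; the override values are arbitrary, and the memory strings must carry a unit as noted in Table 2:

{
    "file": "batchTest/spark-examples_2.11-2.1.0.luxor.jar",
    "className": "org.apache.spark.examples.SparkPi",
    "sc_type": "A",
    "executorMemory": "8 GB",
    "numExecutors": 4
}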

Table 4 resources parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| name | No | String | Resource name. You can also specify an OBS path, for example, obs://Bucket name/Package name. |
| type | No | String | Resource type. |

Table 5 groups parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| name | No | String | User group name. |
| resources | No | Array of objects | User group resources. For details, see Table 4. |

Response Parameters

Table 6 Response parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| id | No | String | ID of the batch processing job. |
| appId | No | String | Back-end application ID of the batch processing job. |
| name | No | String | Batch processing job name. The value contains a maximum of 128 characters. |
| owner | No | String | Owner of the batch processing job. |
| proxyUser | No | String | Proxy user (resource tenant) to which the batch processing job belongs. |
| state | No | String | Status of the batch processing job. For details, see Table 7. |
| kind | No | String | Type of the batch processing job. Only Spark parameters are supported. |
| log | No | Array of strings | Last 10 log records of the current batch processing job. |
| sc_type | No | String | Type of computing resource. If the computing resource type is customized, CUSTOMIZED is returned. |
| cluster_name | No | String | Queue where the batch processing job is located. |
| queue | Yes | String | Queue name, that is, the name of the created DLI queue. This parameter is compatible with cluster_name; a queue specified through cluster_name remains valid. You are advised to use the queue parameter; the queue and cluster_name parameters cannot coexist. |
| image | No | String | Custom image, in the format Organization name/Image name:Image version. Valid only when feature is set to custom; use it together with feature to run the job with a user-defined Spark image. For details about how to use custom images, see the Data Lake Insight User Guide. |
| create_time | No | Long | Time when the batch processing job was created, as a timestamp in milliseconds. |
| update_time | No | Long | Time when the batch processing job was updated, as a timestamp in milliseconds. |
| duration | No | Long | Job running duration, in milliseconds. |

Table 7 Batch processing job statuses

| Status | Type | Description |
|---|---|---|
| starting | String | The batch processing job is being started. |
| running | String | The batch processing job is executing a task. |
| dead | String | The batch processing job has exited. |
| success | String | The batch processing job was executed successfully. |
| recovering | String | The batch processing job is being restored. |
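
A newly created job is typically returned in the starting state, so clients often poll until a terminal status (success or dead) from Table 7 is reached. The sketch below assumes the companion status query endpoint GET /v2.0/{project_id}/batches/{batch_id}/state described elsewhere in this API reference; if your service version exposes a different path, adjust accordingly.

    # Minimal polling sketch; endpoint and token handling as in the
    # submission example above.
    import time
    import requests

    def wait_for_batch(endpoint, project_id, batch_id, token, interval=30):
        # Poll the batch job state until a terminal status from Table 7 is reached.
        url = f"{endpoint}/v2.0/{project_id}/batches/{batch_id}/state"
        headers = {"X-Auth-Token": token}
        while True:
            state = requests.get(url, headers=headers).json()["state"]
            if state in ("success", "dead"):
                return state
            time.sleep(interval)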

Example Request

Create a Spark job. Set the Spark main class of the job to org.apache.spark.examples.SparkPi, set the program package to batchTest/spark-examples_2.11-2.1.0.luxor.jar, and load dependency packages of the jar type and resource packages of the files type.

{
    "file": "batchTest/spark-examples_2.11-2.1.0.luxor.jar",
    "className": "org.apache.spark.examples.SparkPi",
    "sc_type": "A",
    "jars": ["demo-1.0.0.jar"],
    "files": ["count.txt"],
    "resources": [
        {"name": "groupTest/testJar.jar", "type": "jar"},
        {"name": "kafka-clients-0.10.0.0.jar", "type": "jar"}
    ],
    "groups": [
        {"name": "groupTestJar", "resources": [{"name": "testJar.jar", "type": "jar"}, {"name": "testJar1.jar", "type": "jar"}]},
        {"name": "batchTest", "resources": [{"name": "luxor.jar", "type": "jar"}]}
    ],
    "queue": "test",
    "name": "TestDemo4",
    "feature": "basic",
    "execution_agency_urn": "myAgencyName",
    "spark_version": "2.3.2"
}

The batchTest/spark-examples_2.11-2.1.0.luxor.jar package was uploaded using the API described in Uploading a Package Group (Deprecated).

Example Response

{
  "id": "07a3e4e6-9a28-4e92-8d3f-9c538621a166",
  "appId": "",
  "name": "",
  "owner": "test1",
  "proxyUser": "",
  "state": "starting",
  "kind": "",
  "log": [],
  "sc_type": "CUSTOMIZED",
  "cluster_name": "aaa",
  "queue": "aaa",
  "create_time": 1607589874156,
  "update_time": 1607589874156
}

Status Codes

Table 8 describes the status codes.

Table 8 Status codes

| Status Code | Description |
|---|---|
| 200 | The job is created successfully. |
| 400 | Request error. |
| 500 | Internal service error. |

Error Codes

If an error occurs when this API is invoked, the system does not return a result similar to the preceding example. Instead, it returns an error code and an error message. For details, see Error Codes.