Updated on 2024-07-18 GMT+08:00

Creating a Batch Processing Job

Function

This API is used to create a batch processing job in a queue.

URI

  • URI format

    POST /v2.0/{project_id}/batches

  • Parameter description
    Table 1 URI parameter

    Parameter   Mandatory  Type    Description
    project_id  Yes        String  Project ID, which is used for resource isolation. For details about how to obtain its value, see Obtaining a Project ID.
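
As a minimal sketch of how the URI format above expands, the following Python helper substitutes project_id into the path. The endpoint host used in the example call is hypothetical and is not defined in this section.

```python
from urllib.parse import quote


def batches_url(endpoint: str, project_id: str) -> str:
    """Return the full URL for POST /v2.0/{project_id}/batches."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the project ID so it is safe as a single path segment.
    return f"{endpoint.rstrip('/')}/v2.0/{quote(project_id, safe='')}/batches"


# Hypothetical endpoint and project ID, for illustration only.
url = batches_url("https://dli.example.com/", "my-project-id")
```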

Request

Table 2 Request parameters

Parameter

Mandatory

Type

Description

file

Yes

String

Name of the package that is of the JAR or pyFile type and has been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name.

className

Yes

String

Java/Spark main class of the batch processing job.

queue

No

String

Queue name. Set this parameter to the name of the created DLI queue. The queue must be of the general-purpose type.

NOTE:
  • This parameter is compatible with the cluster_name parameter. That is, if cluster_name is used to specify a queue, the queue is still valid.
  • You are advised to use the queue parameter. The queue and cluster_name parameters cannot coexist.

cluster_name

No

String

Queue name. Set this parameter to the created DLI queue name.

NOTE:

You are advised to use the queue parameter. The queue and cluster_name parameters cannot coexist.

args

No

Array of Strings

Input parameters of the main class, that is, application parameters.

sc_type

No

String

Compute resource type. Currently, resource types A, B, and C are available. If this parameter is not specified, the minimum configuration (type A) is used. For details about resource types, see Table 3.

jars

No

Array of Strings

Name of the package that is of the JAR type and has been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name.

pyFiles

No

Array of Strings

Name of the package that is of the PyFile type and has been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name.

files

No

Array of Strings

Name of the package that is of the file type and has been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name.

modules

No

Array of Strings

Name of the dependent system resource module. You can view the module name using the API related to Querying Resource Packages in a Group (Discarded).

DLI provides dependencies for executing datasource jobs. The following list shows the dependency modules corresponding to different services.
  • CloudTable/MRS HBase: sys.datasource.hbase
  • CloudTable/MRS OpenTSDB: sys.datasource.opentsdb
  • RDS MySQL: sys.datasource.rds
  • RDS PostgreSQL: preset
  • DWS: preset
  • CSS: sys.datasource.css

resources

No

Array of objects

JSON object list, including the name and type of the JSON package that has been uploaded to the queue. For details, see Table 4.

groups

No

Array of objects

JSON object list, including the package group resource. For details about the format, see the request example. The type of a name in resources is not verified; it is enough that a package with that name exists in the group. For details, see Table 5.

conf

No

Object

Batch configuration item. For details, see Spark Configuration.

name

No

String

Batch processing task name. The value contains a maximum of 128 characters.

driverMemory

No

String

Driver memory of the Spark application, for example, 2 GB or 2048 MB. This configuration item replaces the default parameter in sc_type. The unit must be provided; otherwise, the job fails to start.

driverCores

No

Integer

Number of CPU cores of the Spark application driver. This configuration item replaces the default parameter in sc_type.

executorMemory

No

String

Executor memory of the Spark application, for example, 2 GB or 2048 MB. This configuration item replaces the default parameter in sc_type. The unit must be provided; otherwise, the job fails to start.

executorCores

No

Integer

Number of CPU cores of each Executor in the Spark application. This configuration item replaces the default parameter in sc_type.

numExecutors

No

Integer

Number of Executors in a Spark application. This configuration item replaces the default parameter in sc_type.

obs_bucket

No

String

OBS bucket for storing the Spark jobs. Set this parameter when you need to save jobs.

auto_recovery

No

Boolean

Whether to enable the retry function. If enabled, Spark jobs will be automatically retried after an exception occurs. The default value is false.

max_retry_times

No

Integer

Maximum number of retries. The maximum value is 100, and the default value is 20.

feature

No

String

Job feature. Type of the Spark image used by a job.

  • basic: indicates that the basic Spark image provided by DLI is used.
  • custom: indicates that the user-defined Spark image is used.
  • ai: indicates that the AI image provided by DLI is used.

spark_version

No

String

Version of the Spark component.

  • If the Spark version in use is 2.3.2, this parameter is not required.
  • If the Spark version in use is 2.3.3, this parameter is required when feature is basic or ai. If this parameter is not set, the default Spark version 2.3.2 is used.

image

No

String

Custom image. The format is Organization name/Image name:Image version.

This parameter is valid only when feature is set to custom. You can use this parameter with the feature parameter to specify a user-defined Spark image for job running. For details about how to use custom images, see Data Lake Insight User Guide.

catalog_name

No

String

To access metadata, set this parameter to dli.
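
The constraints stated in Table 2 can be checked on the client before the request is sent. The following Python function is a sketch only: the function name is hypothetical, the memory pattern is an assumption about which unit spellings are accepted, and the service performs its own validation regardless.

```python
import re

# Assumption: a memory value is digits followed by G, GB, M, or MB.
# Table 2 only states that the unit must be provided.
_MEMORY = re.compile(r"^\d+\s?(G|GB|M|MB)$", re.IGNORECASE)


def validate_batch_request(body: dict) -> list[str]:
    """Return a list of problems found in a batch job request body."""
    problems = []
    # file and className are the only mandatory parameters.
    for key in ("file", "className"):
        if not body.get(key):
            problems.append(f"{key} is mandatory")
    if "queue" in body and "cluster_name" in body:
        problems.append("queue and cluster_name cannot coexist")
    if len(body.get("name", "")) > 128:
        problems.append("name exceeds 128 characters")
    for key in ("driverMemory", "executorMemory"):
        if key in body and not _MEMORY.match(str(body[key])):
            problems.append(f"{key} must include a unit")
    if body.get("max_retry_times", 0) > 100:
        problems.append("max_retry_times exceeds 100")
    if body.get("feature") not in (None, "basic", "custom", "ai"):
        problems.append("feature must be basic, custom, or ai")
    if "image" in body and body.get("feature") != "custom":
        problems.append("image is valid only when feature is custom")
    return problems
```

An empty list means that no problem was detected by these checks; it does not guarantee that the service accepts the request.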

Table 3 Resource types

Resource Type  Physical Resource        driverCores  executorCores  driverMemory  executorMemory  numExecutors
A              8 vCPUs, 32-GB memory    2            1              7 GB          4 GB            6
B              16 vCPUs, 64-GB memory   2            2              7 GB          8 GB            7
C              32 vCPUs, 128-GB memory  4            2              15 GB         8 GB            14
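
To illustrate how the sc_type defaults combine with the per-parameter overrides described in Table 2, the following Python sketch resolves the settings a request would effectively ask for. It is a client-side illustration, not the service's implementation; the function name is hypothetical, and the key numExecutors follows the request parameter name in Table 2.

```python
# Default values per resource type, copied from Table 3.
RESOURCE_TYPES = {
    "A": {"driverCores": 2, "executorCores": 1, "driverMemory": "7 GB",
          "executorMemory": "4 GB", "numExecutors": 6},
    "B": {"driverCores": 2, "executorCores": 2, "driverMemory": "7 GB",
          "executorMemory": "8 GB", "numExecutors": 7},
    "C": {"driverCores": 4, "executorCores": 2, "driverMemory": "15 GB",
          "executorMemory": "8 GB", "numExecutors": 14},
}


def effective_resources(body: dict) -> dict:
    """Start from the sc_type defaults and apply any explicit overrides."""
    # Type A is used when sc_type is not specified.
    spec = dict(RESOURCE_TYPES[body.get("sc_type", "A")])
    for key in spec:
        if key in body:
            spec[key] = body[key]
    return spec
```

For example, a request with sc_type set to C and executorMemory set to 16 GB keeps the other type C defaults and changes only the executor memory.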

Table 4 resources parameters

Parameter  Mandatory  Type    Description
name       No         String  Resource name. You can also specify an OBS path, for example, obs://Bucket name/Package name.
type       No         String  Resource type.

Table 5 groups parameters

Parameter  Mandatory  Type              Description
name       No         String            User group name.
resources  No         Array of objects  User group resource. For details, see Table 4.

Response

Table 6 Response parameters

Parameter

Mandatory

Type

Description

id

No

String

ID of a batch processing job.

appId

No

String

Back-end application ID of a batch processing job.

name

No

String

Batch processing task name. The value contains a maximum of 128 characters.

owner

No

String

Owner of a batch processing job.

proxyUser

No

String

Proxy user (resource tenant) to which a batch processing job belongs.

state

No

String

Status of a batch processing job. For details, see Table 7.

kind

No

String

Type of a batch processing job. Only Spark parameters are supported.

log

No

Array of Strings

Last 10 records of the current batch processing job.

sc_type

No

String

Type of a computing resource. If the computing resource type is customized, value CUSTOMIZED is returned.

cluster_name

No

String

Queue where a batch processing job is located.

queue

Yes

String

Queue name. Set this parameter to the name of the created DLI queue.

NOTE:
  • This parameter is compatible with the cluster_name parameter. That is, if cluster_name is used to specify a queue, the queue is still valid.
  • You are advised to use the queue parameter. The queue and cluster_name parameters cannot coexist.

image

No

String

Custom image. The format is Organization name/Image name:Image version.

This parameter is valid only when feature is set to custom. You can use this parameter with the feature parameter to specify a user-defined Spark image for job running. For details about how to use custom images, see Data Lake Insight User Guide.

create_time

No

Long

Time when a batch processing job is created. The timestamp is expressed in milliseconds.

update_time

No

Long

Time when a batch processing job is updated. The timestamp is expressed in milliseconds.

duration

No

Long

Job running duration (unit: millisecond)

Table 7 Batch processing job statuses

Status      Type    Description
starting    String  The batch processing job is being started.
running     String  The batch processing job is executing a task.
dead        String  The batch processing job has exited.
success     String  The batch processing job is successfully executed.
recovering  String  The batch processing job is being restored.
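
A caller typically waits until the job leaves the in-progress statuses. The following Python sketch shows one way to do that. Treating dead and success as final is an assumption based on the descriptions in Table 7, and fetch_state is a hypothetical callable that returns the current state string; how it queries the job is outside the scope of this API.

```python
import time
from typing import Callable

# Assumption: these groupings are inferred from the Table 7 descriptions.
IN_PROGRESS = {"starting", "running", "recovering"}
FINISHED = {"dead", "success"}


def wait_for_batch(fetch_state: Callable[[], str],
                   interval: float = 5.0,
                   max_polls: int = 120) -> str:
    """Poll fetch_state until the job reaches a finished state."""
    for _ in range(max_polls):
        state = fetch_state()
        if state in FINISHED:
            return state
        if state not in IN_PROGRESS:
            raise ValueError(f"unexpected state: {state!r}")
        time.sleep(interval)
    raise TimeoutError("batch job did not finish in time")
```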

Example Request

Create a Spark job. Set the Spark main class of the job to org.apache.spark.examples.SparkPi, set the program package to batchTest/spark-examples_2.11-2.1.0.luxor.jar, and load a program package of the jar type and a resource package of the files type.

{
    "file": "batchTest/spark-examples_2.11-2.1.0.luxor.jar",
    "className": "org.apache.spark.examples.SparkPi",
    "sc_type": "A",
    "jars": ["demo-1.0.0.jar"],
    "files": ["count.txt"],
    "resources": [
        {"name": "groupTest/testJar.jar", "type": "jar"},
        {"name": "kafka-clients-0.10.0.0.jar", "type": "jar"}
    ],
    "groups": [
        {"name": "groupTestJar", "resources": [{"name": "testJar.jar", "type": "jar"}, {"name": "testJar1.jar", "type": "jar"}]},
        {"name": "batchTest", "resources": [{"name": "luxor.jar", "type": "jar"}]}
    ],
    "queue": "test",
    "name": "TestDemo4",
    "feature": "basic",
    "spark_version": "2.3.2"
}

The batchTest/spark-examples_2.11-2.1.0.luxor.jar file has been uploaded through the API described in Uploading a Package Group (Discarded).
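
The following Python sketch shows how a request of this kind could be prepared with the standard library. The endpoint host, project ID, and token value are hypothetical, and the X-Auth-Token header is an assumption about the authentication scheme, which this section does not describe. The request is only built here; it is not sent.

```python
import json
import urllib.request


def build_create_batch_request(url: str, token: str, body: dict) -> urllib.request.Request:
    """Prepare a POST request that carries the job definition as JSON."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: token-based authentication header.
            "X-Auth-Token": token,
        },
    )


req = build_create_batch_request(
    "https://dli.example.com/v2.0/my-project-id/batches",  # hypothetical URL
    "my-token",                                            # hypothetical token
    {
        "file": "batchTest/spark-examples_2.11-2.1.0.luxor.jar",
        "className": "org.apache.spark.examples.SparkPi",
        "sc_type": "A",
        "queue": "test",
        "name": "TestDemo4",
    },
)
# To send the request: urllib.request.urlopen(req)
```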

Example Response

{
  "id": "07a3e4e6-9a28-4e92-8d3f-9c538621a166",
  "appId": "",
  "name": "",
  "owner": "test1",
  "proxyUser": "",
  "state": "starting",
  "kind": "",
  "log": [],
  "sc_type": "CUSTOMIZED",
  "cluster_name": "aaa",
  "queue": "aaa",
  "create_time": 1607589874156,
  "update_time": 1607589874156
}
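
As a sketch of how a client might read this response, the following Python code extracts the job ID, state, and queue, and converts create_time from milliseconds to a UTC timestamp. The helper names are hypothetical. Every field is read with a fallback because Table 6 marks most response parameters as not mandatory.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms: int) -> datetime:
    """Convert a millisecond timestamp (create_time, update_time) to UTC."""
    return EPOCH + timedelta(milliseconds=ms)


def summarize_response(resp: dict) -> dict:
    """Pick the fields a caller usually needs from the response body."""
    created_ms = resp.get("create_time")
    return {
        "id": resp.get("id"),
        "state": resp.get("state"),
        # queue is preferred; cluster_name is the compatible older field.
        "queue": resp.get("queue") or resp.get("cluster_name"),
        "created_utc": ms_to_utc(created_ms).isoformat() if created_ms is not None else None,
    }


summary = summarize_response({
    "id": "07a3e4e6-9a28-4e92-8d3f-9c538621a166",
    "state": "starting",
    "cluster_name": "aaa",
    "queue": "aaa",
    "create_time": 1607589874156,
    "update_time": 1607589874156,
})
```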

Status Codes

Table 8 describes the status codes.

Table 8 Status code

Status Code  Description
200          The job is created successfully.
400          Request error.
500          Internal service error.
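
The status codes in Table 8 can be mapped to a simple success check on the client. The following Python sketch is illustrative only; the function name and the message used for codes not listed in Table 8 are hypothetical.

```python
# Descriptions copied from Table 8.
STATUS_MEANINGS = {
    200: "The job is created successfully.",
    400: "Request error.",
    500: "Internal service error.",
}


def describe_status(status_code: int) -> tuple[bool, str]:
    """Return (succeeded, description) for an HTTP status code."""
    # Only 200 indicates that the job was created.
    meaning = STATUS_MEANINGS.get(status_code, "Unexpected status code.")
    return status_code == 200, meaning
```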

Error Codes

If an error occurs when this API is invoked, the system returns an error code and error information instead of a result similar to the preceding example. For details, see Error Codes.