Creating a Batch Processing Job

Function

This API is used to create a batch processing job in a queue.

URI

URI format
POST /v2.0/{project_id}/batches

Parameter description

**Table 1** URI parameter
Parameter	Mandatory	Type	Description
project_id	Yes	String	Project ID, which is used for resource isolation. For details about how to obtain its value, see Obtaining a Project ID.
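
As a minimal sketch, the request URL can be assembled from the URI format above. The endpoint host used here (`dli.example.com`) is a placeholder and an assumption; replace it with the DLI endpoint for your region.

```python
from urllib.parse import quote

# Hypothetical endpoint; substitute the DLI endpoint for your region.
DLI_ENDPOINT = "https://dli.example.com"


def batches_url(project_id: str, endpoint: str = DLI_ENDPOINT) -> str:
    """Return the URL for POST /v2.0/{project_id}/batches."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    # Percent-encode the path segment so unexpected characters cannot alter the path.
    return f"{endpoint.rstrip('/')}/v2.0/{quote(project_id, safe='')}/batches"
```

Calling `batches_url("my-project-id")` returns the placeholder endpoint followed by `/v2.0/my-project-id/batches`.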

Request Parameters

**Table 2** Request parameters
Parameter	Mandatory	Type	Description
file	Yes	String	Name of the package that is of the JAR or pyFile type and has been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name.
className	Yes	String	Java/Spark main class of the batch processing job.
queue	No	String	Queue name. Set this parameter to the name of the created DLI queue. The queue must be of the general-purpose type. NOTE: This parameter is compatible with the cluster_name parameter. That is, if cluster_name is used to specify a queue, the queue is still valid. You are advised to use the queue parameter. The queue and cluster_name parameters cannot coexist.
cluster_name	No	String	Queue name. Set this parameter to the created DLI queue name. NOTE: You are advised to use the queue parameter. The queue and cluster_name parameters cannot coexist.
args	No	Array of Strings	Input parameters of the main class, that is, application parameters.
sc_type	No	String	Compute resource type. Currently, resource types A, B, and C are available. If this parameter is not specified, the minimum configuration (type A) is used. For details about resource types, see Table 3.
jars	No	Array of Strings	Name of the package that is of the JAR type and has been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name.
pyFiles	No	Array of Strings	Name of the package that is of the PyFile type and has been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name.
files	No	Array of Strings	Name of the package that is of the file type and has been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name.
modules	No	Array of Strings	Name of the dependent system resource module. You can view the module name using the API related to Querying Resource Packages in a Group (Discarded). DLI provides dependencies for executing datasource jobs. The dependency modules for each service are as follows: CloudTable/MRS HBase: sys.datasource.hbase; CloudTable/MRS OpenTSDB: sys.datasource.opentsdb; RDS MySQL: sys.datasource.rds; RDS PostgreSQL: preset; DWS: preset; CSS: sys.datasource.css.
resources	No	Array of objects	JSON object list, including the name and type of the JSON package that has been uploaded to the queue. For details, see Table 4.
groups	No	Array of objects	JSON object list, including the package group resource. For details about the format, see the request example. If the type of the name in resources is not verified, the package with the name exists in the group. For details, see Table 5.
conf	No	Object	Batch configuration item. For details, see Spark Configuration.
name	No	String	Batch processing task name. The value contains a maximum of 128 characters.
driverMemory	No	String	Driver memory of the Spark application, for example, 2 GB or 2048 MB. This configuration item replaces the default parameter in sc_type. The unit must be provided; otherwise, the startup fails.
driverCores	No	Integer	Number of CPU cores of the Spark application driver. This configuration item replaces the default parameter in sc_type.
executorMemory	No	String	Executor memory of the Spark application, for example, 2 GB or 2048 MB. This configuration item replaces the default parameter in sc_type. The unit must be provided; otherwise, the startup fails.
executorCores	No	Integer	Number of CPU cores of each Executor in the Spark application. This configuration item replaces the default parameter in sc_type.
numExecutors	No	Integer	Number of Executors in a Spark application. This configuration item replaces the default parameter in sc_type.
obs_bucket	No	String	OBS bucket for storing the Spark jobs. Set this parameter when you need to save jobs.
auto_recovery	No	Boolean	Whether to enable the retry function. If enabled, Spark jobs will be automatically retried after an exception occurs. The default value is false.
max_retry_times	No	Integer	Maximum number of retries. The maximum value is 100, and the default value is 20.
feature	No	String	Job feature, that is, the type of the Spark image used by the job. The value custom indicates that a user-defined Spark image is used.
spark_version	No	String	Version of the Spark component. This parameter is not required if Spark 2.3.2 is used.
image	No	String	Custom image. The format is Organization name/Image name:Image version. This parameter is valid only when feature is set to custom. You can use this parameter with the feature parameter to specify a user-defined Spark image for job running. For details about how to use custom images, see Data Lake Insight User Guide.
catalog_name	No	String	To access metadata, set this parameter to dli.
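
The constraints in Table 2 can be checked on the client before a request is sent. The following is a sketch only: the function name `build_batch_body` is hypothetical, and the set of memory units accepted by the regular expression is an assumption, since the table only states that a unit must be provided.

```python
import re

# A number followed by a unit, e.g. "2 GB" or "2048 MB".
# Assumption: the exact unit spellings the service accepts are not listed in Table 2.
_MEMORY_RE = re.compile(r"\d+\s*(GB|MB|G|M)", re.IGNORECASE)


def build_batch_body(file, class_name, **options):
    """Assemble a request body and check the constraints stated in Table 2."""
    if not file or not class_name:
        raise ValueError("file and className are mandatory")
    if "queue" in options and "cluster_name" in options:
        raise ValueError("queue and cluster_name cannot coexist")
    name = options.get("name")
    if name is not None and len(name) > 128:
        raise ValueError("name is limited to 128 characters")
    retries = options.get("max_retry_times")
    if retries is not None and retries > 100:
        raise ValueError("max_retry_times cannot exceed 100")
    for key in ("driverMemory", "executorMemory"):
        value = options.get(key)
        if value is not None and not _MEMORY_RE.fullmatch(value):
            raise ValueError(f"{key} must include a unit, for example '2 GB'")
    return {"file": file, "className": class_name, **options}
```

Rejecting invalid combinations locally gives a clearer error than waiting for the service to return a 400 response, but the service remains the authority on what is valid.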

**Table 3** Resource types
Resource Type	Physical Resource	driverCores	executorCores	driverMemory	executorMemory	numExecutors
A	8 vCPUs, 32-GB memory	2	1	7 GB	4 GB	6
B	16 vCPUs, 64-GB memory	2	2	7 GB	8 GB	7
C	32 vCPUs, 128-GB memory	4	2	15 GB	8 GB	14
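
To illustrate how sc_type interacts with the individual resource parameters, the sketch below encodes Table 3 and applies any explicit overrides from a request body. The helper name `effective_resources` is hypothetical, and the count of Executors is keyed by the request parameter name numExecutors. How the service resolves these values internally is not described here, so treat this as an approximation.

```python
# Defaults per resource type, copied from Table 3.
SC_TYPE_DEFAULTS = {
    "A": {"driverCores": 2, "executorCores": 1, "driverMemory": "7 GB",
          "executorMemory": "4 GB", "numExecutors": 6},
    "B": {"driverCores": 2, "executorCores": 2, "driverMemory": "7 GB",
          "executorMemory": "8 GB", "numExecutors": 7},
    "C": {"driverCores": 4, "executorCores": 2, "driverMemory": "15 GB",
          "executorMemory": "8 GB", "numExecutors": 14},
}


def effective_resources(body: dict) -> dict:
    """Return the resource settings implied by a request body."""
    # Type A is used when sc_type is not specified.
    resolved = dict(SC_TYPE_DEFAULTS[body.get("sc_type", "A")])
    # Explicit parameters replace the defaults of the chosen type.
    for key in resolved:
        if key in body:
            resolved[key] = body[key]
    return resolved
```

For example, a body with `"sc_type": "C"` and `"executorCores": 4` keeps the type C defaults for everything except executorCores.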

**Table 4** **resources** parameters
Parameter	Mandatory	Type	Description
name	No	String	Resource name. You can also specify an OBS path, for example, obs://Bucket name/Package name.
type	No	String	Resource type.

**Table 5** **groups** parameters
Parameter	Mandatory	Type	Description
name	No	String	User group name.
resources	No	Array of objects	User group resources. For details, see Table 4.
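
The nested shape of the resources and groups parameters can be produced with two small helpers. This is an illustrative sketch; the function names `resource` and `group` are hypothetical and not part of the API.

```python
def resource(name: str, type_: str = "jar") -> dict:
    """Build one entry of the resources array (Table 4)."""
    return {"name": name, "type": type_}


def group(name: str, *resources: dict) -> dict:
    """Build one entry of the groups array (Table 5)."""
    return {"name": name, "resources": list(resources)}


# Mirrors the structure used in a request body.
groups = [
    group("groupTestJar", resource("testJar.jar"), resource("testJar1.jar")),
    group("batchTest", resource("luxor.jar")),
]
```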

Response Parameters

**Table 6** Response parameters
Parameter	Mandatory	Type	Description
id	No	String	ID of a batch processing job.
appId	No	String	Back-end application ID of a batch processing job.
name	No	String	Batch processing task name. The value contains a maximum of 128 characters.
owner	No	String	Owner of a batch processing job.
proxyUser	No	String	Proxy user (resource tenant) to which a batch processing job belongs.
state	No	String	Status of a batch processing job. For details, see Table 7.
kind	No	String	Type of a batch processing job. Only Spark parameters are supported.
log	No	Array of strings	Last 10 records of the current batch processing job.
sc_type	No	String	Type of a computing resource. If the computing resource type is customized, value CUSTOMIZED is returned.
cluster_name	No	String	Queue where a batch processing job is located.
queue	Yes	String	Queue name. Set this parameter to the name of the created DLI queue. NOTE: This parameter is compatible with the cluster_name parameter. That is, if cluster_name is used to specify a queue, the queue is still valid. You are advised to use the queue parameter. The queue and cluster_name parameters cannot coexist.
image	No	String	Custom image. The format is Organization name/Image name:Image version. This parameter is valid only when feature is set to custom. You can use this parameter with the feature parameter to specify a user-defined Spark image for job running. For details about how to use custom images, see Data Lake Insight User Guide.
create_time	No	Long	Time when a batch processing job is created. The timestamp is expressed in milliseconds.
update_time	No	Long	Time when a batch processing job is updated. The timestamp is expressed in milliseconds.
duration	No	Long	Running duration of the job, in milliseconds.
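
Because create_time and update_time are millisecond timestamps, they need converting before display. The sketch below assumes the values count milliseconds since the Unix epoch in UTC, which the table does not state explicitly; the helper name `ms_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are milliseconds since 1970-01-01T00:00:00Z.
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_datetime(ms: int) -> datetime:
    """Convert a millisecond timestamp to a timezone-aware UTC datetime."""
    # Integer arithmetic via timedelta avoids floating-point rounding.
    return _EPOCH + timedelta(milliseconds=ms)


created = ms_to_datetime(1607589874156)
print(created.isoformat())  # 2020-12-10T08:44:34.156000+00:00
```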

**Table 7** Batch processing job statuses
Status	Type	Description
starting	String	The batch processing job is being started.
running	String	The batch processing job is executing a task.
dead	String	The batch processing job has exited.
success	String	The batch processing job is successfully executed.
recovering	String	The batch processing job is being restored.
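
Client code that waits for a job typically needs to know when to stop checking the state. The sketch below treats dead and success as final states. That grouping is an interpretation of the descriptions in Table 7, not something the table states, and the helper name `is_finished` is hypothetical.

```python
# All states listed in Table 7.
KNOWN_STATES = {"starting", "running", "dead", "success", "recovering"}

# Assumption: a job that has exited or succeeded does not change state again.
FINISHED_STATES = {"dead", "success"}


def is_finished(state: str) -> bool:
    """Return True if the job state is treated as final."""
    if state not in KNOWN_STATES:
        raise ValueError(f"unknown batch job state: {state!r}")
    return state in FINISHED_STATES
```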

Example Request

Create a Spark job. Set the Spark main class of the job to org.apache.spark.examples.SparkPi, set the program package to batchTest/spark-examples_2.11-2.1.0.luxor.jar, and load a program package of the jar type and a resource package of the files type.

{
    "file": "batchTest/spark-examples_2.11-2.1.0.luxor.jar",
    "className": "org.apache.spark.examples.SparkPi",
    "sc_type": "A",
    "jars": ["demo-1.0.0.jar"],
    "files": ["count.txt"],
    "resources":[
                   {"name": "groupTest/testJar.jar", "type": "jar"},
                   {"name": "kafka-clients-0.10.0.0.jar", "type": "jar"}],
    "groups": [
                   {"name": "groupTestJar", "resources": [{"name": "testJar.jar", "type": "jar"}, {"name": "testJar1.jar", "type": "jar"}]}, 
                   {"name": "batchTest", "resources":  [{"name": "luxor.jar", "type": "jar"}]}],
    "queue": "test",
    "name": "TestDemo4",
    "feature": "basic",
    "spark_version": "2.3.2"
}

The batchTest/spark-examples_2.11-2.1.0.luxor.jar file has been uploaded through the API described in Uploading a Package Group (Discarded).
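
A request like the one above could be prepared in Python as sketched below. Several details are assumptions not covered in this section: the endpoint host and project ID are placeholders, and authentication through an `X-Auth-Token` header is assumed. The function name `prepare_create_batch` is hypothetical. The request is built but not sent.

```python
import json
import urllib.request


def prepare_create_batch(url: str, token: str, body: dict) -> urllib.request.Request:
    """Build, but do not send, the POST request that creates a batch job."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: token-based authentication through this header.
            "X-Auth-Token": token,
        },
    )


request = prepare_create_batch(
    "https://dli.example.com/v2.0/my-project-id/batches",  # placeholder endpoint and project ID
    "my-token",  # placeholder credential
    {
        "file": "batchTest/spark-examples_2.11-2.1.0.luxor.jar",
        "className": "org.apache.spark.examples.SparkPi",
        "queue": "test",
    },
)
```

To send it, pass the object to `urllib.request.urlopen(request)` and decode the JSON response body.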

Example Response

{
  "id": "07a3e4e6-9a28-4e92-8d3f-9c538621a166",
  "appId": "",
  "name": "",
  "owner": "test1",
  "proxyUser": "",
  "state": "starting",
  "kind": "",
  "log": [],
  "sc_type": "CUSTOMIZED",
  "cluster_name": "aaa",
  "queue": "aaa",
  "create_time": 1607589874156,
  "update_time": 1607589874156
}

Status Codes

Table 8 describes the status codes.

**Table 8** Status codes
Status Code	Description
200	The job is created successfully.
400	Request error.
500	Internal service error.
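
A caller can map these status codes to messages with a small lookup, as in the sketch below. The fallback message for codes not listed in Table 8 is an assumption, and the helper name `describe_status` is hypothetical.

```python
# Descriptions copied from Table 8.
STATUS_MEANINGS = {
    200: "The job is created successfully.",
    400: "Request error.",
    500: "Internal service error.",
}


def describe_status(code: int) -> str:
    """Return the Table 8 description, or a generic note for other codes."""
    return STATUS_MEANINGS.get(code, f"Unexpected status code: {code}")
```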

Error Codes

If an error occurs when this API is invoked, the system returns an error code and error information instead of a result like the preceding example. For details, see Error Codes.
