Creating a Batch Processing Job

Function

This API is used to create a batch processing job in a queue.

During the Spark job submission process, if the job fails to acquire resources successfully for an extended period, the job status will change to dead after waiting for approximately 3 hours. For details about Spark job statuses, see Table 7.

Authorization

Each account has full permissions to call all APIs, but its IAM users need permission assignments to do so. For specific permission requirements, refer to Permissions Policies and Supported Actions.

URI

URI format
POST /v2.0/{project_id}/batches

Parameter descriptions

**Table 1** URI parameter
Parameter	Mandatory	Type	Description
project_id	Yes	String	Definition Project ID, which is used for resource isolation. For how to obtain a project ID, see Obtaining a Project ID. Example: 48cc2c48765f481480c7db940d6409d1 Constraints None Range The value can contain 1 to 64 characters. Only letters and digits are allowed. Default Value None

Request Parameters

**Table 2** Request parameters
Parameter	Mandatory	Type	Description
file	Yes	String	Definition Name of the package that is of the JAR or pyFile type and has been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name. Constraints Spark 3.3.x or later supports only packages in OBS paths. Range None Default Value None
className	Yes	String	Definition Java/Spark main class of the batch processing job Constraints None Range None Default Value None
queue	No	String	Definition Queue name. Set this parameter to the name of the created DLI queue. The queue must be of the general-purpose type. Constraints This parameter is compatible with the cluster_name parameter. That is, if cluster_name is used to specify a queue, the queue is still valid. The queue parameter is recommended. If queue and cluster_name are both set, the value of queue is used. Range None Default Value None
cluster_name	No	String	Definition Queue name. Set this parameter to the created DLI queue name. Constraints You are advised to use the queue parameter. The queue and cluster_name parameters cannot coexist. Range None Default Value None
args	No	Array of strings	Definition Input parameters of the main class, that is, application parameters. Constraints None Range None Default Value None
sc_type	No	String	Definition Compute resource type. Currently, resource types A, B, and C are available. If this parameter is not specified, the minimum configuration (type A) is used. For details about resource types, see Table 3. Constraints None Range None Default Value None
jars	No	Array of strings	Definition Name of the package that is of the JAR type and has been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name. Constraints None Range None Default Value None
pyFiles	No	Array of strings	Definition Name of the package that is of the PyFile type and has been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name. Constraints None Range None Default Value None
files	No	Array of strings	Definition Name of the package that is of the file type and has been uploaded to the DLI resource management system. You can also specify an OBS path, for example, obs://Bucket name/Package name. Constraints None Range None Default Value None
modules	No	Array of strings	Definition Name of the dependency system resource module. You can check the module name using the Querying Resource Packages in a Group (Deprecated) API. Constraints None Range DLI provides dependencies for executing datasource jobs. The following table lists the dependency modules corresponding to different services. CloudTable/MRS HBase: sys.datasource.hbase CloudTable/MRS OpenTSDB: sys.datasource.opentsdb RDS MySQL: sys.datasource.rds RDS Postgre: preset DWS: preset CSS: sys.datasource.css Default Value None
resources	No	Array of objects	Definition JSON object list, including the name and type of the JSON package that has been uploaded to the queue. For details, see Table 4. Constraints Spark 3.3.x or later does not support this parameter. Configure resource package information in jars, pyFiles, and files. Range None Default Value None
groups	No	Array of objects	Definition JSON object list, including the package group resource. For details about the format, see the request example. If the type of the name in resources is not verified, the package with the name exists in the group. For details, see Table 5. Constraints Spark 3.3.x or later does not support group information configuration. Range None Default Value None
conf	No	Object	Definition Batch configuration item. For details, see Spark Configuration. Constraints None Range None Default Value None
name	No	String	Definition Batch processing task name. The value contains a maximum of 128 characters. Constraints None Range None Default Value None
driverMemory	No	String	Definition Driver memory of the Spark application, for example, 2 GB and 2048 MB. This configuration will replace the default settings in sc_type. When using it, you must include the unit, otherwise it will fail to start. Constraints None Range None Default Value None
driverCores	No	Integer	Definition Number of CPU cores of the Spark application driver. This configuration item replaces the default parameter in sc_type. Constraints None Range None Default Value None
executorMemory	No	String	Definition Executor memory of the Spark application, for example, 2 GB and 2048 MB. This configuration will replace the default settings in sc_type. When using it, you must include the unit, otherwise it will fail to start. Constraints None Range None Default Value None
executorCores	No	Integer	Definition Number of CPU cores of each Executor in the Spark application. This configuration item replaces the default parameter in sc_type. Constraints None Range None Default Value None
numExecutors	No	Integer	Definition Number of Executors in a Spark application. This configuration item replaces the default parameter in sc_type. Constraints None Range None Default Value None
obs_bucket	No	String	Definition OBS bucket for storing the Spark jobs. Set this parameter when you need to save jobs. Constraints None Range None Default Value None
auto_recovery	No	Boolean	Definition Whether to enable the retry function. If enabled, Spark jobs will be automatically retried after an exception occurs. The default value is false. Constraints None Range None Default Value false
max_retry_times	No	Integer	Definition Maximum retry times. The maximum value is 100, and the default value is 20. Constraints None Range None Default Value 20
feature	No	String	Definition Job feature. Type of the Spark image used by a job. Constraints None Range custom: indicates that the user-defined Spark image is used. Default Value None
spark_version	No	String	Definition Version of the Spark component Constraints None Range If the in-use Spark version is 2.3.2, this parameter is not required. Default Value None
execution_agency_urn	No	String	Definition Name of the agency authorized to DLI. This parameter is configurable in Spark 3.3.1. Constraints None Range None Default Value None
image	No	String	Definition Custom image. The format is Organization name/Image name:Image version. Constraints This parameter is valid only when feature is set to custom. You can use this parameter with the feature parameter to specify a user-defined Spark image for job running. For details about how to use custom images, see Data Lake Insight User Guide. Range None Default Value None
catalog_name	No	String	Definition To access metadata, set this parameter to dli. Constraints None Range None Default Value None

**Table 3** Resource types
Resource Type	Physical Resource	driverCores	executorCores	driverMemory	executorMemory	numExecutor
A	8 vCPUs, 32 GB memory	2	1	7 GB	4 GB	6
B	16 vCPUs, 64 GB memory	2	2	7 GB	8 GB	7
C	32 vCPUs, 128 GB memory	4	2	15 GB	8 GB	14

**Table 4** resources parameters
Parameter	Mandatory	Type	Description
name	No	String	Definition Resource name You can also specify an OBS path, for example, obs://Bucket name/Package name. Constraints None Range None Default Value None
type	No	String	Definition Resource type. Constraints None Range None Default Value None

**Table 5** groups parameters
Parameter	Mandatory	Type	Description
name	No	String	Definition User group name Constraints None Range None Default Value None
resources	No	Array of objects	Definition User group resource For details, see Table 4. Constraints None Range None Default Value None

Response Parameters

**Table 6** Response parameters
Parameter	Mandatory	Type	Description
id	No	String	Definition ID of a batch processing job. Range None
appId	No	String	Definition Back-end application ID of a batch processing job. Range None
name	No	String	Definition Batch processing task name. The value contains a maximum of 128 characters. Range None
owner	No	String	Definition Owner of a batch processing job. Range None
proxyUser	No	String	Definition Proxy user (resource tenant) to which a batch processing job belongs. Range None
state	No	String	Definition Status of a batch processing job. For details, see Table 7. Range None
kind	No	String	Definition Type of a batch processing job. Only Spark parameters are supported. Range None
log	No	Array of strings	Definition Last 10 records of the current batch processing job. Range None
sc_type	No	String	Definition Compute resource type. Currently, the value can be A, B, or C. If the compute resource type is customized, value CUSTOMIZED is returned. Range A: physical resources, with 8 vCPUs and 32 GB of memory driverCores: 2; executorCores: 1; driverMemory: 7G; executorMemory: 4G; numExecutor: 6. B: physical resources, with 16 vCPUs and 64 GB of memory driverCores: 2; executorCores: 2; driverMemory: 7G; executorMemory: 8G; numExecutor: 7. C: physical resources, with 32 vCPUs and 128 GB of memory driverCores: 4; executorCores: 2; driverMemory: 15G; executorMemory: 8G; numExecutor: 14.
cluster_name	No	String	Definition Queue where a batch processing job is located. Range None
queue	Yes	String	Definition Queue name. Set this parameter to the name of the created DLI queue. This parameter is compatible with the cluster_name parameter. That is, if cluster_name is used to specify a queue, the queue is still valid. You are advised to use the queue parameter. The queue and cluster_name parameters cannot coexist. Range None
image	No	String	Definition Custom image. The format is Organization name/Image name:Image version. Range This parameter is valid only when feature is set to custom. You can use this parameter with the feature parameter to specify a user-defined Spark image for job running. For details about how to use custom images, see Data Lake Insight User Guide.
create_time	No	Long	Definition Time when a batch processing job is created. The timestamp is in milliseconds. Range None
update_time	No	Long	Definition Time when a batch processing job is updated. The timestamp is in milliseconds. Range None
duration	No	Long	Definition Job running duration, in milliseconds. Range None

**Table 7** Batch processing job statuses
Parameter	Type	Description
starting	String	The batch processing job is being started.
running	String	The batch processing job is executing a task.
dead	String	The batch processing job has exited.
success	String	The batch processing job is successfully executed.
recovering	String	The batch processing job is being restored.

Example Request

Create a Spark job. Set the Spark main class of the job to org.apache.spark.examples.SparkPi, specify the program package to batchTest/spark-examples_2.11-2.1.0.luxor.jar, and load the program package whose type is jar and the resource package whose type is files.

{
    "file": "batchTest/spark-examples_2.11-2.1.0.luxor.jar",
    "className": "org.apache.spark.examples.SparkPi",
    "sc_type": "A",
    "jars": ["demo-1.0.0.jar"],
    "files": ["count.txt"],
    "resources":[
                   {"name": "groupTest/testJar.jar", "type": "jar"},
                   {"name": "kafka-clients-0.10.0.0.jar", "type": "jar"}],
    "groups": [
                   {"name": "groupTestJar", "resources": [{"name": "testJar.jar", "type": "jar"}, {"name": "testJar1.jar", "type": "jar"}]}, 
                   {"name": "batchTest", "resources":  [{"name": "luxor.jar", "type": "jar"}]}],
    "queue": " test",
    "name": "TestDemo4",
    "feature": "basic",
    "execution_agency_urn": "myAgencyName",
    "spark_version": "2.3.2"
}

The batchTest/spark-examples_2.11-2.1.0.luxor.jar file has been uploaded through API involved in Uploading a Package Group (Deprecated).

Example Response

{
  "id": "07a3e4e6-9a28-4e92-8d3f-9c538621a166",
  "appId": "",
  "name": "",
  "owner": "test1",
  "proxyUser": "",
  "state": "starting",
  "kind": "",
  "log": [],
  "sc_type": "CUSTOMIZED",
  "cluster_name": "aaa",
  "queue": "aaa",
  "create_time": 1607589874156,
  "update_time": 1607589874156
}

Status Codes

Table 8 describes status codes.

**Table 8** Status codes
Status Code	Description
200	The job is created successfully.
400	Request error.
500	Internal server error.

Error Codes

If an error occurs when this API is called, the system does not return the result similar to the preceding example, but returns an error code and error message. For details, see Error Codes.

Parent topic: Spark Job-related APIs

Previous topic: Spark Job-related APIs

Next topic: Listing Batch Processing Jobs