Creating and Submitting a Spark Job

Scenario

This section describes how to create and submit Spark jobs using APIs.

Notes and Constraints

It takes 6 to 10 minutes to start a job using a new queue for the first time.

Involved APIs

Creating an Elastic Resource Pool: Create an elastic resource pool.
Creating a Queue: Create queues within the elastic resource pool.
Uploading a Package Group (Deprecated): Upload the resource package required by the Spark job.
Querying Resource Packages in a Group (Deprecated): Check whether the uploaded resource package is correct.
Creating a Batch Processing Job: Create and submit a Spark batch processing job.
Querying a Batch Processing Job Status: View the status of a batch processing job.
Querying Batch Job Logs (Deprecated): View batch processing job logs.

Procedure

Create an elastic resource pool named elastic_pool_dli.
- API
  URI format: POST /v3/{project_id}/elastic-resource-pools
  - Obtain the value of {project_id} by referring to Obtaining a Project ID.
  - For details about the request parameters, see Creating an Elastic Resource Pool.
- Example request
  - Description: Create an elastic resource pool named elastic_pool_dli in the project whose ID is 48cc2c48765f481480c7db940d6409d1.
  - Example URL: POST https://{endpoint}/v3/48cc2c48765f481480c7db940d6409d1/elastic-resource-pools
  - Body:
```
{
  "elastic_resource_pool_name" : "elastic_pool_dli",
  "description" : "test",
  "cidr_in_vpc" : "172.16.0.0/14",
  "charging_mode" : "1",
  "max_cu" : 64,
  "min_cu" : 64
}
```
- Example response
```
{
  "is_success": true,
  "message": ""
}
```
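The request above can be assembled with Python's standard library. This is a minimal sketch that only builds the request and does not send it. It assumes IAM token authentication through an `X-Auth-Token` header; `dli.example.com` and `<token>` are hypothetical placeholders for your real endpoint and token.

```python
import json
import urllib.request


def build_create_pool_request(endpoint: str, project_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates elastic_pool_dli."""
    body = {
        "elastic_resource_pool_name": "elastic_pool_dli",
        "description": "test",
        "cidr_in_vpc": "172.16.0.0/14",
        "charging_mode": "1",
        "max_cu": 64,
        "min_cu": 64,
    }
    url = f"https://{endpoint}/v3/{project_id}/elastic-resource-pools"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # X-Auth-Token is an assumption about how the caller authenticates.
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


req = build_create_pool_request("dli.example.com", "48cc2c48765f481480c7db940d6409d1", "<token>")
# To send it: urllib.request.urlopen(req), then check "is_success" in the JSON reply.
```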
Create a queue named queue1 in the elastic resource pool.
- API
  URI format: POST /v1.0/{project_id}/queues
  - Obtain the value of {project_id} by referring to Obtaining a Project ID.
  - For details about the request parameters, see Creating a Queue.
- Example request
  - Description: Create a queue named queue1 in the elastic resource pool elastic_pool_dli in the project whose ID is 48cc2c48765f481480c7db940d6409d1.
  - Example URL: POST https://{endpoint}/v1.0/48cc2c48765f481480c7db940d6409d1/queues
  - Body:
```
{
    "queue_name": "queue1",
    "queue_type": "general",
    "description": "test",
    "cu_count": 16,
    "elastic_resource_pool_name": "elastic_pool_dli"
}
```
- Example response
```
{
  "is_success": true,
  "message": ""
}
```
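Both the pool and queue APIs reply with `is_success` and `message`, so a small check can turn a failed reply into an exception. The sketch below assumes the `elastic_resource_pool_name` body field and a `general` queue type (Spark jobs run on general-purpose queues rather than SQL queues); the endpoint and token are hypothetical placeholders, and nothing is sent over the network.

```python
import json
import urllib.request


def build_create_queue_request(endpoint: str, project_id: str, token: str,
                               pool_name: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates queue1 in a pool."""
    body = {
        "queue_name": "queue1",
        "queue_type": "general",  # assumed: Spark batch jobs need a general-purpose queue
        "description": "test",
        "cu_count": 16,
        "elastic_resource_pool_name": pool_name,
    }
    return urllib.request.Request(
        f"https://{endpoint}/v1.0/{project_id}/queues",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


def check_result(raw: bytes) -> dict:
    """Parse a {"is_success": ..., "message": ...} reply and raise if it failed."""
    reply = json.loads(raw)
    if not reply.get("is_success"):
        raise RuntimeError(reply.get("message") or "request failed")
    return reply


req = build_create_queue_request("dli.example.com", "48cc2c48765f481480c7db940d6409d1",
                                 "<token>", "elastic_pool_dli")
```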

Upload a package group.
- API
  URI format: POST /v2.0/{project_id}/resources
  - Obtain the value of {project_id} from Obtaining a Project ID.
  - For details about the request parameters, see Uploading a Package Group (Deprecated).
- Example request
  - Description: Upload resources to the gatk group in the project whose ID is 48cc2c48765f481480c7db940d6409d1.
  - Example URL: POST https://{endpoint}/v2.0/48cc2c48765f481480c7db940d6409d1/resources
  - Body:
```
{
    "paths": [
        "https://test.obs.xxx.com/txr_test/jars/spark-sdv-app.jar"
    ],
    "kind": "jar",
    "group": "gatk",
    "is_async":"true"
}
```
- Example response
```
{
    "group_name": "gatk",
    "status": "READY",
    "resources": [
        "spark-sdv-app.jar",
        "wordcount",
        "wordcount.py"
    ],
    "details": [
        {
            "create_time": 0,
            "update_time": 0,
            "resource_type": "jar",
            "resource_name": "spark-sdv-app.jar",
            "status": "READY",
            "underlying_name": "987e208d-d46e-4475-a8c0-a62f0275750b_spark-sdv-app.jar"
        },
        {
            "create_time": 0,
            "update_time": 0,
            "resource_type": "jar",
            "resource_name": "wordcount",
            "status": "READY",
            "underlying_name": "987e208d-d46e-4475-a8c0-a62f0275750b_wordcount"
        },
        {
            "create_time": 0,
            "update_time": 0,
            "resource_type": "jar",
            "resource_name": "wordcount.py",
            "status": "READY",
            "underlying_name": "987e208d-d46e-4475-a8c0-a62f0275750b_wordcount.py"
        }
    ],
    "create_time": 1551334579654,
    "update_time": 1551345369070
}
```
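Because the upload request sets `is_async`, individual packages may not be READY as soon as the call returns. A minimal sketch for spotting packages that still need time, assuming the reply has the `details` list shown above (the sample below is a trimmed copy of that reply):

```python
def pending_resources(group_reply: dict) -> list[str]:
    """Return the names of packages in a group reply whose status is not READY."""
    return [
        item["resource_name"]
        for item in group_reply.get("details", [])
        if item.get("status") != "READY"
    ]


sample_reply = {
    "group_name": "gatk",
    "status": "READY",
    "details": [
        {"resource_name": "spark-sdv-app.jar", "status": "READY"},
        {"resource_name": "wordcount", "status": "READY"},
        {"resource_name": "wordcount.py", "status": "READY"},
    ],
}
print(pending_resources(sample_reply))  # []
```

An empty list means every package in the group can be referenced by a batch job.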

View resource packages in a group.
- API
  URI format: GET /v2.0/{project_id}/resources/{resource_name}
  - Obtain the value of {project_id} from Obtaining a Project ID.
  - For details about the query parameters, see Querying Resource Packages in a Group (Deprecated).
- Example request
  - Description: Query the resource package named luxor-router-1.1.1.jar in the gatk group in the project whose ID is 48cc2c48765f481480c7db940d6409d1.
  - Example URL: GET https://{endpoint}/v2.0/48cc2c48765f481480c7db940d6409d1/resources/luxor-router-1.1.1.jar?group=gatk
  - Body:
```
{}
```
- Example response
```
{
    "create_time": 1522055409139,
    "update_time": 1522228350501,
    "resource_type": "jar",
    "resource_name": "luxor-router-1.1.1.jar",
    "status": "uploading",
    "underlying_name": "7885d26e-c532-40f3-a755-c82c442f19b8_luxor-router-1.1.1.jar",
    "owner": "****"
}
```
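The group name travels as a query parameter, not in the path. A minimal sketch of building that URL safely, plus a status check that tolerates the mixed casing seen in the examples ("READY" versus "uploading"); treating the status case-insensitively is an assumption, and `dli.example.com` is a hypothetical endpoint.

```python
from urllib.parse import quote, urlencode


def resource_query_url(endpoint: str, project_id: str, resource_name: str, group: str) -> str:
    """Build the GET URL for one package; the group goes in the query string."""
    path = f"/v2.0/{project_id}/resources/{quote(resource_name, safe='')}"
    return f"https://{endpoint}{path}?{urlencode({'group': group})}"


def is_ready(resource_reply: dict) -> bool:
    """True once the package status reads READY (compared case-insensitively)."""
    return resource_reply.get("status", "").upper() == "READY"


url = resource_query_url("dli.example.com", "48cc2c48765f481480c7db940d6409d1",
                         "luxor-router-1.1.1.jar", "gatk")
```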

Create and submit a Spark batch processing job.
- API
  URI format: POST /v2.0/{project_id}/batches
  - Obtain the value of {project_id} from Obtaining a Project ID.
  - For details about the request parameters, see Creating a Batch Processing Job.
- Example request
  - Description: In the project whose ID is 48cc2c48765f481480c7db940d6409d1, create a batch processing job in queue1 that runs the org.apache.spark.examples.SparkPi class.
  - Example URL: POST https://{endpoint}/v2.0/48cc2c48765f481480c7db940d6409d1/batches
  - Body:
```
{
  "sc_type": "A",
  "jars": [
    "spark-examples_2.11-2.1.0.luxor.jar"
  ],
  "driverMemory": "1G",
  "driverCores": 1,
  "executorMemory": "1G",
  "executorCores": 1,
  "numExecutors": 1,
  "queue": "queue1",
  "file": "spark-examples_2.11-2.1.0.luxor.jar",
  "className": "org.apache.spark.examples.SparkPi",
  "minRecoveryDelayTime": 10000,
  "maxRetryTimes": 20
}
```
- Example response
```
{
  "id": "07a3e4e6-9a28-4e92-8d3f-9c538621a166",
  "appId": "",
  "name": "",
  "owner": "test1",
  "proxyUser": "",
  "state": "starting",
  "kind": "",
  "log": [],
  "sc_type": "CUSTOMIZED",
  "cluster_name": "aaa",
  "queue": "queue1",
  "create_time": 1607589874156,
  "update_time": 1607589874156
}
```
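A typed builder helps avoid typos in the batch body, such as a misspelled queue name. This is a minimal sketch covering only the fields in the example request; the defaults simply mirror the example values and are not documented API defaults. Field names deliberately keep the JSON camelCase so that `asdict` produces the exact request keys.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SparkBatchJob:
    """Body for POST /v2.0/{project_id}/batches (only the fields used above)."""
    file: str
    className: str
    queue: str
    jars: list = field(default_factory=list)
    sc_type: str = "A"
    driverMemory: str = "1G"
    driverCores: int = 1
    executorMemory: str = "1G"
    executorCores: int = 1
    numExecutors: int = 1
    minRecoveryDelayTime: int = 10000
    maxRetryTimes: int = 20

    def to_json(self) -> str:
        return json.dumps(asdict(self))


jar = "spark-examples_2.11-2.1.0.luxor.jar"
job = SparkBatchJob(file=jar, className="org.apache.spark.examples.SparkPi",
                    queue="queue1", jars=[jar])
body = job.to_json()
```

Keep the `id` from the response (here `07a3e4e6-...`); it is the `{batch_id}` used by the status and log queries.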

Query a batch job status.
- API
  URI format: GET /v2.0/{project_id}/batches/{batch_id}/state
  - Obtain the value of {project_id} from Obtaining a Project ID.
  - For details about the query parameters, see Querying a Batch Processing Job Status.
- Example request
  - Description: Query the status of the batch processing job whose ID is 0a324461-d9d9-45da-a52a-3b3c7a3d809e in the project whose ID is 48cc2c48765f481480c7db940d6409d1.
  - Example URL: GET https://{endpoint}/v2.0/48cc2c48765f481480c7db940d6409d1/batches/0a324461-d9d9-45da-a52a-3b3c7a3d809e/state
  - Body:
```
{}
```
- Example response
```
{
   "id":"0a324461-d9d9-45da-a52a-3b3c7a3d809e",
   "state":"Success"
}
```
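Since a job on a new queue can take 6 to 10 minutes to start, callers usually poll this endpoint. A minimal polling sketch, assuming `success` and `dead` are the terminal states (compared case-insensitively, because the example returns "Success"); `fetch_state` is a hypothetical callable that performs the GET above and returns the parsed JSON.

```python
import time
from typing import Callable

TERMINAL_STATES = {"success", "dead"}  # assumption: check the batch job state reference


def wait_for_batch(fetch_state: Callable[[], dict], interval: float = 10.0,
                   timeout: float = 900.0, sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job reaches a terminal state; raise TimeoutError after `timeout` s."""
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()["state"]
        if state.lower() in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch job still {state!r} after {timeout} s")
        sleep(interval)
```

The 15-minute default leaves headroom over the first-start delay; `sleep` is injectable so the loop can be tested without waiting.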
Query batch job logs.
- API
  URI format: GET /v2.0/{project_id}/batches/{batch_id}/log
  - Obtain the value of {project_id} from Obtaining a Project ID.
  - For details about the query parameters, see Querying Batch Job Logs (Deprecated).
- Example request
  - Description: Query the background logs of the batch processing job 0a324461-d9d9-45da-a52a-3b3c7a3d809e in the 48cc2c48765f481480c7db940d6409d1 project.
  - Example URL: GET https://{endpoint}/v2.0/48cc2c48765f481480c7db940d6409d1/batches/0a324461-d9d9-45da-a52a-3b3c7a3d809e/log
  - Body:
```
{}
```
- Example response
```
{
    "id": "0a324461-d9d9-45da-a52a-3b3c7a3d809e",
    "from": 0,
    "total": 3,
    "log": [
           "Detailed information about job logs"
    ]
}
```
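Long-running jobs can produce more log lines than one reply returns. The sketch below pages through them, assuming `from` and `size` query parameters and reading `total` as the total number of log lines; both are assumptions to verify against Querying Batch Job Logs (Deprecated). `fetch_page` is a hypothetical callable that performs the GET and returns the parsed JSON.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode


def log_url(endpoint: str, project_id: str, batch_id: str, start: int, size: int) -> str:
    # The "from"/"size" parameter names are assumptions.
    query = urlencode({"from": start, "size": size})
    return f"https://{endpoint}/v2.0/{project_id}/batches/{batch_id}/log?{query}"


def iter_log_lines(fetch_page: Callable[[int, int], dict], size: int = 100) -> Iterator[str]:
    """Yield log lines page by page until `total` lines are read or a page is empty."""
    start = 0
    while True:
        page = fetch_page(start, size)
        lines = page.get("log", [])
        yield from lines
        start += len(lines)
        if not lines or start >= page.get("total", 0):
            return
```

Stopping on an empty page guards against an endless loop if `total` turns out to mean something other than the line count.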