Creating and Submitting a Spark Job
Scenario Description
This section describes how to create and submit Spark jobs using APIs.
Constraints
- The first job submitted to a newly created queue takes 6 to 10 minutes to start.
Involved APIs
- Creating a Queue: Create a queue.
- Uploading a Package Group (Discarded): Upload the resource package required by the Spark job.
- Querying Resource Packages in a Group (Discarded): Check whether the uploaded resource package is correct.
- Creating a Batch Processing Job: Create and submit a Spark batch processing job.
- Querying a Batch Job Status: View the status of a batch processing job.
- Querying Batch Job Logs (Discarded): View batch processing job logs.
Procedure
- Create a common queue. For details, see Creating a Queue.
- Upload a package group.
- API
URI format: POST /v2.0/{project_id}/resources
- Obtain the value of {project_id} from Obtaining a Project ID.
- For details about the request parameters, see Uploading a Package Group (Discarded).
- Request example
- Description: Upload resources in the GATK group to the project whose ID is 48cc2c48765f481480c7db940d6409d1.
- Example URL: POST https://{endpoint}/v2.0/48cc2c48765f481480c7db940d6409d1/resources
- Body:
{ "paths": [ "https://test.obs.xxx.com/txr_test/jars/spark-sdv-app.jar" ], "kind": "jar", "group": "gatk", "is_async":"true" }
- Example response
{
  "group_name": "gatk",
  "status": "READY",
  "resources": [
    "spark-sdv-app.jar",
    "wordcount",
    "wordcount.py"
  ],
  "details": [
    {
      "create_time": 0,
      "update_time": 0,
      "resource_type": "jar",
      "resource_name": "spark-sdv-app.jar",
      "status": "READY",
      "underlying_name": "987e208d-d46e-4475-a8c0-a62f0275750b_spark-sdv-app.jar"
    },
    {
      "create_time": 0,
      "update_time": 0,
      "resource_type": "jar",
      "resource_name": "wordcount",
      "status": "READY",
      "underlying_name": "987e208d-d46e-4475-a8c0-a62f0275750b_wordcount"
    },
    {
      "create_time": 0,
      "update_time": 0,
      "resource_type": "jar",
      "resource_name": "wordcount.py",
      "status": "READY",
      "underlying_name": "987e208d-d46e-4475-a8c0-a62f0275750b_wordcount.py"
    }
  ],
  "create_time": 1551334579654,
  "update_time": 1551345369070
}
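The upload step can be sketched in Python as a helper that assembles the request described above without sending it, plus a check over the details array of the response. This is a minimal sketch, not an official SDK: the function names are hypothetical, dli.example.com is a placeholder for {endpoint}, and is_async is serialized as the string "true"/"false" only to mirror the request example.

```python
import json
from typing import Any, Dict, List


def build_upload_request(endpoint: str, project_id: str, paths: List[str],
                         kind: str, group: str, is_async: bool = True) -> Dict[str, str]:
    """Describe POST /v2.0/{project_id}/resources without sending it (hypothetical helper)."""
    body = {
        "paths": paths,  # OBS paths of the packages to upload
        "kind": kind,    # package type, for example "jar"
        "group": group,  # target package group, for example "gatk"
        # Serialized as a string to mirror the request example (assumption).
        "is_async": "true" if is_async else "false",
    }
    return {
        "method": "POST",
        "url": f"https://{endpoint}/v2.0/{project_id}/resources",
        "body": json.dumps(body),
    }


def not_ready(response: Dict[str, Any]) -> List[str]:
    """Names of packages in an upload response whose status is not READY."""
    return [item["resource_name"] for item in response.get("details", [])
            if str(item.get("status", "")).upper() != "READY"]


req = build_upload_request(
    "dli.example.com",  # hypothetical placeholder for {endpoint}
    "48cc2c48765f481480c7db940d6409d1",
    ["https://test.obs.xxx.com/txr_test/jars/spark-sdv-app.jar"],
    kind="jar",
    group="gatk",
)
```

An empty list from not_ready() on the example response means every package in the group reports READY.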
- View resource packages in a group.
- API
URI format: GET /v2.0/{project_id}/resources/{resource_name}
- Obtain the value of {project_id} from Obtaining a Project ID.
- For details about the query parameters, see Querying Resource Packages in a Group (Discarded).
- Request example
- Description: Query the resource package named luxor-router-1.1.1.jar in the GATK group under the project whose ID is 48cc2c48765f481480c7db940d6409d1.
- Example URL: GET https://{endpoint}/v2.0/48cc2c48765f481480c7db940d6409d1/resources/luxor-router-1.1.1.jar?group=gatk
- Body:
{}
- Example response
{
  "create_time": 1522055409139,
  "update_time": 1522228350501,
  "resource_type": "jar",
  "resource_name": "luxor-router-1.1.1.jar",
  "status": "uploading",
  "underlying_name": "7885d26e-c532-40f3-a755-c82c442f19b8_luxor-router-1.1.1.jar",
  "owner": "****"
}
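The query above passes the group as a query-string parameter. A minimal Python sketch of building that URL and interpreting the returned status follows; the helper names and the dli.example.com host are hypothetical. The status comparison is case-insensitive because the examples on this page show both "READY" and "uploading".

```python
from urllib.parse import quote, urlencode


def resource_url(endpoint: str, project_id: str, resource_name: str, group: str = "") -> str:
    """Build GET /v2.0/{project_id}/resources/{resource_name}[?group=...] (hypothetical helper)."""
    url = f"https://{endpoint}/v2.0/{project_id}/resources/{quote(resource_name)}"
    if group:
        url += "?" + urlencode({"group": group})
    return url


def is_ready(resource: dict) -> bool:
    """True if the package status is READY, ignoring case."""
    return str(resource.get("status", "")).upper() == "READY"


url = resource_url("dli.example.com", "48cc2c48765f481480c7db940d6409d1",
                   "luxor-router-1.1.1.jar", group="gatk")
```

With the example response (status "uploading"), is_ready() returns False, so the package is not yet usable by a job.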
- Create and submit a Spark batch processing job.
- API
URI format: POST /v2.0/{project_id}/batches
- Obtain the value of {project_id} from Obtaining a Project ID.
- For details about the request parameters, see Creating a Batch Processing Job.
- Request example
- Description: In the 48cc2c48765f481480c7db940d6409d1 project, create a batch processing job in the cce_general queue.
- Example URL: POST https://{endpoint}/v2.0/48cc2c48765f481480c7db940d6409d1/batches
- Body:
{
  "sc_type": "A",
  "jars": [
    "spark-examples_2.11-2.1.0.luxor.jar"
  ],
  "driverMemory": "1G",
  "driverCores": 1,
  "executorMemory": "1G",
  "executorCores": 1,
  "numExecutors": 1,
  "queue": "cce_general",
  "file": "spark-examples_2.11-2.1.0.luxor.jar",
  "className": "org.apache.spark.examples.SparkPi",
  "minRecoveryDelayTime": 10000,
  "maxRetryTimes": 20
}
- Example response
{
  "id": "07a3e4e6-9a28-4e92-8d3f-9c538621a166",
  "appId": "",
  "name": "",
  "owner": "test1",
  "proxyUser": "",
  "state": "starting",
  "kind": "",
  "log": [],
  "sc_type": "CUSTOMIZED",
  "cluster_name": "aaa",
  "queue": "aaa",
  "create_time": 1607589874156,
  "update_time": 1607589874156
}
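The job submission can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the request is assumed to authenticate with an IAM token in the X-Auth-Token header, and which body fields are mandatory is defined in Creating a Batch Processing Job, not by this helper. The body builder only reuses the field names from the request example.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional


def build_batch_body(file: str, class_name: str, queue: str,
                     jars: Optional[List[str]] = None, **resources: Any) -> Dict[str, Any]:
    """Assemble a body with the example's field names (hypothetical helper)."""
    body: Dict[str, Any] = {"file": file, "className": class_name, "queue": queue}
    if jars:
        body["jars"] = jars
    body.update(resources)  # e.g. driverMemory="1G", executorCores=1
    return body


def submit_batch(endpoint: str, project_id: str, token: str,
                 body: Dict[str, Any]) -> Dict[str, Any]:
    """POST /v2.0/{project_id}/batches; assumes IAM token auth via X-Auth-Token."""
    req = urllib.request.Request(
        f"https://{endpoint}/v2.0/{project_id}/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)  # the "id" field identifies the job in later calls


body = build_batch_body(
    "spark-examples_2.11-2.1.0.luxor.jar",
    "org.apache.spark.examples.SparkPi",
    "cce_general",
    jars=["spark-examples_2.11-2.1.0.luxor.jar"],
    driverMemory="1G", driverCores=1,
    executorMemory="1G", executorCores=1, numExecutors=1,
)
# job = submit_batch("dli.example.com", "48cc2c48765f481480c7db940d6409d1", token, body)
```

The id in the response (07a3e4e6-... in the example) is the {batch_id} used by the status and log queries.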
- Query the status of a batch job.
- API
URI format: GET /v2.0/{project_id}/batches/{batch_id}/state
- Obtain the value of {project_id} from Obtaining a Project ID.
- For details about the query parameters, see Querying a Batch Job Status.
- Request example
- Description: Query the status of the batch processing job whose ID is 0a324461-d9d9-45da-a52a-3b3c7a3d809e in the project whose ID is 48cc2c48765f481480c7db940d6409d1.
- Example URL: GET https://{endpoint}/v2.0/48cc2c48765f481480c7db940d6409d1/batches/0a324461-d9d9-45da-a52a-3b3c7a3d809e/state
- Body:
{}
- Example response
{ "id":"0a324461-d9d9-45da-a52a-3b3c7a3d809e", "state":"Success" }
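Because a job on a new queue can take 6 to 10 minutes to start, a client usually polls the status endpoint rather than calling it once. The Python sketch below polls through an injected get_state callable so it can be tested offline. The set of terminal states is an assumption (only "Success" and "starting" appear on this page); check the full list in Querying a Batch Job Status. The helper names and the 15-minute default timeout are hypothetical.

```python
import time
from typing import Callable, Dict

# Assumption: states after which polling can stop, compared case-insensitively.
TERMINAL_STATES = {"success", "dead"}


def state_url(endpoint: str, project_id: str, batch_id: str) -> str:
    """URL for GET /v2.0/{project_id}/batches/{batch_id}/state."""
    return f"https://{endpoint}/v2.0/{project_id}/batches/{batch_id}/state"


def wait_for_batch(get_state: Callable[[], Dict[str, str]],
                   timeout_s: float = 900.0, interval_s: float = 10.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_state() until a terminal state is reported or timeout_s elapses.

    The 15-minute default leaves headroom over the 6 to 10 minute first start.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()["state"]
        if state.lower() in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch job still {state!r} after {timeout_s} s")
        sleep(interval_s)
```

In practice get_state would issue the GET request built from state_url() and parse the JSON body; injecting it keeps the retry logic independent of the HTTP client.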
- Query batch job logs.
- API
URI format: GET /v2.0/{project_id}/batches/{batch_id}/log
- Obtain the value of {project_id} from Obtaining a Project ID.
- For details about the query parameters, see Querying Batch Job Logs (Discarded).
- Request example
- Description: Query the background logs of the batch processing job 0a324461-d9d9-45da-a52a-3b3c7a3d809e in the 48cc2c48765f481480c7db940d6409d1 project.
- Example URL: GET https://{endpoint}/v2.0/48cc2c48765f481480c7db940d6409d1/batches/0a324461-d9d9-45da-a52a-3b3c7a3d809e/log
- Body:
{}
- Example response
{ "id": "0a324461-d9d9-45da-a52a-3b3c7a3d809e", "from": 0, "total": 3, "log": [ "Detailed information about job logs" ] }
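The log response carries from and total fields, which suggests paged retrieval. The Python sketch below assumes, as in Apache Livy's batch log endpoint, that the request accepts from and size query parameters and that total counts all available log lines; both are assumptions to verify against Querying Batch Job Logs (Discarded). The helper names and host are hypothetical.

```python
from typing import Any, Dict
from urllib.parse import urlencode


def log_url(endpoint: str, project_id: str, batch_id: str,
            start: int = 0, size: int = 100) -> str:
    """URL for GET /v2.0/{project_id}/batches/{batch_id}/log.

    The from/size query parameters are an assumption modeled on Apache Livy.
    """
    query = urlencode({"from": start, "size": size})
    return f"https://{endpoint}/v2.0/{project_id}/batches/{batch_id}/log?{query}"


def next_offset(page: Dict[str, Any]) -> int:
    """Offset of the first log line not yet returned."""
    return page["from"] + len(page["log"])


def has_more(page: Dict[str, Any]) -> bool:
    """Assumes "total" counts all available log lines."""
    return next_offset(page) < page["total"]
```

A reader loop would request log_url(..., start=next_offset(page)) while has_more(page) is true.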