
Submitting a Spark Job

Use DLI to submit Spark jobs for real-time computing. The general procedure is as follows:

Step 1: Logging In to the Cloud

Step 2: Uploading Data to OBS

Step 3: Logging In to the DLI Management Console

Step 4: Creating a Queue

Step 5: Creating a Package

Step 6: Submitting a Spark Job

Step 1: Logging In to the Cloud

To use DLI, you need to log in to the cloud.

  1. Open the DLI home page.
  2. On the login page, enter the Username and Password, and click Login.

Step 2: Uploading Data to OBS

Before submitting Spark jobs, upload data files to OBS.

  1. In the service list, click Object Storage Service under Storage. The OBS console is displayed.
  2. Create a bucket. The bucket name must be globally unique. In this example, assume that the bucket name is obs1.
    1. Click Create Bucket.
    2. On the Create Bucket page that is displayed, specify Bucket Name.
    3. Click Create Now.
  3. Click obs1 to go to its Summary page.
  4. In the navigation tree on the left, click Objects. Click Upload Object. In the displayed dialog box, drag files or folders into the upload area, or add files to it, for example, spark-examples.jar. Then, click Upload.

    After the file is uploaded successfully, the file path to be analyzed is obs://obs1/spark-examples.jar.

    For more information about OBS, see the Object Storage Service Console Operation Guide.

    OBS Console restricts the size and number of files that can be uploaded, so you are advised to use an OBS tool, such as OBS Browser+ or obsutil, to upload large files. OBS Browser+ is a graphical tool that provides complete functions for managing your buckets and objects in OBS; it is recommended for creating buckets and uploading objects. obsutil is a command-line tool for accessing and managing OBS resources; if you are familiar with the command line interface (CLI), obsutil is ideal for batch processing and automated tasks. For details about how to upload files to OBS, see the OBS Tool Guide.
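
    If you need to automate uploads, for example in a build pipeline, you can also use the OBS SDK. The following is a minimal sketch in Scala, assuming the OBS SDK for Java (esdk-obs-java) is on the classpath; the endpoint, credentials, and environment variable names below are placeholders, not values from this example's environment.

        import com.obs.services.ObsClient
        import java.io.File

        object UploadToObs {
          def main(args: Array[String]): Unit = {
            // Placeholders: supply your own access key, secret key, and
            // regional OBS endpoint (assumed format shown below).
            val ak       = sys.env("OBS_ACCESS_KEY_ID")
            val sk       = sys.env("OBS_SECRET_ACCESS_KEY")
            val endpoint = "https://obs.example-region.myhuaweicloud.com"

            val obsClient = new ObsClient(ak, sk, endpoint)
            try {
              // Upload the local JAR so it becomes obs://obs1/spark-examples.jar.
              obsClient.putObject("obs1", "spark-examples.jar", new File("spark-examples.jar"))
              println("Uploaded obs://obs1/spark-examples.jar")
            } finally {
              obsClient.close()
            }
          }
        }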

Step 3: Logging In to the DLI Management Console

To submit Spark jobs, you first need to access the DLI management console.

  1. In the service list, click Data Lake Insight under Enterprise Intelligence. The DLI management console is displayed.
  2. If you log in to the DLI management console for the first time, you need to authorize DLI to access OBS.

Step 4: Creating a Queue

If this is your first time submitting a Spark job, create a queue first. For example, create a queue named test. For details about how to create a queue, see Creating a Queue.

Step 5: Creating a Package

Before submitting a Spark job, you need to create a package, for example, spark-examples.jar. For details, see Creating a Package.
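
For reference, the following is a minimal sketch of the kind of application a package such as spark-examples.jar might contain: a SparkPi-style Scala job that estimates pi by random sampling. The object name is illustrative; compile it, package it into a JAR with sbt or Maven, and upload it as described in Step 2: Uploading Data to OBS.

    import org.apache.spark.sql.SparkSession

    object SparkPiExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("SparkPiExample").getOrCreate()
        val n = 100000

        // Estimate pi: sample random points in the unit square and count
        // how many fall inside the inscribed unit circle.
        val count = spark.sparkContext
          .parallelize(1 to n)
          .map { _ =>
            val x = scala.util.Random.nextDouble() * 2 - 1
            val y = scala.util.Random.nextDouble() * 2 - 1
            if (x * x + y * y <= 1) 1 else 0
          }
          .reduce(_ + _)

        println(s"Pi is roughly ${4.0 * count / n}")
        spark.stop()
      }
    }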

Step 6: Submitting a Spark Job

  1. On the DLI management console, choose Job Management > Spark Jobs in the navigation pane on the left. The page for creating a Spark job is displayed.
  2. On the Spark job editing page, set related parameters. For details, see GUI Description.
  3. Click Execute in the upper right corner of the Spark job editing window, read and agree to the privacy agreement, and click OK to submit the job. A message is displayed, indicating that the job is submitted successfully.
  4. (Optional) Switch to the Job Management > Spark Jobs page to view the status and logs of the submitted Spark job.

    When you click Execute on the DLI management console for the first time, you need to read and agree to the privacy agreement. Once you have agreed, you will not be prompted again for subsequent operations.
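
If you want to submit jobs programmatically instead of through the console, DLI also provides a REST API for Spark batch jobs. The following Scala sketch assumes the Livy-style batch endpoint (POST /v2.0/{project_id}/batches) described in the DLI API Reference; the endpoint host, project ID, token, and request body fields are assumptions that you should verify against the API Reference for your region.

    import java.net.URI
    import java.net.http.{HttpClient, HttpRequest, HttpResponse}

    object SubmitDliBatch {
      def main(args: Array[String]): Unit = {
        // Placeholders: regional DLI endpoint, project ID, and an IAM token
        // obtained in advance (all assumed values).
        val endpoint  = "https://dli.example-region.myhuaweicloud.com"
        val projectId = "your-project-id"
        val token     = sys.env("HUAWEICLOUD_IAM_TOKEN")

        // Assumed request body; check the exact field names in the
        // DLI API Reference ("Creating a Batch Processing Job").
        val body =
          """{
            |  "file": "obs://obs1/spark-examples.jar",
            |  "className": "org.apache.spark.examples.SparkPi",
            |  "queue": "test"
            |}""".stripMargin

        val request = HttpRequest.newBuilder()
          .uri(URI.create(s"$endpoint/v2.0/$projectId/batches"))
          .header("Content-Type", "application/json")
          .header("X-Auth-Token", token)
          .POST(HttpRequest.BodyPublishers.ofString(body))
          .build()

        val response = HttpClient.newHttpClient()
          .send(request, HttpResponse.BodyHandlers.ofString())
        println(s"${response.statusCode()}: ${response.body()}")
      }
    }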