Creating a User-Defined Spark Job

This section describes how to create a user-defined Spark job. You can perform secondary development based on Spark APIs, build your own JAR file, and submit it to CS clusters. CS is fully compatible with open-source community APIs. Creating a user-defined Spark job requires you to compile and build an application JAR file, so you should be familiar with Spark secondary development. This job type is intended for scenarios with complex stream computing requirements.

Prerequisites

  • You have compiled the secondary development application code into a JAR file and stored the JAR file on your local PC or uploaded it to an OBS bucket.
  • The Spark dependency packages have been integrated into the CS server, with system hardening performed on top of the open-source community version. Therefore, you need to exclude the Spark dependencies when building the application JAR file. To achieve this, use Maven or SBT and set the scope of these dependencies to provided.
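With Maven, for example, the Spark dependencies can be declared with the provided scope so they are kept out of the application JAR. The fragment below is a sketch; the artifact ID and version are illustrative and should match the Spark version running on your CS cluster:

```xml
<!-- pom.xml fragment: Spark is supplied by the CS server at runtime, -->
<!-- so mark it "provided" to keep it out of the application JAR.     -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.11</artifactId>
  <!-- illustrative version; use the Spark version of your CS cluster -->
  <version>2.3.2</version>
  <scope>provided</scope>
</dependency>
```

The SBT equivalent is a `% "provided"` qualifier on the dependency, for example: `"org.apache.spark" %% "spark-streaming" % "2.3.2" % "provided"`.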

Procedure

  1. You can create a user-defined Spark job on either of the following pages: Overview or Job Management.

    • Overview
      1. In the navigation tree on the left pane of the CS management console, click Overview to switch to the Overview page.
        Figure 1 Creating a job on the Overview page
      2. Click Create Job to open the Create Job dialog box.
    • Job Management
      1. In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page.
        Figure 2 Creating a job on the Job Management page
      2. On the Job Management page, click Create Job to open the Create Job dialog box.

  2. Specify job parameters.

    Figure 3 Creating a user-defined Spark job
    Table 1 Parameters related to job creation

    Type: Select Spark Streaming JAR Job.

    Name: Name of the job. Enter 1 to 57 characters. Only letters, digits, hyphens (-), and underscores (_) are allowed.

    NOTE:

    The job name must be globally unique.

    Description: Description of the job. It can be up to 512 bytes long.
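The naming rule for jobs can be expressed as a simple check. The helper below is hypothetical (not a CS API); the pattern mirrors the stated rule of 1 to 57 characters drawn from letters, digits, hyphens, and underscores:

```java
// Hypothetical validator mirroring the CS job-name rule described above:
// 1 to 57 characters; only letters, digits, hyphens (-), and underscores (_).
class JobNameValidator {
    private static final java.util.regex.Pattern NAME =
            java.util.regex.Pattern.compile("^[A-Za-z0-9_-]{1,57}$");

    static boolean isValid(String name) {
        return name != null && NAME.matcher(name).matches();
    }
}
```

Note that this checks only the character rule; global uniqueness can only be verified by the service itself.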

  3. For Enterprise Project, select an enterprise project that you created on the Enterprise Management console.

    For details about how to create an enterprise project on the Enterprise Management console, see Creating an Enterprise Project in the Enterprise Management User Guide.

    The system also has a built-in enterprise project, default. If you do not select an enterprise project for the job, the default project is used instead.

    During job creation, the job is created only if it is successfully bound to the selected enterprise project. If the binding fails, the system reports an alarm and the job creation fails.

    When you delete a job, the association between the job and its enterprise project is automatically deleted as well.

  4. (Optional) Add tags for the job by configuring the parameters in the following table. If you do not need tags, skip this step.

    Table 2 Tag parameters

    Tag key: You can perform either of the following operations:

    • Click the text box and select a predefined tag key from the drop-down list.
      NOTE:

      To add a predefined tag, you need to create one on TMS and select it from the Tag key drop-down list. You can click View Predefined Tag to go to the Predefined Tag page of TMS, and then click Create Tag to create a predefined tag. For details, see section Creating Predefined Tags in the Tag Management Service User Guide.

    • Enter a tag key in the text box.
      NOTE:

      A tag key contains a maximum of 36 characters. The first and last characters cannot be spaces. The following characters are not allowed: =*,<>\|/

    Tag value: You can perform either of the following operations:

    • Click the text box and select a predefined tag value from the drop-down list.
    • Enter a tag value in the text box.
      NOTE:

      A tag value contains a maximum of 43 characters. The first and last characters cannot be spaces. The following characters are not allowed: =*,<>\|/

    • A maximum of 10 tags can be added.
    • Only one tag value can be added to a tag key.
    • A tag key must be unique within the same resource.
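The tag constraints above can be sketched as a small check. This helper is hypothetical (not part of CS) and assumes keys and values must be non-empty; it enforces the stated length limits, the no-leading/trailing-space rule, and the forbidden character set:

```java
// Hypothetical check for the tag rules described above (not a CS API):
// keys up to 36 characters, values up to 43, no leading or trailing
// spaces, and none of the characters = * , < > \ | /
class TagValidator {
    private static final String FORBIDDEN = "=*,<>\\|/";

    static boolean isValidText(String text, int maxLen) {
        // Non-empty is an assumption here; the source states only the limits.
        if (text == null || text.isEmpty() || text.length() > maxLen) return false;
        if (text.startsWith(" ") || text.endsWith(" ")) return false;
        for (char c : text.toCharArray()) {
            if (FORBIDDEN.indexOf(c) >= 0) return false;
        }
        return true;
    }

    static boolean isValidKey(String key)     { return isValidText(key, 36); }
    static boolean isValidValue(String value) { return isValidText(value, 43); }
}
```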

  5. Click OK to enter the Edit page.
  6. Upload the JAR file.

    Figure 4 Uploading the JAR file
    Table 3 Parameter description

    Upload Mode: You can use either of the following methods to upload the JAR file:

    • Local upload: Upload the JAR file saved on your local PC to the CS server.
      NOTE:

      To upload a JAR file larger than 8 MB, upload the JAR file to OBS and then reference it from OBS.

    • OBS: Select a file from an OBS bucket so that CS can obtain the file from OBS.
      NOTE:

      With this method, you need to create a bucket on the OBS management console and upload the customized JAR file to the bucket before uploading.

    Uploaded JAR File: Name of the uploaded JAR file.

    Main Class: Name of the main class in the uploaded JAR file, for example, KafkaMessageStreaming. If you select Default for Main Class, the entry point specified in the Manifest file of the JAR file is used. If you select Manually assign for Main Class, you need to specify Class Name. In the text box next to Class Arguments, enter the class arguments separated by spaces.

    NOTE:

    If the main class is defined in a package, the value of this parameter must include the package path, for example, packagePath.KafkaMessageStreaming.

    Arguments: List of parameters passed to the main class. Separate parameters with spaces.

    Configuration File:

    • You can select the spark-defaults.conf file or user-defined configuration files. The user-defined configuration files are passed to the driver and executors through --files.
    • If your configuration includes a core-site.xml or hdfs-site.xml file, rename it to prevent conflicts with the corresponding file in the CS cluster.
    • To upload multiple configuration files, compress them into a ZIP package and then upload the package.

    There are two methods to upload the configuration files:

    • Local upload: Upload the file saved on your local PC to the CS server.
    • OBS: Select a file from an OBS bucket so that CS can obtain the file from OBS.
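As a sketch of the Main Class convention, an entry point such as KafkaMessageStreaming would normally live in a package, and the value entered for Main Class must then include that package path (for example, com.example.streaming.KafkaMessageStreaming). The skeleton below is illustrative only; the Spark-specific calls are indicated as comments so the sketch stays free of external dependencies:

```java
// Illustrative entry-point skeleton (hypothetical class, not shipped with CS).
// In a real job this class would sit in a package, e.g. com.example.streaming,
// and Main Class would be entered as "com.example.streaming.KafkaMessageStreaming".
class KafkaMessageStreaming {
    // Class Arguments are entered space-separated, so they arrive
    // one per element in args, in the order they were entered.
    static int argumentCount(String[] args) {
        return args == null ? 0 : args.length;
    }

    public static void main(String[] args) {
        System.out.println("received " + argumentCount(args) + " argument(s)");
        for (String arg : args) {
            System.out.println("  " + arg);
        }
        // Spark setup would follow here, for example creating a
        // StreamingContext and consuming from Kafka (omitted).
    }
}
```

For instance, entering `topic1 groupA` in Class Arguments would deliver a two-element args array to main.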

  7. Click Configure Parameters on the left to configure job parameters.

    Figure 5 Performing basic configurations of the user-defined Spark job
    Table 4 Parameter description

    SPUs: An SPU consists of 1 vCPU and 4 GB of memory. This is the total number of SPUs configured for a user-defined Spark job, including the SPUs configured for the driver node and all executor nodes.

    Driver SPUs: Number of SPUs used for the driver node. The default value is 1. You can select one to four SPUs.

    Executors: Number of executor nodes. The value ranges from 1 to 100. The default value is 1.

    SPUs per Executor: Number of SPUs used for each executor node. The default value is 1. You can select one to four SPUs.

    Save Job Log: Whether to save job logs. To enable this function, you must select an authorized OBS bucket. If the selected OBS bucket is not authorized, click Authorize OBS.

    NOTE:

    For details about operations related to OBS, see Getting Started in the Object Storage Service Console Operation Guide.

    Alarm Generation upon Job Exception: Whether to report job exceptions, for example, abnormal job running or exceptions due to an insufficient balance, to users via SMS or email.

    Topic Name: This parameter is used only when Alarm Generation upon Job Exception is selected. Select a user-defined SMN topic. For details about how to customize SMN topics, see Creating a Topic in the Simple Message Notification User Guide.

    Auto Restart upon Exception: Whether to enable automatic restart. If this function is enabled, CS automatically restarts any job that becomes abnormal.
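Based on the description above, the total SPU count combines the driver and executor settings as: total SPUs = driver SPUs + executors × SPUs per executor. A small helper (hypothetical, mirroring that description) makes the arithmetic explicit:

```java
// Hypothetical helper: the total SPUs for a user-defined Spark job is the
// driver node's SPUs plus the SPUs across all executor nodes.
class SpuCalculator {
    static int totalSpus(int driverSpus, int executors, int spusPerExecutor) {
        return driverSpus + executors * spusPerExecutor;
    }
}
```

For example, the default configuration of 1 driver SPU, 1 executor, and 1 SPU per executor uses 2 SPUs in total.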

  8. From the left navigation tree, click Select the Target Cluster.

    Figure 6 Selecting the cluster
    • User-defined jobs can run only on existing exclusive clusters. If there is no exclusive cluster, create one by referring to Creating a Cluster Billed on a Pay-per-Use Basis.
    • If there are no exclusive clusters in the Cluster drop-down list, create one. Then, switch to the User Quota Management page under Cluster Management as the tenant account, bind the created cluster to the current user, and allocate the SPU quota. For details, see Modifying a Sub-user.

  9. Click Submit in the upper right corner. On the displayed Job Configurations page, click OK to submit and start the job.

    After the job is submitted, the system automatically switches to the Job Management page, and the created job is displayed in the job list. You can view the job status in the Status column. After a job is successfully submitted, the job status will change from Submitting to Running.

    If the job status is Submission failed or Running exception, the job failed to be submitted or to run. In this case, you can move the cursor over the status icon in the Status column of the job list to view the error details, and click the copy button to copy them. After handling the fault based on the provided information, resubmit the job.