Creating a User-Defined Spark Job
This section describes how to create a user-defined Spark job. You can perform secondary development based on Spark APIs, build your own JAR file, and submit the JAR file to CS clusters. CS is fully compatible with open-source community APIs. Creating a user-defined Spark job requires you to compile and build an application JAR file, so you should be familiar with Spark secondary development; this job type is intended for complex stream computing scenarios.
Prerequisites
- You have compiled the secondary development application code into a JAR file and stored the JAR file on your local PC or uploaded it to an OBS bucket.
- The Spark dependency packages have been integrated into the CS server, and system hardening has been performed based on the open-source community version. Exclude the Spark dependencies when building your application JAR file: with Maven or SBT, set the scope of these dependencies to provided.
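For example, with Maven, marking a Spark dependency as provided lets you compile against it without packaging it into your JAR. The artifact suffix and version below are placeholders; match them to the Spark and Scala versions of your cluster:

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <!-- placeholder version; match your cluster's Spark version -->
    <version>2.3.2</version>
    <scope>provided</scope>
</dependency>
```

The SBT equivalent is `libraryDependencies += "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"`.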
Procedure
- You can create a user-defined Spark job on either of the following pages: Overview or Job Management.
  - Overview
    - In the navigation tree on the left pane of the CS management console, click Overview to switch to the Overview page. Figure 1 Creating a job on the Overview page
    - Click Create Job to switch to the Create Job dialog box.
  - Job Management
    - In the navigation tree on the left pane of the CS management console, choose Job Management to switch to the Job Management page. Figure 2 Creating a job on the Job Management page
    - On the Job Management page, click Create Job to switch to the Create Job dialog box.
- Specify job parameters. Figure 3 Creating a user-defined Spark job
Table 1 Parameters related to job creation

- Type: Select Spark Streaming JAR Job.
- Name: Name of the job. Enter 1 to 57 characters. Only letters, digits, hyphens (-), and underscores (_) are allowed.
  NOTE: The job name must be globally unique.
- Description: Description of the job. It can contain up to 512 bytes.
- For Enterprise Project, select an enterprise project that you created on the Enterprise Management console.
For details about how to create an enterprise project on the Enterprise Management console, see Creating an Enterprise Project in the Enterprise Management User Guide.
The system also has a built-in enterprise project, default. If you do not select an enterprise project for the job, the default project is used instead.
During job creation, the job is created only if it is successfully bound to the selected enterprise project. If the binding fails, the system reports an alarm and the job creation fails.
When you delete a job, the association between the job and its enterprise project is automatically deleted as well.
- (Optional) Add tags to the job by configuring the parameters in the following table. If you do not need tags, skip this step.
Table 2 Tag parameters

- Tag key: You can perform either of the following operations:
  - Click the text box and select a predefined tag key from the drop-down list.
    NOTE: To add a predefined tag, you need to create one on TMS and then select it from the Tag key drop-down list. You can click View Predefined Tag to go to the Predefined Tag page of TMS. Then, click Create Tag to create a predefined tag. For details, see Creating Predefined Tags in the Tag Management Service User Guide.
  - Enter a tag key in the text box.
    NOTE: A tag key can contain a maximum of 36 characters. The first and last characters cannot be spaces. The following characters are not allowed: =*,<>\|/
- Tag value: You can perform either of the following operations:
  - Click the text box and select a predefined tag value from the drop-down list.
  - Enter a tag value in the text box.
    NOTE: A tag value can contain a maximum of 43 characters. The first and last characters cannot be spaces. The following characters are not allowed: =*,<>\|/

Additional restrictions:
- A maximum of 10 tags can be added.
- Only one tag value can be added to a tag key.
- A key name must be unique within the same resource.
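The tag-key constraints in Table 2 (at most 36 characters, no leading or trailing spaces, none of the characters =*,<>\|/) can be checked client-side before submission. A minimal sketch; the class and method names are illustrative, not a CS API:

```java
public class TagKeyCheck {
    // Characters the console rejects in tag keys: = * , < > \ | /
    private static final String FORBIDDEN = "=*,<>\\|/";

    // Returns true if the string satisfies the tag-key limits in Table 2.
    // Assumption: an empty key is also invalid.
    static boolean isValidTagKey(String key) {
        if (key == null || key.isEmpty() || key.length() > 36) return false;
        if (key.startsWith(" ") || key.endsWith(" ")) return false;
        for (char c : key.toCharArray()) {
            if (FORBIDDEN.indexOf(c) >= 0) return false;
        }
        return true;
    }
}
```

The same check applies to tag values with a 43-character limit instead of 36.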
- Click OK to enter the page.
- Upload the JAR file. Figure 4 Uploading the JAR file
Table 3 Parameter description

- Upload Mode: You can use either of the following methods to upload the JAR file:
  - Local upload: Upload the JAR file saved on your local PC to the CS server.
    NOTE: To upload a JAR file larger than 8 MB, upload the JAR file to OBS and then reference it from OBS.
  - OBS: Select a file from OBS as the data source so that CS obtains the file from OBS.
    NOTE: With this method, you need to create a bucket on the OBS management console and upload the customized JAR file to the bucket before the upload.
- Uploaded JAR File: Name of the uploaded JAR file.
- Main Class: Name of the main class in the uploaded JAR file, for example, KafkaMessageStreaming. If you select Default for Main Class, the entry point is the one specified in the Manifest file of the JAR file. If you select Manually assign for Main Class, you need to specify Class Name. In the text box next to Class Arguments, enter the class arguments separated by spaces.
  NOTE: If the main class is in a package, the value of this parameter must contain the package path, for example, packagePath.KafkaMessageStreaming.
- Arguments: List of arguments passed to the main class. Separate the arguments with spaces.
- Configuration File:
  - You can select the spark-defaults.conf file or user-defined configuration files. The user-defined configuration files are passed to the driver or executors through --files.
  - If a core-site.xml or hdfs-site.xml file exists, rename it to prevent conflicts with the corresponding file in the CS cluster.
  - To upload multiple configuration files, compress them into a ZIP package and then upload the package.
  There are two methods to upload the configuration files:
  - Local upload: Upload the file saved on your local PC to the CS server.
  - OBS: Select a file from OBS as the data source. CS then obtains the file from OBS.
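The Main Class and Arguments settings in Table 3 map onto a standard JVM entry point: CS invokes the main method of the named class, and the space-separated arguments arrive in args in order. A minimal sketch; the class name follows the KafkaMessageStreaming example above, and the topic/batch-interval arguments are purely illustrative:

```java
public class KafkaMessageStreaming {
    // Parses the space-separated job arguments; illustrative only.
    // E.g. Class Arguments "topic1 60" arrive as args[0]="topic1", args[1]="60".
    static String describe(String[] args) {
        String topic = args.length > 0 ? args[0] : "default-topic";
        int batchSeconds = args.length > 1 ? Integer.parseInt(args[1]) : 10;
        return "topic=" + topic + " batch=" + batchSeconds;
    }

    public static void main(String[] args) {
        System.out.println(describe(args));
        // A real job would now create the streaming context and start
        // consuming (omitted: requires the provided-scope Spark
        // dependencies, which are present on the CS cluster at runtime).
    }
}
```

If this class lived in a package (say packagePath), Main Class would be set to packagePath.KafkaMessageStreaming; it has no package here for brevity.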
- Click Configure Parameters on the left to configure job parameters. Figure 5 Performing basic configurations of the user-defined Spark job
Table 4 Parameter description

- SPUs: An SPU consists of 1 vCPU and 4 GB memory. This is the total number of SPUs configured for the user-defined Spark job, including the SPUs configured for the driver node and all executor nodes.
- Driver SPUs: Number of SPUs used by the driver node. The default value is 1. You can select one to four SPUs.
- Executors: Number of executor nodes. The value ranges from 1 to 100. The default value is 1.
- SPUs per Executor: Number of SPUs used by each executor node. The default value is 1. You can select one to four SPUs.
- Save Job Log: Whether to save job logs. To enable this function, you must select an authorized OBS bucket. If the selected OBS bucket is not authorized, click Authorize OBS.
  NOTE: For details about OBS operations, see the Object Storage Service Console Operation Guide.
- Alarm Generation upon Job Exception: Whether to report job exceptions, for example, abnormal job running or exceptions caused by an insufficient balance, to users via SMS or email.
- Topic Name: This parameter is available only when Alarm Generation upon Job Exception is selected. Select a user-defined SMN topic. For details about how to customize SMN topics, see Creating a Topic in the Simple Message Notification User Guide.
- Auto Restart upon Exception: Whether to enable automatic restart. If this function is enabled, CS automatically restarts any job that becomes abnormal.
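Based on the description in Table 4, the SPU total appears to be the driver's SPUs plus the SPUs across all executor nodes; this sketch makes that assumed relationship explicit (the method name is illustrative, not a CS API):

```java
public class SpuCalculator {
    // Assumed composition of the job's SPU total per Table 4:
    // driver SPUs (1-4) + executors (1-100) * SPUs per executor (1-4).
    static int totalSpus(int driverSpus, int executors, int spusPerExecutor) {
        return driverSpus + executors * spusPerExecutor;
    }

    public static void main(String[] args) {
        // Example: default driver (1 SPU) with 2 executors of 1 SPU each.
        System.out.println(totalSpus(1, 2, 1)); // prints 3
    }
}
```

Make sure the SPU quota allocated to your user (see the cluster-selection step below) covers this total.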
- From the left navigation tree, click Select the Target Cluster. Figure 6 Selecting the cluster
- User-defined jobs can run only on existing exclusive clusters. If there is no exclusive cluster, create one by referring to Creating a Cluster Billed on a Pay-per-Use Basis.
- If there are no exclusive clusters in the Cluster drop-down list, create one. Then, switch to the User Quota Management page under Cluster Management as the tenant account, bind the created cluster to the current user, and allocate the SPU quota. For details, see Modifying a Sub-user.
- Click Submit in the upper right corner. On the displayed Job Configurations page, click OK to submit and start the job.
After the job is submitted, the system automatically switches to the Job Management page, and the created job is displayed in the job list, where you can view the job status. If the job submission fails or the job does not execute successfully, move the cursor over the status icon in the job list to view the error details, which you can copy. After handling the fault based on the provided information, resubmit the job.