
Developing a Real-Time Processing Single-Task DLI Spark Job

Prerequisites

A single-task real-time processing DLI Spark job has been created. For details, see Creating a Job.

Configuring a DLI Spark Job

Table 1 Properties

Job Name (mandatory)

Enter the DLI Spark job name. The name can contain 1 to 64 characters. Only letters, digits, hyphens (-), and underscores (_) are allowed.

DLI Queue (mandatory)

Select a DLI queue.

Spark Version (optional)

Select the Spark version used by the job:

  • 2.3.2
  • 2.4.5
  • 3.1.1

Job Type (optional)

Type of the Spark image used by the job. The following options are available:

  • Basic
  • AI-enhanced
  • Image: If you select this option, select an image; its version is displayed automatically. You can create images by following the instructions in Image Management.

Job Running Resource (optional)

Select the resources available to the job:

  • 8 vCPUs, 32 GB memory
  • 16 vCPUs, 64 GB memory
  • 32 vCPUs, 128 GB memory

Major Job Class (optional)

Java/Scala main class of the job (see the sketch after this table).

Spark program resource package (mandatory)

Resource package on which the Spark program depends.

Resource Type (mandatory)

  • OBS path: The resource package file is not uploaded to the DLI resource management system before the job is executed. The OBS path of the file is part of the message body for starting the job. This type is recommended.
  • DLI program package: The resource package file is uploaded to the DLI resource management system before the job is executed.

Group (optional)

This parameter is required when Resource Type is set to DLI program package. A Spark program resource package is uploaded to a specified group; the main JAR package and its dependency packages are uploaded to the same group.

  • Use Existing: Select an existing group.
  • Create New: Create a group. The group name can contain only letters, digits, periods (.), hyphens (-), and underscores (_).
  • Do not use: Do not put the resource package into any group.

Major-Class Entry Parameters (optional)

Parameters passed to the main class as its args (see the sketch after this table). Press Enter to separate parameters.

Spark Job Running Parameters (optional)

Enter parameters in key=value format and separate them by pressing Enter.

Module Name (optional)

Select one or more module names.

Metadata Access (optional)

Whether the job can access DLI metadata. To access an OBS table created by a DLI SQL job in the DLI Spark job, enable metadata access.
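
For reference, the following minimal sketch shows what the major job class, its entry parameters, and the key=value running parameters correspond to in code. It is hypothetical: the class name com.example.SparkDemo, the OBS path, and the spark.demo.mode key are illustrative values, not ones prescribed by DLI.

  package com.example

  import org.apache.spark.sql.SparkSession

  // Hypothetical main class; the value entered under "Major Job Class"
  // must match this fully qualified name inside the uploaded JAR.
  object SparkDemo {
    def main(args: Array[String]): Unit = {
      // args receives the Major-Class Entry Parameters, one per line in the UI.
      val inputPath = if (args.nonEmpty) args(0) else "obs://demo-bucket/input/"

      val spark = SparkSession.builder()
        .appName("dli-spark-demo")
        // With Metadata Access enabled, Hive support lets spark.sql() see
        // OBS tables created by DLI SQL jobs.
        .enableHiveSupport()
        .getOrCreate()

      // Spark job running parameters (key=value) surface in the Spark conf;
      // spark.demo.mode is a made-up key for illustration.
      val mode = spark.conf.get("spark.demo.mode", "batch")

      val rows = spark.read.text(inputPath).count()
      println(s"mode=$mode, rows=$rows")

      spark.stop()
    }
  }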

Table 2 Advanced settings

Job Status Polling Interval (s) (mandatory)

Set the interval at which the system checks whether the job is complete. The interval can range from 30 to 60 seconds, or be set to 120, 180, 240, or 300 seconds. During job execution, the system checks the job status at the configured interval (see the sketch after this table).

Maximum Wait Time (mandatory)

Set the timeout interval for the job. If the job is not complete within the timeout interval and retry is enabled, the job will be executed again.

NOTE:

If the job is in the starting state and fails to start, it fails upon timeout.

Retry upon Failure (optional)

Whether to re-execute a node if it fails to be executed.

  • Yes: The node task will be re-executed, and the following parameters must be configured:
    • Retry upon Timeout
    • Maximum Retries
    • Retry Interval (seconds)
  • No: The node will not be re-executed. This is the default setting.

    NOTE:

    If both retry and a timeout duration (Maximum Wait Time) are configured for a node, the node can be retried when its execution times out.

    If a node is not re-executed when it fails upon timeout, you can modify this policy on the Default Configuration page.

    Retry upon Timeout is displayed only when Retry upon Failure is set to Yes.
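
Taken together, the advanced settings describe a poll-until-done loop with a deadline and an optional retry. The sketch below illustrates these semantics only; it is not DataArts Factory's implementation, and the helper functions, default values, and status strings are assumptions made for illustration.

  import scala.concurrent.duration._

  object RetrySemantics {
    // submit starts the job and returns a job ID; status maps a job ID to
    // RUNNING, SUCCEEDED, or FAILED. Both are assumed helpers.
    def runWithRetry(
        submit: () => String,
        status: String => String,
        pollInterval: FiniteDuration = 30.seconds,   // Job Status Polling Interval
        maxWait: FiniteDuration = 60.minutes,        // Maximum Wait Time (example value)
        maxRetries: Int = 1,                         // Maximum Retries
        retryInterval: FiniteDuration = 120.seconds, // Retry Interval (example value)
        retryUponTimeout: Boolean = false            // Retry upon Timeout
    ): Boolean = {

      // One execution: poll at the configured interval until the job
      // finishes or the deadline passes.
      def once(): String = {
        val jobId = submit()
        val deadline = maxWait.fromNow
        var s = status(jobId)
        while (s == "RUNNING" && deadline.hasTimeLeft()) {
          Thread.sleep(pollInterval.toMillis)
          s = status(jobId)
        }
        if (s == "RUNNING") "TIMEOUT" else s
      }

      var attempts = 0
      var result = once()
      // A failed run (and, if Retry upon Timeout is enabled, a timed-out
      // run) is re-executed up to maxRetries times.
      while (attempts < maxRetries &&
             (result == "FAILED" || (retryUponTimeout && result == "TIMEOUT"))) {
        attempts += 1
        Thread.sleep(retryInterval.toMillis)
        result = once()
      }
      result == "SUCCEEDED"
    }
  }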