Updated on 2024-04-29 GMT+08:00

Developing a Real-Time Processing Single-Task Flink Jar Job

Prerequisites

A single-task real-time processing Flink Jar job has been created. For details, see Creating a Job.

Configuring the Flink Jar Job

Table 1 Properties

Parameter

Mandatory

Description

Flink Job Name

Yes

Enter the Flink job name.

By default, the name is automatically generated in the Workspace-Job name format.

The job name can contain 1 to 64 characters. Only letters, digits, hyphens (-), and underscores (_) are allowed.

MRS Cluster

Yes

Select an MRS cluster.

NOTE:

Currently, single-task Flink Jar jobs support only MRS 3.2.0-LTS.1 and later versions.

Program Parameter

No

Set job running parameters.

(Optional) Configure optimization parameters such as threads, memory, and vCPUs for the job to optimize resource usage and improve job execution performance.

CAUTION:

You can query historical checkpoints and select one from which to start the Flink Jar job. For checkpoints to take effect, configure the following two parameters:

  • Checkpoint interval (in milliseconds):

    -yD: execution.checkpointing.interval=1000

  • Number of retained checkpoints:

    -yD: state.checkpoints.num-retained=10

    To query the checkpoint list, enter parameter -s and click the parameter value text box. The available parameter values are then displayed automatically.

NOTE:

This parameter is mandatory if the cluster version is MRS 1.8.7 or a version later than MRS 2.0.1.

Click Select Template and select a parameter template. You can select multiple templates. For details on how to create a template, see Configuring a Template.

For details about the program parameters of MRS Flink jobs, see Running a Flink Job in the MapReduce Service User Guide.

Job Execution Parameter

No

Set the parameters for the Flink job.

These are the variables required for executing the Flink job and are used by the functions in the job program. Separate multiple parameters with spaces.

MRS Resource Queue

No

Select an existing MRS resource queue.

Select a queue that has been configured in the queue permissions of DataArts Security. If multiple resource queues are set for this node, the queue selected here takes the highest priority.

Flink Job Resource Package

Yes

Select a JAR package. Before selecting it, upload the JAR package to an OBS bucket, create a resource on the Manage Resource page, and add the JAR package to the resource management list. For details, see Creating a Resource.

Rerun Policy

No

Select the rerun policy for the job:

  • Rerun from the previous checkpoint
  • Rerun the job

Input Data Path

No

Set the input data path. You can select an HDFS or OBS path.

Output Data Path

No

Set the output data path. You can select an HDFS or OBS path.
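The following Python sketch illustrates how several Table 1 settings fit together: it checks a job name against the documented naming rule, builds the two -yD checkpoint parameters as name/value pairs, and, for the Rerun from the previous checkpoint policy, appends -s with the most recent retained checkpoint. Everything beyond the documented rules is an assumption: the helper names are hypothetical, the hdfs:// and obs:// prefixes and the example OBS paths are illustrative, and choosing the highest-numbered chk-<id> directory relies on Flink's checkpoint directory naming rather than on anything DataArts Studio documents. The code models the console settings and does not call any DataArts Studio API.

```python
import re
from typing import List, Optional

# Documented rule: 1 to 64 characters; letters, digits, hyphens (-), and underscores (_) only.
JOB_NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")
# Flink stores each retained checkpoint in a chk-<id> directory; the rest of the path is illustrative.
CHECKPOINT_ID_RE = re.compile(r"chk-(\d+)/?$")


def is_valid_job_name(name: str) -> bool:
    """Check a Flink Job Name against the documented length and character rules."""
    return JOB_NAME_RE.fullmatch(name) is not None


def is_supported_data_path(path: str) -> bool:
    """Input/Output Data Path accepts HDFS or OBS; these URI prefixes are an assumption."""
    return path.startswith(("hdfs://", "obs://"))


def checkpoint_params(interval_ms: int, num_retained: int) -> List[str]:
    """The two -yD name/value pairs that make checkpoints take effect."""
    return [
        "-yD", f"execution.checkpointing.interval={interval_ms}",
        "-yD", f"state.checkpoints.num-retained={num_retained}",
    ]


def latest_checkpoint(paths: List[str]) -> Optional[str]:
    """Return the path with the highest chk-<id>, or None if no path matches."""
    numbered = []
    for path in paths:
        match = CHECKPOINT_ID_RE.search(path)
        if match:
            numbered.append((int(match.group(1)), path))
    return max(numbered)[1] if numbered else None


def build_program_params(rerun_from_checkpoint: bool, checkpoints: List[str]) -> List[str]:
    """Assemble Program Parameter pairs; -s names the checkpoint to start from."""
    params = checkpoint_params(interval_ms=1000, num_retained=10)
    if rerun_from_checkpoint:
        chk = latest_checkpoint(checkpoints)
        if chk is not None:
            params += ["-s", chk]
    return params


def split_execution_params(raw: str) -> List[str]:
    """Job Execution Parameter: multiple parameters are separated by spaces."""
    return raw.split()


if __name__ == "__main__":
    print(is_valid_job_name("ws01-flink_jar_demo"))  # True
    print(is_valid_job_name("bad name!"))  # False
    retained = [
        "obs://demo-bucket/flink/ckpt/job123/chk-8",
        "obs://demo-bucket/flink/ckpt/job123/chk-12",
    ]
    print(build_program_params(True, retained))
    print(split_execution_params("--input obs://demo-bucket/in --output obs://demo-bucket/out"))
```

In the console you pick the checkpoint yourself from the list shown for -s; selecting the newest one automatically is only this sketch's reading of "Rerun from the previous checkpoint".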

Table 2 Advanced settings

Parameter

Mandatory

Description

Job Status Polling Interval (s)

Yes

Set the interval at which the system checks whether the job is complete. The value can be from 30s to 60s, or 120s, 180s, 240s, or 300s.

During job execution, the system checks the job status at the configured interval.

Maximum Wait Time

Yes

Set the timeout interval for the job. If the job is not complete within the timeout interval and retry is enabled, the job will be executed again.

NOTE:

If the job remains in the starting state and fails to start, it fails when the timeout expires.

Retry upon Failure

No

Whether to re-execute the node if it fails.

  • Yes: The node task will be re-executed, and the following parameters must be configured:
    • Retry upon Timeout
    • Maximum Retries
    • Retry Interval (seconds)
  • No: The node will not be re-executed. This is the default setting.

NOTE:

If both retry and a timeout duration are configured for a job node, the node can be retried when its execution times out.

If a node that fails upon timeout is not re-executed, you can modify this policy on the Default Configuration page.

Retry upon Timeout is displayed only when Retry upon Failure is set to Yes.
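The following Python sketch models the advanced settings in Table 2: it validates a Job Status Polling Interval against the documented values, polls a job until it completes or the maximum wait time expires, and re-runs it according to Retry upon Failure and Retry upon Timeout. It is a simplified model under stated assumptions, not the DataArts Studio scheduler: the status strings and the get_status and run_once callables are hypothetical, and the clock and sleep functions are injectable so the logic can be exercised without actually waiting.

```python
import time
from typing import Callable

# Documented values for Job Status Polling Interval (s): 30 to 60, or 120, 180, 240, 300.
FIXED_INTERVALS = {120, 180, 240, 300}
# Hypothetical terminal statuses used by this sketch.
TERMINAL = {"SUCCESS", "FAILED"}


def is_valid_polling_interval(seconds: int) -> bool:
    """Return True if the interval is one of the documented values."""
    return 30 <= seconds <= 60 or seconds in FIXED_INTERVALS


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval_s: float,
    max_wait_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status at a fixed interval; return 'TIMEOUT' once max_wait_s has elapsed."""
    deadline = clock() + max_wait_s
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            return "TIMEOUT"
        sleep(poll_interval_s)


def run_with_retry(
    run_once: Callable[[], str],
    max_retries: int,
    retry_interval_s: float,
    retry_on_timeout: bool,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Re-run on failure (and on timeout only if retry_on_timeout) up to max_retries times."""
    retries = 0
    while True:
        status = run_once()
        if status == "SUCCESS":
            return status
        retryable = status == "FAILED" or (status == "TIMEOUT" and retry_on_timeout)
        if not retryable or retries >= max_retries:
            return status
        retries += 1
        sleep(retry_interval_s)


if __name__ == "__main__":
    print(is_valid_polling_interval(45), is_valid_polling_interval(90))  # True False
    outcomes = iter(["FAILED", "TIMEOUT", "SUCCESS"])
    result = run_with_retry(
        lambda: next(outcomes),
        max_retries=3,
        retry_interval_s=0,
        retry_on_timeout=True,
        sleep=lambda s: None,
    )
    print(result)  # SUCCESS
```

In this model, run_once would typically wrap wait_for_job, so a run that exceeds the Maximum Wait Time surfaces as TIMEOUT and is retried only when Retry upon Timeout is enabled.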