Updated on 2022-09-15 GMT+08:00

Creating a Job

Function

This API is used to create a job. A job consists of one or more nodes, such as Hive SQL and CDM Job nodes. DLF supports two types of jobs: batch jobs and real-time jobs.

URI

  • URI format

    POST /v1/{project_id}/jobs

  • Parameter description
    Table 1 URI parameter

    Parameter

    Mandatory

    Type

    Description

    project_id

    Yes

    String

    Project ID. For details about how to obtain a project ID, see Project ID and Account ID.

Request

Table 2 Request header parameter

Parameter

Mandatory

Type

Description

workspace

No

String

Workspace ID.

  • If this header is not set, the default workspace is used.
  • To operate on another workspace, include this header in the request.
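
The URI format and the optional workspace header can be put together as in the following minimal Python sketch. It only assembles the URL and headers and sends nothing. The helper name build_create_job_request, the endpoint host, the IDs, and the Content-Type header are assumptions made for illustration; authentication headers are out of scope here and are omitted.

```python
from typing import Dict, Optional, Tuple


def build_create_job_request(
    endpoint: str, project_id: str, workspace: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for POST /v1/{project_id}/jobs."""
    if not project_id:
        raise ValueError("project_id is mandatory")
    url = f"{endpoint.rstrip('/')}/v1/{project_id}/jobs"
    # Assumption: the body is sent as JSON.
    headers = {"Content-Type": "application/json"}
    # The workspace header is optional; leave it out to use the default workspace.
    if workspace:
        headers["workspace"] = workspace
    return url, headers


url, headers = build_create_job_request(
    "https://dataarts.example.com",  # hypothetical endpoint
    "my-project-id",                 # hypothetical project ID
    workspace="my-workspace-id",     # hypothetical workspace ID
)
```
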
Table 3 Parameters

Parameter

Mandatory

Type

Description

name

Yes

String

Job name. The name contains a maximum of 128 characters, including only letters, numbers, hyphens (-), underscores (_), and periods (.). The job name must be unique.

nodes

Yes

List<Node>

Node definition. For details, see Table 4.

schedule

Yes

Schedule data structure

Scheduling configuration. For details, see Table 5.

params

No

List<Param>

Job parameter definition. For details, see Table 6.

directory

No

String

Directory for saving the job. The value must be an existing directory, for example, /dir/a/. The default value is the root directory.

processType

Yes

String

Job type.

  • REAL_TIME: real-time processing
  • BATCH: batch processing

basicConfig

No

BasicConfig data structure

Basic job information. For details, see Table 27.
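
To make the mandatory fields of Table 3 concrete, the following Python sketch builds a minimal request body and checks that the four mandatory top-level keys are present. The job, node, and script names are hypothetical. The source describes properties only as a List, so the name/value pair shape used below is an assumption, not a confirmed schema.

```python
import json

job_body = {
    "name": "demo_job",                    # hypothetical job name
    "processType": "BATCH",
    "schedule": {"type": "EXECUTE_ONCE"},
    "nodes": [
        {
            "name": "hive_step",           # hypothetical node name
            "type": "Hive SQL",
            "location": {"x": 0, "y": 0},
            # Assumed shape: a list of name/value pairs.
            "properties": [{"name": "scriptName", "value": "demo_script"}],
        }
    ],
}

# Mandatory top-level parameters according to Table 3.
REQUIRED_TOP_LEVEL = ("name", "nodes", "schedule", "processType")
missing = [key for key in REQUIRED_TOP_LEVEL if key not in job_body]

# Serialize the body for use in the POST request.
payload = json.dumps(job_body)
```
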

Table 4 Node data structure description

Parameter

Mandatory

Type

Description

name

Yes

String

Node name. The name contains a maximum of 128 characters, including only letters, numbers, hyphens (-), underscores (_), and periods (.). Names of the nodes in a job must be unique.

type

Yes

String

Node type. The options are as follows:

  • Hive SQL: Runs Hive SQL scripts.
  • Spark SQL: Runs Spark SQL scripts.
  • DWS SQL: Runs DWS SQL scripts.
  • DLISQL: Runs DLI SQL scripts.
  • Shell: Runs shell scripts.
  • CDM Job: Runs CDM jobs.
  • CloudTable Manager: Manages CloudTable tables, including creating and deleting tables.
  • OBS Manager: Manages OBS paths, including creating and deleting paths.
  • RESTAPI: Sends REST API requests.
  • SMN: Sends short messages or emails.
  • MRS Spark: Runs Spark jobs of MRS.
  • MapReduce: Runs MapReduce jobs of MRS.
  • DLI Spark: Runs Spark jobs of DLI.
  • RDS SQL: Transfers SQL statements to RDS for execution.

location

Yes

Location data structure

Location of a node on the job canvas. For details, see Table 7.

preNodeName

No

List<String>

Name of the previous node on which the current node depends.

conditions

No

List<Condition>

Node execution condition. Whether the node is executed depends on the result of the EL expression stored in the expression field of each condition. For details, see Table 8.

properties

Yes

List

Node property. Each type of node has its own property definition.

pollingInterval

No

Int

Interval at which node running results are checked.

Unit: second; value range: 1 to 60

Default value: 10

maxExecutionTime

No

Int

Maximum execution time of a node. If a node does not finish executing within the maximum execution time, the node is set to the failed state.

Unit: minute; value range: 5 to 1440

Default value: 60

retryTimes

No

Int

Number of node retries. The value ranges from 0 to 5. 0 indicates no retry.

Default value: 0

retryInterval

No

Int

Interval at which a retry is performed upon a failure. The value ranges from 5 to 120.

Unit: second

Default value: 120

failPolicy

No

String

Node failure policy.

  • FAIL: Terminate the execution of the current job.
  • IGNORE: Continue to execute the next node.
  • SUSPEND: Suspend the execution of the current job.
  • FAIL_CHILD: Terminate the execution of the subsequent node.

    The default value is FAIL.

eventTrigger

No

Event data structure

Event trigger for the real-time job node. For details, see Table 11.

cronTrigger

No

Cron data structure

Cron trigger for the real-time job node. For details, see Table 9.
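
The optional node fields in Table 4 have documented defaults and value ranges. A client may want to check them before sending the request. The sketch below is one possible way to do that in Python; the helper name with_node_defaults is hypothetical, and whether the service applies the same checks is not stated in this reference.

```python
# Defaults and ranges as listed in Table 4.
NODE_DEFAULTS = {
    "pollingInterval": 10,   # seconds
    "maxExecutionTime": 60,  # minutes
    "retryTimes": 0,
    "retryInterval": 120,    # seconds
    "failPolicy": "FAIL",
}
NODE_RANGES = {
    "pollingInterval": (1, 60),
    "maxExecutionTime": (5, 1440),
    "retryTimes": (0, 5),
    "retryInterval": (5, 120),
}
FAIL_POLICIES = {"FAIL", "IGNORE", "SUSPEND", "FAIL_CHILD"}


def with_node_defaults(node: dict) -> dict:
    """Return a copy of node with defaults filled in and ranges checked."""
    result = {**NODE_DEFAULTS, **node}
    for field, (low, high) in NODE_RANGES.items():
        if not low <= result[field] <= high:
            raise ValueError(f"{field} must be between {low} and {high}")
    if result["failPolicy"] not in FAIL_POLICIES:
        raise ValueError(f"unknown failPolicy: {result['failPolicy']}")
    return result
```
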

Table 5 Schedule data structure description

Parameter

Mandatory

Type

Description

type

Yes

String

Scheduling type.

  • EXECUTE_ONCE: The job runs immediately and runs only once.
  • CRON: The job runs periodically.
  • EVENT: The job is triggered by events.

cron

No

Data structure

When type is set to CRON, configure the scheduling frequency and start time. For details, see Table 10.

event

No

Data structure

When type is set to EVENT, configure information such as the event source. For details, see Table 11.
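
The relationship between type and the cron and event structures can be expressed as a small client-side check. In this Python sketch, the function name check_schedule is hypothetical. Table 5 marks cron and event as non-mandatory, so requiring them for the matching type is an assumption drawn from their descriptions.

```python
SCHEDULE_TYPES = {"EXECUTE_ONCE", "CRON", "EVENT"}


def check_schedule(schedule: dict) -> None:
    """Raise ValueError if the schedule structure looks inconsistent."""
    kind = schedule.get("type")
    if kind not in SCHEDULE_TYPES:
        raise ValueError(f"type must be one of {sorted(SCHEDULE_TYPES)}")
    # Assumption: the matching structure must accompany CRON and EVENT.
    if kind == "CRON" and "cron" not in schedule:
        raise ValueError("cron is expected when type is CRON")
    if kind == "EVENT" and "event" not in schedule:
        raise ValueError("event is expected when type is EVENT")


check_schedule({"type": "EXECUTE_ONCE"})  # passes silently
```
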

Table 6 Param data structure description

Parameter

Mandatory

Type

Description

name

Yes

String

Name of a parameter. The name contains a maximum of 64 characters, including only letters, numbers, hyphens (-), and underscores (_).

value

Yes

String

Value of the parameter. It cannot exceed 1024 characters.

type

No

String

Parameter type.

  • variable
  • constants

    Default value: variable
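
The naming and length rules in Table 6 can be checked before the request is sent. The Python sketch below assumes that "letters" means ASCII letters, which the source does not state explicitly; the helper name make_param is hypothetical.

```python
import re

# Up to 64 characters: letters, numbers, hyphens, underscores (ASCII assumed).
PARAM_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")
PARAM_TYPES = {"variable", "constants"}


def make_param(name: str, value: str, param_type: str = "variable") -> dict:
    """Build one entry of the params list after checking the Table 6 rules."""
    if not PARAM_NAME.fullmatch(name):
        raise ValueError("name allows at most 64 letters, digits, '-' or '_'")
    if len(value) > 1024:
        raise ValueError("value cannot exceed 1024 characters")
    if param_type not in PARAM_TYPES:
        raise ValueError("type must be 'variable' or 'constants'")
    return {"name": name, "value": value, "type": param_type}
```
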

Table 7 Location data structure description

Parameter

Mandatory

Type

Description

x

Yes

Int

Position of the node on the horizontal axis of the job canvas.

y

Yes

Int

Position of the node on the vertical axis of the job canvas.

Table 8 Condition data structure description

Parameter

Mandatory

Type

Description

preNodeName

Yes

String

Name of the previous node on which the current node depends.

expression

Yes

String

EL expression. If the calculation result of the EL expression is true, this node is executed.

Table 9 CronTrigger data structure description

Parameter

Mandatory

Type

Description

startTime

Yes

String

Scheduling start time in the format of yyyy-MM-dd'T'HH:mm:ssZ, which is an ISO 8601 time format. For example, 2018-10-22T23:59:59+08 indicates that the job starts being scheduled at 23:59:59 on October 22, 2018.

endTime

No

String

Scheduling end time in the format of yyyy-MM-dd'T'HH:mm:ssZ, which is an ISO 8601 time format. For example, 2018-10-22T23:59:59+08 indicates that the job stops being scheduled at 23:59:59 on October 22, 2018. If the end time is not set, the job continues to run based on the scheduling period.

expression

Yes

String

Cron expression in the format of <second><minute><hour><day><month><week>. For details about the value input in each field, see Table 12.

expressionTimeZone

No

String

Time zone corresponding to the Cron expression, for example, GMT+8.

Default value: time zone where DataArts Studio is located

period

Yes

String

Job execution interval, consisting of a number and a time unit.

Examples: 1 hours, 1 days, 1 weeks, 1 months

The value must match the value of expression.

dependPrePeriod

No

Boolean

Indicates whether to depend on the execution result of the current job's dependent job in the previous scheduling period.

Default value: false

dependJobs

No

DependJobs data structure

Job dependency configuration. For details, see Table 13.

concurrent

No

Integer

Number of concurrent executions allowed

Table 10 Cron data structure description

Parameter

Mandatory

Type

Description

startTime

Yes

String

Scheduling start time in the format of yyyy-MM-dd'T'HH:mm:ssZ, which is an ISO 8601 time format. For example, 2018-10-22T23:59:59+08 indicates that the job starts being scheduled at 23:59:59 on October 22, 2018.

endTime

No

String

Scheduling end time in the format of yyyy-MM-dd'T'HH:mm:ssZ, which is an ISO 8601 time format. For example, 2018-10-22T23:59:59+08 indicates that the job stops being scheduled at 23:59:59 on October 22, 2018. If the end time is not set, the job continues to run based on the scheduling period.

expression

Yes

String

Cron expression in the format of <second><minute><hour><day><month><week>. For details about the value input in each field, see Table 12.

expressionTimeZone

No

String

Time zone corresponding to the Cron expression, for example, GMT+8.

Default value: time zone where DataArts Studio is located

dependPrePeriod

No

Boolean

Indicates whether to depend on the execution result of the current job's dependent job in the previous scheduling period.

Default value: false

dependJobs

No

DependJobs data structure

Job dependency configuration. For details, see Table 13.
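
The startTime and endTime values follow the shape of the documented example 2018-10-22T23:59:59+08. The Python sketch below formats a timezone-aware datetime in that shape. It handles whole-hour offsets only, and the function name format_schedule_time is hypothetical. Whether the service also accepts other offset spellings, such as +0800, is not stated in this reference.

```python
from datetime import datetime, timedelta, timezone


def format_schedule_time(moment: datetime) -> str:
    """Format an aware datetime like the example 2018-10-22T23:59:59+08."""
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("moment must be timezone-aware")
    total_minutes = int(offset.total_seconds()) // 60
    if total_minutes % 60:
        raise ValueError("only whole-hour offsets are handled in this sketch")
    hours = total_minutes // 60
    sign = "+" if hours >= 0 else "-"
    return f"{moment:%Y-%m-%dT%H:%M:%S}{sign}{abs(hours):02d}"


start = datetime(2018, 10, 22, 23, 59, 59, tzinfo=timezone(timedelta(hours=8)))
start_time = format_schedule_time(start)
```
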

Table 11 Event data structure description

Parameter

Mandatory

Type

Description

eventType

Yes

String

Event type. The options are as follows:

  • KAFKA: Select the corresponding connection name and topic. The job is triggered when a new Kafka message is received.
  • DIS: The job monitors newly reported data in a DIS stream. Each time a data record is reported, the job runs once.
  • OBS: Select the OBS path to listen to. Scheduling is triggered when new files appear in the path. The path name can be referenced using the variable Job.trigger.obsNewFiles. DIS notifications must have been configured for the OBS path.

failPolicy

No

String

Job failure policy.

  • SUSPEND: Suspend the event.
  • IGNORE: Ignore the failure and proceed with the next event.

Default value: SUSPEND

concurrent

No

int

Number of concurrently scheduled jobs.

Value range: 1 to 128

Default value: 1

readPolicy

No

String

Access policy.

  • LAST: Access data from the last location.
  • NEW: Access data from a new location.

Default value: LAST
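
The fields of Table 11 can be assembled with their documented defaults as in the Python sketch below. The helper name make_event is hypothetical. Table 11 lists no fields for the connection, topic, stream, or path, so none are included here.

```python
EVENT_TYPES = {"KAFKA", "DIS", "OBS"}
EVENT_FAIL_POLICIES = {"SUSPEND", "IGNORE"}
READ_POLICIES = {"LAST", "NEW"}


def make_event(
    event_type: str,
    fail_policy: str = "SUSPEND",
    concurrent: int = 1,
    read_policy: str = "LAST",
) -> dict:
    """Build an Event structure using the defaults listed in Table 11."""
    if event_type not in EVENT_TYPES:
        raise ValueError("eventType must be KAFKA, DIS, or OBS")
    if fail_policy not in EVENT_FAIL_POLICIES:
        raise ValueError("failPolicy must be SUSPEND or IGNORE")
    if not 1 <= concurrent <= 128:
        raise ValueError("concurrent must be between 1 and 128")
    if read_policy not in READ_POLICIES:
        raise ValueError("readPolicy must be LAST or NEW")
    return {
        "eventType": event_type,
        "failPolicy": fail_policy,
        "concurrent": concurrent,
        "readPolicy": read_policy,
    }
```
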

Table 12 Values in the Cron expression fields

Field

Value Range

Allowed Special Character

Description

Second

0-59

, - * /

In the current version, only 0 is allowed.

Minute

0-59

, - * /

-

Hour

0-23

, - * /

-

Day

1-31

, - * ? / L W C

-

Month

1-12

, - * /

In the current version, only * is allowed.

Week

1-7

, - * ? / L C #

Starting from Sunday.
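
The restrictions in Table 12 can be approximated with a small checker. The Python sketch below assumes that the six fields are separated by whitespace, which this reference does not state explicitly. It checks only the field count, the allowed characters, and the two current-version restrictions (second must be 0, month must be *); numeric ranges are not checked. The function name check_cron is hypothetical.

```python
# Allowed special characters per field, in field order (Table 12).
ALLOWED_SPECIALS = {
    "second": ",-*/",
    "minute": ",-*/",
    "hour": ",-*/",
    "day": ",-*?/LWC",
    "month": ",-*/",
    "week": ",-*?/LC#",
}
DIGITS = "0123456789"


def check_cron(expression: str) -> None:
    """Raise ValueError if the expression breaks the Table 12 rules."""
    fields = expression.split()
    if len(fields) != len(ALLOWED_SPECIALS):
        raise ValueError("expected six fields")
    for (name, specials), value in zip(ALLOWED_SPECIALS.items(), fields):
        if not all(ch in DIGITS or ch in specials for ch in value):
            raise ValueError(f"unexpected character in {name} field: {value}")
    if fields[0] != "0":
        raise ValueError("second field: only 0 is allowed")
    if fields[4] != "*":
        raise ValueError("month field: only * is allowed")


check_cron("0 0 2 * * ?")  # hypothetical expression; passes silently
```
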

Table 13 DependJobs data structure description

Parameter

Mandatory

Type

Description

jobs

Yes

List<String>

List of jobs that the current job depends on. Only existing jobs can be specified.

dependPeriod

No

String

Dependency period.

  • SAME_PERIOD: Whether the job runs depends on the execution result of the job it depends on in the current scheduling period.
  • PRE_PERIOD: Whether the job runs depends on the execution result of the job it depends on in the previous scheduling period.

Default value: SAME_PERIOD

dependFailPolicy

No

String

Dependency job failure policy.

  • FAIL: Stop the job and set the job to the failed state.
  • IGNORE: Continue to run the job.
  • SUSPEND: Suspend the job.

Default value: FAIL

Table 14 Parameters of the Hive SQL node

Parameter

Mandatory

Type

Description

scriptName

Yes

String

Script name.

database

No

String

Database name.

Database in MRS Hive. The default value is default.

connectionName

No

String

Name of a connection.

scriptArgs

No

String

Script parameters in key=value format. Multiple parameters are separated by newlines (\n), for example, key1=value1\nkey2=value2.
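
The scriptArgs string can be produced from a dictionary as in the Python sketch below. The helper name format_script_args is hypothetical. Rejecting keys that contain an equals sign or values that contain a newline is an assumption made to keep the string unambiguous; the source does not describe any escaping rules.

```python
from typing import Dict


def format_script_args(args: Dict[str, str]) -> str:
    """Join key/value pairs as key=value lines separated by newlines."""
    for key, value in args.items():
        # Assumption: no escaping exists, so reject ambiguous input.
        if "=" in key or "\n" in key or "\n" in value:
            raise ValueError(f"cannot encode argument {key!r}")
    return "\n".join(f"{key}={value}" for key, value in args.items())


script_args = format_script_args({"key1": "value1", "key2": "value2"})
```

When the request body is serialized with a JSON encoder, the newline character is written as \n, which matches the form shown in the parameter description.
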

Table 15 Parameters of the Spark SQL node

Parameter

Mandatory

Type

Description

scriptName

Yes

String

Script name.

database

No

String

Database name.

Database in MRS Spark SQL. The default value is default.

connectionName

No

String

Name of a connection.

scriptArgs

No

String

Script parameters in key=value format. Multiple parameters are separated by newlines (\n), for example, key1=value1\nkey2=value2.

Table 16 Parameters of the DWS SQL node

Parameter

Mandatory

Type

Description

scriptName

Yes

String

Script name.

database

No

String

Database name.

Database in DWS. The default value is postgres.

connectionName

No

String

Name of a connection.

scriptArgs

No

String

Script parameters in key=value format. Multiple parameters are separated by newlines (\n), for example, key1=value1\nkey2=value2.

Table 17 Parameters of the DLI SQL node

Parameter

Mandatory

Type

Description

scriptName

Yes

String

Script name.

database

No

String

Database name.

Database in DLI.

connectionName

No

String

Name of a connection.

scriptArgs

No

String

Script parameters in key=value format. Multiple parameters are separated by newlines (\n), for example, key1=value1\nkey2=value2.

Table 18 Parameters of the shell node

Parameter

Mandatory

Type

Description

scriptName

Yes

String

Script name.

connectionName

Yes

String

Name of a connection.

arguments

No

String

Shell script parameter.

Table 19 Parameters of the CDM Job node

Parameter

Mandatory

Type

Description

clusterName

Yes

String

Cluster name.

You can obtain the cluster name from the CDM cluster list on the DataArts Migration page of the DataArts Studio console.

jobName

Yes

String

Job name.

To obtain the job name, access the DataArts Studio console, choose DataArts Migration, click a cluster name on the Cluster Management page, and click Job Management on the displayed page.

Table 20 Parameters of the CloudTable Manager node

Parameter

Mandatory

Type

Description

namespace

No

String

Namespace.

Default value: default

action

Yes

String

Action type.

  • CREATE_TABLE: Create a table.
  • DELETE_TABLE: Delete a table.

table

No

String

Table name.

columnFamily

No

String

Column family.

Table 21 Parameters of the OBS Manager node

Parameter

Mandatory

Type

Description

action

Yes

String

Action type.

  • CREATE_PATH: Create an OBS path.
  • DELETE_PATH: Delete an OBS path.

path

Yes

String

OBS path.

Table 22 Parameters of the RESTAPI node

Parameter

Mandatory

Type

Description

url

Yes

String

URL address.

URL of the cloud service.

method

Yes

String

HTTP method.

  • GET
  • POST
  • PUT
  • DELETE

headers

No

String

HTTP message header in the format of <message header name>=<value>. Multiple message headers are separated by newlines.

body

No

String

Message body.

Table 23 Parameters of the SMN node

Parameter

Mandatory

Type

Description

topic

Yes

String

SMN topic URN.

Perform the following operations to obtain an SMN topic URN:

  1. Log in to the management console.
  2. Click Simple Message Notification and choose Topic Management > Topics from the list on the left.

You can obtain the SMN topic URN in the topic list.

subject

Yes

String

Message title, which is used as the subject of an email sent to a subscriber.

messageType

Yes

String

Message type.

  • NORMAL
  • STRUCTURE
  • TEMPLATE

message

Yes

String

Message to be sent.

Table 24 Parameters of the MRS Spark node

Parameter

Mandatory

Type

Description

clusterName

Yes

String

MRS cluster name.

Perform the following operations to obtain the MRS cluster name:

  1. Log in to the management console.
  2. Click MapReduce Service and choose Clusters > Active Clusters from the left navigation pane.

You can obtain the cluster name from the active clusters.

jobName

Yes

String

MRS job name.

The job name is user-defined.

resourcePath

Yes

String

OBS resource path of the custom Spark JAR package

parameters

Yes

String

Custom parameters of the Spark JAR package

You can specify parameters for a custom JAR package.

input

No

String

Input path.

Input data path of the MRS Spark job. The path can be an HDFS or OBS path.

output

No

String

Output path.

Output data path of the MRS Spark job. The path can be an HDFS or OBS path.

programParameter

No

String

Program parameter

Multiple key-value pairs are allowed and separated by vertical bars (|).

Table 25 Parameters of the MapReduce node

Parameter

Mandatory

Type

Description

clusterName

Yes

String

MRS cluster name.

Perform the following operations to obtain the MRS cluster name:

  1. Log in to the management console.
  2. Click MapReduce Service and choose Clusters > Active Clusters from the left navigation pane.

You can obtain the cluster name from the active clusters.

jobName

Yes

String

MRS job name.

The job name is user-defined.

resourcePath

Yes

String

Resource path.

parameters

Yes

String

Job parameter.

input

Yes

String

Input path.

Input data path of the MapReduce job. The path can be an HDFS or OBS path.

output

Yes

String

Output path.

Output data path of the MapReduce job. The path can be an HDFS or OBS path.

Table 26 Parameters of the DLI Spark node

Parameter

Mandatory

Type

Description

clusterName

Yes

String

DLI queue name

Perform the following operations to obtain the DLI queue name:

  1. Log in to the management console.
  2. Click Data Lake Insight and then Queue Management.

You can obtain the queue name from the queue management list.

jobName

Yes

String

DLI job name.

Perform the following operations to obtain the job name:

  1. Log in to the management console.
  2. Click Data Lake Insight and then Spark Jobs.
  3. Choose Job Management.

You can obtain the job name from the job management list.

resourceType

No

String

Resource type of the DLI job. CUSTOMIZED is returned when the parameter is customized.

jobClass

No

String

Main class name. When the application type is .jar, the main class name cannot be empty.

resourcePath

Yes

String

JAR package resource path.

jarArgs

No

String

Main-class entry parameter.

sparkConfig

No

String

Running parameter of the Spark job.

Table 27 BasicConfig job information

Parameter

Mandatory

Type

Description

owner

No

String

Job owner. The length cannot exceed 128 characters.

priority

No

int

Job priority. The value ranges from 0 to 2, where 0 indicates the highest priority, 1 indicates medium priority, and 2 indicates low priority. The default value is 0.

executeUser

No

String

Job execution user. The value must be an existing username.

instanceTimeout

No

int

Instance timeout interval. The unit is minute. The value ranges from 5 to 1440. The default value is 60.

customFields

No

Map<String,String>

User-defined field. The length cannot exceed 2048 characters.
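
The limits in Table 27 can be checked on the client side as in the Python sketch below. The helper name make_basic_config is hypothetical. The 2048-character limit on customFields is not checked, because the source does not say which length it refers to.

```python
from typing import Dict, Optional


def make_basic_config(
    owner: Optional[str] = None,
    priority: int = 0,
    execute_user: Optional[str] = None,
    instance_timeout: int = 60,
    custom_fields: Optional[Dict[str, str]] = None,
) -> dict:
    """Build a BasicConfig structure using the defaults listed in Table 27."""
    if owner is not None and len(owner) > 128:
        raise ValueError("owner cannot exceed 128 characters")
    if priority not in (0, 1, 2):
        raise ValueError("priority must be 0, 1, or 2")
    if not 5 <= instance_timeout <= 1440:
        raise ValueError("instanceTimeout must be between 5 and 1440 minutes")
    config: dict = {"priority": priority, "instanceTimeout": instance_timeout}
    # Optional fields are included only when the caller supplies them.
    if owner is not None:
        config["owner"] = owner
    if execute_user is not None:
        config["executeUser"] = execute_user
    if custom_fields:
        config["customFields"] = dict(custom_fields)
    return config
```
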

Response

None.