
Creating a Job

Function

This API is used to create a job. A job consists of one or more nodes, such as Hive SQL and CDM Job nodes. DLF supports two types of jobs: batch jobs and real-time jobs.

URI

  • URI format

    POST /v1/{project_id}/jobs

  • Parameter description
    Table 1 URI parameters

    | Parameter | Mandatory | Type | Description |
    | --- | --- | --- | --- |
    | project_id | Yes | String | Project ID. For details about how to obtain a project ID, see Project ID and Account ID. |

Request Parameters

Table 2 Request header parameter

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| workspace | No | String | Workspace ID. • If this parameter is not set, data in the default workspace is queried by default. • To query data in other workspaces, this header must be carried. NOTE: You need to specify a workspace if you have multiple DataArts Studio instances. This parameter is mandatory if no default workspace is available; otherwise, an error is reported. |
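
For example, a request that targets a non-default workspace carries the workspace header in addition to the usual request headers. A minimal sketch (the X-Auth-Token authentication header and the endpoint placeholder are assumptions for illustration, not parameters documented here):

POST https://{endpoint}/v1/{project_id}/jobs HTTP/1.1
Content-Type: application/json
X-Auth-Token: {token}
workspace: {workspace_id}
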
Table 3 Parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| name | Yes | String | Job name. The name contains a maximum of 128 characters, including only letters, numbers, hyphens (-), underscores (_), and periods (.). The job name must be unique. |
| nodes | Yes | List<Node> | Node definition. For details, see Table 4. |
| schedule | Yes | Schedule data structure | Scheduling configuration. For details, see Table 5. |
| params | No | List<Param> | Job parameter definition. For details, see Table 6. |
| directory | No | String | Path of the job in the directory tree. If the directory of the path does not exist during job creation, a directory is automatically created in the root directory /, for example, /dir/a/. |
| processType | Yes | String | Job type. • REAL_TIME: real-time processing • BATCH: batch processing |
| singleNodeJobFlag | No | Boolean | Whether the job is a single-task job. The default value is false. |
| singleNodeJobType | No | String | Single-task type. If processType is BATCH, the value can be DLISQL, DWSSQL, HiveSQL, SparkSQL, or RDSSQL. If processType is REAL_TIME, the value can be FlinkSQL, FlinkJar, or DLISpark. |
| lastUpdateUser | No | String | User who last updated the job. |
| logPath | No | String | OBS path for storing job run logs. |
| basicConfig | No | BasicConfig data structure | Basic job information. For details, see Table 29. |
| targetStatus | No | String | This parameter is required if the review function is enabled. It indicates the target status of the job: SAVED, SUBMITTED, or PRODUCTION. • SAVED: the job is saved but cannot be scheduled or executed; it can be executed only after it is submitted and approved. • SUBMITTED: the job is automatically submitted after it is saved and can be executed after it is approved. • PRODUCTION: the job can be executed directly after it is created. Note: Only the workspace administrator can create jobs in the PRODUCTION state. |
| approvers | No | List<JobApprover> | Job approvers. This parameter is required if the review function is enabled. For details, see Table 33. |
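
Taken together, only name, nodes, schedule, and processType are mandatory. A minimal request body sketch (all names are placeholders; the node type and properties follow Table 4 and Table 15):

{
    "name": "myJob",
    "processType": "BATCH",
    "schedule": {
        "type": "EXECUTE_ONCE"
    },
    "nodes": [
        {
            "name": "node_1",
            "type": "HiveSQL",
            "location": { "x": 0, "y": 0 },
            "properties": [
                { "name": "scriptName", "value": "my_hive_script" }
            ]
        }
    ]
}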

Table 4 Node data structure description

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| name | Yes | String | Node name. The name contains a maximum of 128 characters, including only letters, numbers, hyphens (-), underscores (_), and periods (.). Names of the nodes in a job must be unique. |
| type | Yes | String | Node type. The options are as follows: • Hive SQL: Runs Hive SQL scripts. • Spark SQL: Runs Spark SQL scripts. • DWS SQL: Runs DWS SQL scripts. • DLI SQL: Runs DLI SQL scripts. • Shell: Runs shell scripts. • CDM Job: Runs CDM jobs. • DIS Transfer Task: Creates DIS dump tasks. • CloudTable Manager: Manages CloudTable tables, including creating and deleting tables. • OBS Manager: Manages OBS paths, including creating and deleting paths. • RestClient: Sends REST API requests. • SMN: Sends short messages or emails. • MRS Spark: Runs Spark jobs of MRS. • MapReduce: Runs MapReduce jobs of MRS. • MRSFlinkJob: Runs Flink jobs of MRS. • MRS HetuEngine: Runs HetuEngine jobs of MRS. • DLI Spark: Runs Spark jobs of DLI. • RDSSQL: Transfers SQL statements to RDS for execution. • ModelArts Train: Executes workflow jobs of ModelArts. |
| location | Yes | Location data structure | Location of the node on the job canvas. For details, see Table 7. |
| preNodeName | No | List<String> | Names of the previous nodes on which the current node depends. |
| conditions | No | List<Condition> | Node execution conditions. Whether the node is executed depends on the calculation result of the EL expression saved in the expression field of each condition. For details, see Table 8. |
| properties | Yes | List<Property> | Node properties. Each type of node has its own property definitions. For details, see Table 14. |
| pollingInterval | No | Int | Interval at which node running results are checked. Unit: second; value range: 1 to 60. Default value: 10. |
| execTimeOutRetry | No | String | Whether to retry a node upon timeout. The default value is false. |
| maxExecutionTime | No | Int | Maximum execution time of a node. If a node is not executed within the maximum execution time, the node is set to the failed state. Unit: minute; value range: 5 to 7200 (other values do not take effect). Default value: 60. |
| retryTimes | No | Int | Number of times a node is retried. The value ranges from 1 to 100. Default value: 1. |
| retryInterval | No | Int | Interval at which a retry is performed upon a failure. Unit: second; value range: 5 to 600. Default value: 120. |
| failPolicy | No | String | Node failure policy. • FAIL: Terminate the execution of the current job. • IGNORE: Continue to execute the next node. • SUSPEND: Suspend the execution of the current job. • FAIL_CHILD: Terminate the execution of the subsequent nodes. Default value: FAIL. |
| eventTrigger | No | Event data structure | Event trigger for a node in a real-time job. For details, see Table 11. |
| cronTrigger | No | Cron data structure | Cron trigger for a node in a real-time job. For details, see Table 9. |
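
For example, a downstream node can declare its dependency on an upstream node through preNodeName and tune its retry behavior. A sketch (node and script names are placeholders):

{
    "name": "node_2",
    "type": "HiveSQL",
    "location": { "x": 100, "y": 0 },
    "preNodeName": ["node_1"],
    "retryTimes": 3,
    "retryInterval": 60,
    "failPolicy": "IGNORE",
    "properties": [
        { "name": "scriptName", "value": "my_other_script" }
    ]
}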

Table 5 Schedule data structure description

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| type | Yes | String | Scheduling type. • EXECUTE_ONCE: The job runs immediately and runs only once. • CRON: The job runs periodically. • EVENT: The job is triggered by events. |
| cron | No | Data structure | When type is set to CRON, configure the scheduling frequency and start time. For details, see Table 10. |
| event | No | Data structure | When type is set to EVENT, configure information such as the event source. For details, see Table 11. |
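
For a one-off run, neither the cron nor the event structure is needed, and the schedule block reduces to the type field:

"schedule": {
    "type": "EXECUTE_ONCE"
}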

Table 6 Param data structure description

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| name | Yes | String | Name of the parameter. The name contains a maximum of 64 characters, including only letters, numbers, hyphens (-), and underscores (_). |
| value | Yes | String | Value of the parameter. It cannot exceed 1,024 characters. |
| type | No | String | Parameter type. • variable • constants. Default value: variable. |
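
For example, a job could define one variable and one constant parameter as follows (names and values are placeholders):

"params": [
    { "name": "current_date", "value": "20240101", "type": "variable" },
    { "name": "max_retries", "value": "3", "type": "constants" }
]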

Table 7 Location data structure description

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| x | Yes | Int | Position of the node on the horizontal axis of the job canvas. |
| y | Yes | Int | Position of the node on the vertical axis of the job canvas. |

Table 8 Condition data structure description

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| preNodeName | Yes | String | Name of the previous node on which the current node depends. |
| expression | Yes | String | EL expression. If the calculation result of the EL expression is true, this node is executed. |
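
For instance, a condition can make a node run only when its predecessor succeeded. The EL function used below is an assumption for illustration; see the EL expression reference for the functions actually supported:

"conditions": [
    {
        "preNodeName": "node_1",
        "expression": "#{Job.getNodeStatus(\"node_1\") == \"success\"}"
    }
]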

Table 9 CronTrigger data structure description

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| startTime | Yes | String | Scheduling start time in the format yyyy-MM-dd'T'HH:mm:ssZ (ISO 8601), for example, 2018-10-22T23:59:59+08, which indicates that the job starts to be scheduled at 23:59:59 on October 22, 2018. |
| endTime | No | String | Scheduling end time in the format yyyy-MM-dd'T'HH:mm:ssZ (ISO 8601), for example, 2018-10-22T23:59:59+08, which indicates that the job stops being scheduled at 23:59:59 on October 22, 2018. If the end time is not set, the job continues to be executed based on the scheduling period. |
| expression | Yes | String | Cron expression in the format <second> <minute> <hour> <day> <month> <week>. For details about the values allowed in each field, see Table 12. |
| expressionTimeZone | No | String | Time zone corresponding to the cron expression, for example, GMT+8. Default value: time zone where DataArts Studio is located. |
| period | Yes | String | Job execution interval consisting of a number and a time unit, for example, 1 hours, 1 days, 1 weeks, or 1 months. The value must match the value of expression. |
| dependPrePeriod | No | Boolean | Whether the job depends on the execution result of its dependent jobs in the previous scheduling period. Default value: false. |
| dependJobs | No | DependJobs data structure | Job dependency configuration. For details, see Table 13. |
| concurrent | No | Integer | Number of concurrent executions allowed. |

Table 10 Cron data structure description

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| startTime | Yes | String | Scheduling start time in the format yyyy-MM-dd'T'HH:mm:ssZ (ISO 8601), for example, 2018-10-22T23:59:59+08, which indicates that the job starts to be scheduled at 23:59:59 on October 22, 2018. |
| endTime | No | String | Scheduling end time in the format yyyy-MM-dd'T'HH:mm:ssZ (ISO 8601), for example, 2018-10-22T23:59:59+08, which indicates that the job stops being scheduled at 23:59:59 on October 22, 2018. If the end time is not set, the job continues to be executed based on the scheduling period. |
| expression | Yes | String | Cron expression in the format <second> <minute> <hour> <day> <month> <week>. For details about the values allowed in each field, see Table 12. |
| expressionTimeZone | No | String | Time zone corresponding to the cron expression, for example, GMT+8. Default value: time zone where DataArts Studio is located. |
| dependPrePeriod | No | Boolean | Whether the job depends on the execution result of its dependent jobs in the previous scheduling period. Default value: false. |
| dependJobs | No | DependJobs data structure | Job dependency configuration. For details, see Table 13. |
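
For example, a schedule that runs a job every day at 02:00 (GMT+8) throughout 2024 could look as follows (the dates are placeholders):

"schedule": {
    "type": "CRON",
    "cron": {
        "startTime": "2024-01-01T00:00:00+08",
        "endTime": "2024-12-31T23:59:59+08",
        "expression": "0 0 2 * * ?",
        "expressionTimeZone": "GMT+8"
    }
}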

Table 11 Event data structure description

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| eventType | Yes | String | Event type. • KAFKA: Select the corresponding connection name and topic. When a new Kafka message is received, the job is triggered. • DIS: Currently, only newly reported data events from the DIS stream can be monitored. Each time a data record is reported, the job runs once. • OBS: Select the OBS path to be listened to. If new files appear in the path, scheduling is triggered. The path name can be referenced using the variable Job.trigger.obsNewFiles. The prerequisite is that DIS notifications have been configured for the OBS path. |
| channel | Yes | String | DIS stream name. To obtain the stream name: 1. Log in to the management console. 2. Click Data Ingestion Service and select Stream Management from the left navigation pane. The existing streams are listed on the stream management page. |
| failPolicy | No | String | Job failure policy. • SUSPEND: Suspend the event. • IGNORE: Ignore the failure and process the next event. Default value: SUSPEND. |
| concurrent | No | int | Number of concurrently scheduled jobs. Value range: 1 to 128. Default value: 1. |
| readPolicy | No | String | Access policy. • LAST: Access data from the last location. • NEW: Access data from a new location. Default value: LAST. |
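
An event-triggered schedule that listens to a DIS stream might look as follows (the stream name is a placeholder):

"schedule": {
    "type": "EVENT",
    "event": {
        "eventType": "DIS",
        "channel": "myDisStream",
        "failPolicy": "SUSPEND",
        "concurrent": 1,
        "readPolicy": "LAST"
    }
}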

Table 12 Values in the cron expression fields

| Field | Value Range | Allowed Special Characters | Description |
| --- | --- | --- | --- |
| Second | 0-59 | , - * / | In the current version, only 0 is allowed. |
| Minute | 0-59 | , - * / | - |
| Hour | 0-23 | , - * / | - |
| Day | 1-31 | , - * ? / L W C | - |
| Month | 1-12 | , - * / | In the current version, only * is allowed. |
| Week | 1-7 | , - * ? / L C # | Starting from Sunday. |
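
Combining these fields, some representative expressions (illustrations only, assuming Quartz-style semantics where ? in the Day or Week field means "no specific value"):

0 0 2 * * ?      Every day at 02:00
0 30 8 ? * 2     Every Monday at 08:30 (the week starts from Sunday, so 2 is Monday)
0 0 0,12 * * ?   Every day at 00:00 and 12:00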

Table 13 DependJobs data structure description

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| jobs | Yes | List<String> | List of dependent jobs. Only existing jobs can be depended on. |
| dependPeriod | No | String | Dependency period. • SAME_PERIOD: Whether to run the job depends on the execution result of its depended job in the current scheduling period. • PRE_PERIOD: Whether to run the job depends on the execution result of its depended job in the previous scheduling period. Default value: SAME_PERIOD. |
| dependFailPolicy | No | String | Policy applied when a dependent job fails. • FAIL: Stop the job and set it to the failed state. • IGNORE: Continue to run the job. • SUSPEND: Suspend the job. Default value: FAIL. |
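
Within a cron structure, a dependency on another job could be declared as follows (job_A is a placeholder):

"dependJobs": {
    "jobs": ["job_A"],
    "dependPeriod": "SAME_PERIOD",
    "dependFailPolicy": "FAIL"
}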

Table 14 Property parameters

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| name | Yes | String | Property name. |
| value | Yes | String | Property value. |
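
The node-type-specific parameters described in Table 15 through Table 32 are passed as name/value pairs in this generic structure. For example, the Hive SQL parameters of Table 15 appear in a node definition as:

"properties": [
    { "name": "scriptName", "value": "test_hive_sql" },
    { "name": "connectionName", "value": "mrs_hive_test" },
    { "name": "database", "value": "default" },
    { "name": "scriptArgs", "value": "key1=value1\nkey2=value2" }
]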

Table 15 Parameters of the Hive SQL node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| scriptName | Yes | String | Script name. |
| database | No | String | Name of the database in MRS Hive. The default value is default. |
| connectionName | No | String | Name of the connection. |
| scriptArgs | No | String | Script parameters in key=value format. Multiple parameters are separated by newline characters (\n), for example, key1=value1\nkey2=value2. |

Table 16 Parameters of the Spark SQL node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| scriptName | Yes | String | Script name. |
| database | No | String | Name of the database in MRS Spark SQL. The default value is default. |
| connectionName | No | String | Name of the connection. |
| scriptArgs | No | String | Script parameters in key=value format. Multiple parameters are separated by newline characters (\n), for example, key1=value1\nkey2=value2. |

Table 17 Parameters of the DWS SQL node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| scriptName | Yes | String | Script name. |
| database | No | String | Name of the database in DWS. The default value is postgres. |
| connectionName | No | String | Name of the connection. |
| scriptArgs | No | String | Script parameters in key=value format. Multiple parameters are separated by newline characters (\n), for example, key1=value1\nkey2=value2. |

Table 18 Parameters of the DLI SQL node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| scriptName | Yes | String | Script name. |
| database | No | String | Name of the database in DLI. |
| connectionName | No | String | Name of the connection. |
| scriptArgs | No | String | Script parameters in key=value format. Multiple parameters are separated by newline characters (\n), for example, key1=value1\nkey2=value2. |

Table 19 Parameters of the Shell node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| scriptName | Yes | String | Script name. |
| connectionName | Yes | String | Name of the connection. |
| arguments | No | String | Shell script parameters. |

Table 20 Parameters of the CDM Job node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| clusterName | Yes | String | Cluster name. You can obtain the cluster name from the CDM cluster list on the DataArts Migration page of the DataArts Studio console. |
| jobName | Yes | String | Job name. To obtain the job name, access the DataArts Studio console, choose DataArts Migration, click a cluster name on the Cluster Management page, and click Job Management on the displayed page. |

Table 21 Parameters of the DISTransferTask node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| streamName | Yes | String | DIS stream name. To obtain the stream name: 1. Log in to the management console. 2. Click Data Ingestion Service and select Stream Management from the left navigation pane. The existing streams are listed on the stream management page. |
| destinationType | Yes | String | Dump target. • CloudTable • OBS |
| duplicatePolicy | Yes | String | Duplicate name policy. • OVERWRITE • IGNORE |
| configuration | Yes | Data structure | Dump configuration. For details, see the descriptions of the obs_destination_descriptor and cloudtable_destination_descriptor parameters in the DIS API documentation. |

Table 22 Parameters of the CloudTableManager node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| namespace | No | String | Namespace. Default value: default. |
| action | Yes | String | Action type. • CREATE_TABLE: Create a table. • DELETE_TABLE: Delete a table. |
| table | No | String | Table name. |
| columnFamily | No | String | Column family. |

Table 23 Parameters of the OBSManager node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| action | Yes | String | Action type. • CREATE_PATH: Create an OBS path. • DELETE_PATH: Delete an OBS path. |
| path | Yes | String | OBS path. |

Table 24 Parameters of the RestClient node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| url | Yes | String | URL of the cloud service. |
| method | Yes | String | HTTP method. • GET • POST • PUT • DELETE |
| headers | No | String | HTTP message headers in the format <header name>=<value>. Multiple headers are separated by newline characters. |
| body | No | String | Message body. |
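
Like the other node types, a RestClient node receives these parameters through the properties list. A sketch (the URL, header, and body values are placeholders):

"properties": [
    { "name": "url", "value": "https://example.com/api/v1/resource" },
    { "name": "method", "value": "POST" },
    { "name": "headers", "value": "Content-Type=application/json" },
    { "name": "body", "value": "{\"key\":\"value\"}" }
]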

Table 25 Parameters of the SMN node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| topic | Yes | String | SMN topic URN. To obtain an SMN topic URN: 1. Log in to the management console. 2. Click Simple Message Notification and choose Topic Management > Topics from the list on the left. You can obtain the SMN topic URN in the topic list. |
| subject | Yes | String | Message title, which is used as the subject of an email sent to a subscriber. |
| messageType | Yes | String | Message type. • NORMAL • STRUCTURE • TEMPLATE |
| message | Yes | String | Message to be sent. |

Table 26 Parameters of the MRS Spark node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| clusterName | Yes | String | MRS cluster name. To obtain it: 1. Log in to the management console. 2. Click MapReduce Service and choose Clusters > Active Clusters from the left navigation pane. You can obtain the cluster name from the list of active clusters. |
| jobName | Yes | String | MRS job name. The job name is user-defined. |
| resourcePath | Yes | String | OBS resource path of the custom Spark JAR package. |
| parameters | Yes | String | Custom parameters of the Spark JAR package. |
| input | No | String | Input data path of the MRS Spark job. The path can be an HDFS or OBS path. |
| output | No | String | Output data path of the MRS Spark job. The path can be an HDFS or OBS path. |
| programParameter | No | String | Program parameters. Multiple key-value pairs are allowed and separated by vertical bars (\|). |

Table 27 Parameters of the MapReduce node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| clusterName | Yes | String | MRS cluster name. To obtain it: 1. Log in to the management console. 2. Click MapReduce Service and choose Clusters > Active Clusters from the left navigation pane. You can obtain the cluster name from the list of active clusters. |
| jobName | Yes | String | MRS job name. The job name is user-defined. |
| resourcePath | Yes | String | Resource path. |
| parameters | Yes | String | Job parameters. |
| input | Yes | String | Input data path of the MapReduce job. The path can be an HDFS or OBS path. |
| output | Yes | String | Output data path of the MapReduce job. The path can be an HDFS or OBS path. |

Table 28 Parameters of the DLI Spark node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| clusterName | Yes | String | DLI queue name. To obtain it: 1. Log in to the management console. 2. Click Data Lake Insight and then Queue Management. You can obtain the queue name from the queue management list. |
| jobName | Yes | String | DLI job name. To obtain it: 1. Log in to the management console. 2. Click Data Lake Insight and then Spark Jobs. 3. Choose Job Management. You can obtain the job name from the job management list. |
| resourceType | No | String | Type of the resource used to run the DLI job. • OBS: OBS path • DLIResources: DLI package |
| jobClass | No | String | Main class name. When the application type is .jar, the main class name cannot be empty. |
| resourcePath | Yes | String | JAR package resource path. |
| jarArgs | No | String | Main-class entry parameters. |
| sparkConfig | No | String | Running parameters of the Spark job. |
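
Expressed through the generic properties structure, a DLI Spark node definition could look as follows (queue, job, class, and path names are placeholders):

"properties": [
    { "name": "clusterName", "value": "my_dli_queue" },
    { "name": "jobName", "value": "my_dli_spark_job" },
    { "name": "resourceType", "value": "OBS" },
    { "name": "jobClass", "value": "com.example.MainClass" },
    { "name": "resourcePath", "value": "obs://my-bucket/jars/app.jar" }
]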

Table 29 BasicConfig job information

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| owner | No | String | Job owner. The length cannot exceed 128 characters. |
| agency | No | String | Job agency. |
| isIgnoreWaiting | No | int | Whether to ignore the waiting time in the instance timeout duration. The value can be 0 (the waiting time is not ignored) or 1 (the waiting time is ignored; default). |
| priority | No | int | Job priority. The value ranges from 0 to 2. The default value is 0. 0 indicates a top priority, 1 a medium priority, and 2 a low priority. |
| executeUser | No | String | Job execution user. The value must be an existing username. |
| instanceTimeout | No | int | Instance timeout interval, in minutes. The value ranges from 5 to 1440. The default value is 60. |
| customFields | No | Map<String,String> | User-defined fields. The length cannot exceed 2048 characters. |

Table 30 Parameters of the MRS Flink node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| clusterName | Yes | String | MRS cluster name. To obtain it: 1. Log in to the management console. 2. Click MapReduce Service and choose Clusters > Active Clusters from the left navigation pane. You can obtain the cluster name from the list of active clusters. |
| jobName | Yes | String | MRS job name. The job name is user-defined. |
| flinkJobType | Yes | String | Flink job type, which can be Flink SQL or Flink JAR. |
| flinkJobProcessType | Yes | String | Flink job processing mode, which can be batch or stream. |
| scriptName | No | String | SQL script associated with the Flink SQL job. |
| resourcePath | No | String | OBS resource path of the custom Flink JAR package. |
| input | No | String | Input data path of the MRS Flink job. The path can be an HDFS or OBS path. |
| output | No | String | Output data path of the MRS Flink job. The path can be an HDFS or OBS path. |
| programParameter | No | String | Program parameters. Multiple key-value pairs are allowed and separated by vertical bars (\|). |

Table 31 Parameters of the MRS HetuEngine node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| clusterName | Yes | String | MRS cluster name. To obtain it: 1. Log in to the management console. 2. Click MapReduce Service and choose Clusters > Active Clusters from the left navigation pane. You can obtain the cluster name from the list of active clusters. |
| jobName | Yes | String | MRS job name. The job name is user-defined. |
| statementOrScript | Yes | String | Whether to use an SQL statement for the node or associate an SQL script with the node. |
| scriptName | No | String | SQL script to be associated with the node. |
| statement | No | String | Custom content of the SQL statement. |
| Data Warehouse | Yes | String | Data connection required by HetuEngine. |
| Schema | Yes | String | Name of the schema to be accessed through HetuEngine. |
| Database | Yes | String | Name of the database to be accessed through HetuEngine. |
| Queue | No | String | Name of the resource queue required by HetuEngine. |

Table 32 Parameters of the ModelArts Train node

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| clusterName | Yes | String | MRS cluster name. To obtain it: 1. Log in to the management console. 2. Click MapReduce Service and choose Clusters > Active Clusters from the left navigation pane. You can obtain the cluster name from the list of active clusters. |
| jobName | Yes | String | MRS job name. You can set a custom value. |
| statementOrScript | Yes | String | Whether to use an SQL statement for the node or associate an SQL script with the node. |
| scriptName | No | String | SQL script to be associated with the node. |

Table 33 Approver attributes

| Parameter | Mandatory | Type | Description |
| --- | --- | --- | --- |
| approverName | Yes | String | Approver name. |

Response Parameters

None.

Example Request

Create a job named myJob. The job type is BATCH, the scheduling type is CRON, the path in the directory tree is /myDir, and the OBS path for storing job run logs is obs://dlf-test-log.

POST /v1/b384b9e9ab9b4ee8994c8633aabc9505/jobs
{
    "basicConfig": {
        "customFields": {},
        "executeUser": "",
        "instanceTimeout": 0,
        "owner": "test_user",
        "priority": 0
    },
    "directory": "/myDir",
    "logPath": "obs://dlf-test-log",
    "name": "myJob",
    "nodes": [
        {
            "failPolicy": "FAIL_CHILD",
            "location": {
                "x": "-45.5",
                "y": "-134.5"
            },
            "maxExecutionTime": 360,
            "name": "MRS_Hive_SQL",
            "pollingInterval": 20,
            "preNodeName": [],
            "properties": [
                {
                    "name": "scriptName",
                    "value": "test_hive_sql"
                },
                {
                    "name": "connectionName",
                    "value": "mrs_hive_test"
                },
                {
                    "name": "database",
                    "value": "default"
                },
                {
                    "name": "scriptArgs",
                    "value": "test_var=111"
                }
            ],
            "retryInterval": 120,
            "retryTimes": 0,
            "type": "HiveSQL"
        }
    ],
    "processType": "BATCH",
    "schedule": {
        "type": "CRON"
    }
}

Create a job when the review function is enabled.

POST /v1/b384b9e9ab9b4ee8994c8633aabc9505/jobs
{
    "basicConfig": {
        "customFields": {},
        "executeUser": "",
        "instanceTimeout": 0,
        "owner": "test_user",
        "priority": 0
    },
    "directory": "/myDir",
    "logPath": "obs://dlf-test-log",
    "name": "myJob",
    "nodes": [
        {
            "failPolicy": "FAIL_CHILD",
            "location": {
                "x": "-45.5",
                "y": "-134.5"
            },
            "maxExecutionTime": 360,
            "name": "MRS_Hive_SQL",
            "pollingInterval": 20,
            "preNodeName": [],
            "properties": [
                {
                    "name": "scriptName",
                    "value": "test_hive_sql"
                },
                {
                    "name": "connectionName",
                    "value": "mrs_hive_test"
                },
                {
                    "name": "database",
                    "value": "default"
                },
                {
                    "name": "scriptArgs",
                    "value": "test_var=111"
                }
            ],
            "retryInterval": 120,
            "retryTimes": 0,
            "type": "HiveSQL"
        }
    ],
    "processType": "BATCH",
    "schedule": {
        "type": "CRON"
    },
    "targetStatus":"SUBMITTED",
    "approvers": [
        {
            "approverName": "userName1"
        },
        {
            "approverName": "userName2"
        }
    ]
}

Example Response

  • Success response

    HTTP status code 204

  • Failure response

    HTTP status code 400

    {
        "error_code":"DLF.0102",
        "error_msg":"The job name already exists."
    }