
Submitting a SQL Job (Recommended)

Function

This API is used to submit jobs to a queue using SQL statements.

The supported job types are DDL, DCL, IMPORT, EXPORT, QUERY, and INSERT. The functions of IMPORT and EXPORT jobs match those described in Importing Data; the difference lies in the implementation method.

Additionally, you can use other APIs to query and manage jobs; for details, see the related job-management sections.

This API is synchronous if job_type in the response message is DCL.

URI

  • URI format

    POST /v1.0/{project_id}/jobs/submit-job

  • Parameter description
    Table 1 URI parameter

    | Parameter  | Mandatory | Type   | Description |
    | ---------- | --------- | ------ | ----------- |
    | project_id | Yes       | String | Project ID, which is used for resource isolation. For details about how to obtain its value, see Obtaining a Project ID. |
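
    As a minimal sketch, the request URL is formed by substituting the project ID into the path above; the endpoint host and project ID shown are placeholders, not part of this API definition.

    # Placeholders only; use your region's DLI endpoint and your project ID.
    endpoint = "https://dli.example.com"
    project_id = "0123456789abcdef"
    url = f"{endpoint}/v1.0/{project_id}/jobs/submit-job"  # send a POST to this URL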

Request

Table 2 Request parameters

| Parameter  | Mandatory | Type             | Description |
| ---------- | --------- | ---------------- | ----------- |
| sql        | Yes       | String           | SQL statement to execute. |
| currentdb  | No        | String           | Database where the SQL statement is executed. This parameter is not required when the statement itself creates a database. |
| queue_name | No        | String           | Name of the queue to which the job is submitted. The name can contain only digits, letters, and underscores (_), but cannot consist solely of digits or start with an underscore (_). |
| conf       | No        | Array of Strings | Configuration parameters for the SQL job, each expressed as a "key=value" string. For the supported configuration items, see Table 3. |
| tags       | No        | Array of Objects | Tags of the job. For details, see Table 4. |
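
To make the body structure concrete, the following Python sketch assembles the Table 2 parameters into a request payload; all values are illustrative.

    import json

    # Illustrative values only; `sql` is the sole mandatory field (see Table 2).
    body = {
        "sql": "SELECT * FROM table1 LIMIT 10",
        "currentdb": "db1",
        "queue_name": "default",
        "conf": ["spark.sql.shuffle.partitions = 200"],
        "tags": [{"key": "workspace", "value": "space1"}],
    }
    print(json.dumps(body, indent=2))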

Table 3 Configuration parameters

| Parameter | Default Value | Description |
| --------- | ------------- | ----------- |
| spark.sql.files.maxRecordsPerFile | 0 | Maximum number of records to write into a single file. If the value is zero or negative, there is no limit. |
| spark.sql.autoBroadcastJoinThreshold | 209715200 | Maximum size, in bytes, of a table that is broadcast to all worker nodes when a join is executed. Set this parameter to -1 to disable broadcasting. NOTE: Currently, statistics are supported only for Hive metastore tables on which the ANALYZE TABLE COMPUTE STATISTICS noscan command has been run, and for file-based data source tables whose statistics are computed directly from the data files. |
| spark.sql.shuffle.partitions | 200 | Default number of partitions used when shuffling data for joins or aggregations. |
| spark.sql.dynamicPartitionOverwrite.enabled | false | Whether DLI overwrites only the partitions that data is written into at runtime. If set to false, all partitions that match the overwrite condition are deleted before the write starts; for example, using INSERT OVERWRITE to write partition 2021-02 into a partitioned table that also contains partition 2021-01 deletes the 2021-01 partition as well. If set to true, DLI does not delete other partitions before the overwrite starts. |
| spark.sql.files.maxPartitionBytes | 134217728 | Maximum number of bytes packed into a single partition when reading files. |
| spark.sql.badRecordsPath | - | Path where bad records are stored. |
| dli.sql.sqlasync.enabled | false | Whether DDL and DCL statements are executed asynchronously. The value true enables asynchronous execution. |
| dli.sql.job.timeout | - | Job running timeout interval, in milliseconds. If the timeout expires, the job is canceled. |
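
Each conf entry is a plain "key = value" string, as in the example request below. The following sketch enables asynchronous execution and sets a job timeout using items from Table 3; the values are illustrative.

    # Each conf item is a "key = value" string.
    conf = [
        "dli.sql.sqlasync.enabled = true",    # run DDL/DCL statements asynchronously
        "dli.sql.job.timeout = 300000",       # cancel the job after 300,000 ms (5 minutes)
        "spark.sql.shuffle.partitions = 100", # shuffle partition count for joins/aggregations
    ]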

Table 4 tags parameters

| Parameter | Mandatory | Type   | Description |
| --------- | --------- | ------ | ----------- |
| key       | Yes       | String | Tag key.    |
| value     | Yes       | String | Tag value.  |

Response

Table 5 Response parameters

| Parameter  | Mandatory | Type             | Description |
| ---------- | --------- | ---------------- | ----------- |
| is_success | Yes       | Boolean          | Whether the request is successfully sent. The value true indicates that the request is successfully sent. |
| message    | Yes       | String           | System prompt message. This field may be empty when the execution succeeds. |
| job_id     | Yes       | String           | ID of the job generated and submitted for the SQL statement. The job ID can be used to query the job status and results. |
| job_type   | Yes       | String           | Job type. One of DDL, DCL, IMPORT, EXPORT, QUERY, or INSERT. |
| schema     | No        | Array of Objects | When the statement type is DDL, the column names and types of the result are returned. |
| rows       | No        | Array of Objects | When the statement type is DDL, the result rows of the DDL statement are returned. |
| job_mode   | No        | String           | Job execution mode: async (asynchronous) or sync (synchronous). |
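
Since only some statement types (such as DCL) run synchronously, a client usually branches on job_mode and tracks asynchronous jobs by job_id through the job-management APIs mentioned earlier. The sketch below assumes a hypothetical caller-supplied helper get_job_status(job_id) backed by such an API; the terminal state names are also assumptions for illustration.

    import time

    def wait_for_job(resp, get_job_status, poll_seconds=2):
        """Branch on job_mode; for async jobs, poll a caller-supplied status function.

        `get_job_status` is a hypothetical helper backed by a separate
        job-status API; it is not defined by this endpoint.
        """
        if not resp.get("is_success"):
            raise RuntimeError(resp.get("message") or "job submission failed")
        if resp.get("job_mode") == "sync":
            return resp  # synchronous jobs (e.g. DCL) already carry their results
        while True:
            status = get_job_status(resp["job_id"])
            # The terminal states below are assumptions for illustration.
            if status in ("FINISHED", "FAILED", "CANCELLED"):
                return {"job_id": resp["job_id"], "status": status}
            time.sleep(poll_seconds)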

Example Request

{
    "currentdb": "db1",
    "sql": "desc table1",
    "queue_name": "default",
    "conf": [
        "dli.sql.shuffle.partitions = 200"
    ],
    "tags": [
        {
            "key": "workspace",
            "value": "space1"
        },
        {
            "key": "jobName",
            "value": "name1"
        }
    ]
}
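
The same request can be issued programmatically. The Python sketch below uses the third-party requests library and assumes token-based authentication via an X-Auth-Token header, which is the usual pattern for this service family; the endpoint host, project ID, and token are placeholders.

    import requests

    # Placeholders: substitute your region's DLI endpoint, project ID, and token.
    endpoint = "https://dli.example.com"
    project_id = "your-project-id"
    token = "your-auth-token"

    body = {
        "currentdb": "db1",
        "sql": "desc table1",
        "queue_name": "default",
        "conf": ["dli.sql.shuffle.partitions = 200"],
        "tags": [{"key": "workspace", "value": "space1"},
                 {"key": "jobName", "value": "name1"}],
    }

    resp = requests.post(
        f"{endpoint}/v1.0/{project_id}/jobs/submit-job",
        json=body,
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
    )
    print(resp.status_code, resp.json())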

Example Response

{
  "is_success": true,
  "message": "",
  "job_id": "8ecb0777-9c70-4529-9935-29ea0946039c",
  "job_type": "DDL",
  "job_mode":"sync",
  "schema": [
    {
      "col_name": "string"
    },
    {
      "data_type": "string"
    },
    {
      "comment": "string"
    }
  ],
  "rows": [
    [
      "c1",
      "int",
      null
    ],
    [
      "c2",
      "string",
      null
    ]
  ]
}
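
In the DDL response above, each schema entry is a single {column_name: type} object and rows holds the result rows. The sketch below flattens them, using the parsed form of the example response (JSON null becomes Python None):

    # Parsed form of the example response above.
    response = {
        "schema": [{"col_name": "string"}, {"data_type": "string"}, {"comment": "string"}],
        "rows": [["c1", "int", None], ["c2", "string", None]],
    }

    # Collect the column names from the single-entry schema objects.
    columns = [name for entry in response["schema"] for name in entry]

    for row in response["rows"]:
        print(dict(zip(columns, row)))
    # {'col_name': 'c1', 'data_type': 'int', 'comment': None}
    # {'col_name': 'c2', 'data_type': 'string', 'comment': None}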

Status Codes

Table 6 describes the status codes.

Table 6 Status codes

| Status Code | Description |
| ----------- | ----------- |
| 200         | The job is submitted successfully. |
| 400         | Request error. |
| 500         | Internal service error. |

Error Codes

If an error occurs when this API is invoked, the system does not return a result similar to the preceding example but returns an error code and an error message instead. For details, see Error Code.
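
As a hedged sketch of client-side handling: the error field names used below (error_code, error_msg) are illustrative assumptions; the actual names are defined in Error Code.

    def check_submit_response(status_code, payload):
        """Raise on non-200 responses.

        The error fields referenced below (error_code, error_msg) are
        illustrative assumptions; see Error Code for the actual names.
        """
        if status_code == 200:
            return payload
        raise RuntimeError(
            f"HTTP {status_code}: {payload.get('error_code', 'unknown')} "
            f"{payload.get('error_msg', '')}"
        )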