Creating and Submitting a SQL Job

Introduction

DLI offers a SQL editor for executing data query operations using SQL statements.

DLI's SQL editor supports SQL:2003, is compatible with SparkSQL, and allows batch execution of SQL statements. Additionally, commonly used syntax in the job editing window is highlighted in various colors. Both single-line and multi-line comments are supported (starting with "--", followed by the comment). For more detailed syntax descriptions, refer to Data Lake Insight SQL Syntax Reference.

This section describes how to create and submit a SQL job using the DLI SQL editor.

Notes

Before submitting a SQL job, configure a DLI job bucket. The bucket is used to store job results and temporary data generated by DLI, such as job logs.
- For details about how to configure a job bucket, see Configuring a DLI Job Bucket. The job bucket name is set by default.
- If your SQL queue has been configured to save job results to a DLI job bucket, the system will write the results directly to the specified OBS bucket when executing a SQL job. Ensure the following preparations are completed before submitting the job:
  - Configure DLI job bucket information before submitting a SQL job. Otherwise, the SQL job may fail to be submitted.
  - Ensure that the user executing the SQL job has read and write permissions for the DLI job bucket. Otherwise, the job results cannot be properly saved or retrieved.
    When executing a job, the system accesses the job bucket using the identity credentials of the user who submitted the job. If permissions are insufficient, it results in failure to save or retrieve the job results.
  For details, refer to How Do I Check if Job Result Saving to a DLI Job Bucket Is Enabled for a SQL Queue?
- On the OBS management console, you can configure lifecycle rules for a bucket to automatically delete objects within it or change object storage classes on a regular basis. For details, see Configuring a Lifecycle Rule.

Creating and Submitting a SQL Job Using the SQL Editor

Log in to the DLI management console. In the navigation pane on the left, choose SQL Editor.
On the SQL Editor page, the system prompts you to create a temporary DLI data bucket to store temporary data generated by DLI. For details about how to configure a job bucket, see Configuring a DLI Job Bucket.

Above the SQL job editing window, set the parameters required for running a SQL job, such as the queue and database. For how to set the parameters, refer to Table 1.

**Table 1** Setting SQL job parameters
Button & Drop-Down List	Description
Engine	SQL jobs support the Spark and HetuEngine engines. Spark is suitable for offline analysis. HetuEngine is suitable for interactive analysis. For more information about DLI engines, see DLI Compute Engines.
Queues	Resource queue used to execute SQL jobs. A queue determines the compute resources accessible to a job during its operation within an elastic resource pool. Every queue is allocated with specific resources, known as CUs, whose configuration significantly impacts the job's performance and execution efficiency. Before submitting a job, assess its resource needs and select an appropriate queue. SQL jobs can only be executed on SQL queues. If no queue is available, you can create a queue or use the default queue. For details about how to create a queue, see Creating an Elastic Resource Pool and Creating Queues Within It. The default queue serves well for temporary or testing scenarios characterized by indeterminate data volumes or infrequent data processing needs.
Catalog	A data catalog is a metadata management object that can contain multiple databases. For more information about data catalogs, see Understanding Data Catalogs, Databases, and Tables. You can create and manage multiple catalogs in DLI to isolate different metadata.
Databases	Select a database from the drop-down list box. If no database is available, the default database is displayed. For how to create a database, see Creating a Data Catalog, Database, and Table on the DLI Console. If you have specified a database where tables are located in SQL statements, the database you choose here does not apply.
Settings	Add parameters and tags. Parameter Settings: Set parameters in key/value format for SQL jobs. For details, see Data Lake Insight SQL Syntax Reference. Tags: Assign key-value pairs as tags to a SQL job.

Create a database and a table.
Create them in advance by referring to Creating a Data Catalog, Database, and Table on the DLI Console. For example, create a table named qw.
In the SQL job editing window, enter the following SQL statement:
1

SELECT * FROM qw.qw LIMIT 10;
Alternatively, you can double-click the table name qw. The query statement is automatically entered in the SQL job editing window.

DLI offers a range of SQL templates that come with use cases, code examples, and usage guides. You can also use these templates to quickly implement your service logic. For more information about templates, see Creating a SQL Job Template.
On top of the editing window, click More > Verify Syntax to check whether the SQL statement is correct.
1. If the verification fails, check the SQL statement syntax by referring to Data Lake Insight SQL Syntax Reference.
2. If the syntax verification is successful, click Execute. Read and agree to the privacy agreement. Click OK to execute the SQL statement.
Once successfully executed, you can check the execution result on the View Result tab below the SQL job editing window.
View job execution results.
On the View Result tab, click to display execution results in a chart. Click to switch back to the table view.
You can view a maximum of 1,000 data records on this View Result tab. To view more or full data, click to export the data to OBS.
- If no column of the numeric type is displayed in the execution result, the result cannot be represented in charts.
- You can view the data in a bar chart, line chart, or fan chart.
- In the bar chart and line chart, the X axis can be any column, while the Y axis can only be columns of the numeric type. The fan chart displays the corresponding legends and indicators.

Setting SQL Job Parameters

Click Settings in the upper right corner of the SQL Editor page. You can set parameters and tags for the SQL job.

Parameter Settings: Assign key-value pairs as parameter settings.

For details, see Data Lake Insight SQL Syntax Reference.
Tags: Assign key-value pairs as tags to a SQL job.

**Table 2** Parameters for SQL job running
Parameter	Default Value	Description
spark.sql.files.maxRecordsPerFile	0	Maximum number of records to be written into a single file. If the value is zero or negative, there is no limit.
spark.sql.autoBroadcastJoinThreshold	209715200	Maximum size, in bytes, of the table that displays all working nodes when a connection is executed. You can set this parameter to -1 to disable the display. NOTE: Currently, only configuration units that store tables analyzed using the ANALYZE TABLE COMPUTE statistics noscan command and file-based data source tables that calculate statistics directly from data files are supported.
spark.sql.shuffle.partitions	200	Default number of partitions used to filter data for join or aggregation.
spark.sql.dynamicPartitionOverwrite.enabled	false	When set to false, DLI will delete all partitions that meet the conditions before overwriting them. For example, if there is a partition named 2021-01 in a partitioned table and you use the INSERT OVERWRITE statement to write data to the 2021-02 partition, the data in the 2021-01 partition will also be overwritten. When set to true, DLI will not delete partitions in advance, but will overwrite partitions with data written during runtime.
spark.sql.files.maxPartitionBytes	134217728	Maximum number of bytes to be packed into a single partition when a file is read.
spark.sql.badRecordsPath	-	Path of bad records.
dli.sql.sqlasync.enabled	true	Whether DDL and DCL statements are executed asynchronously. The value true indicates that asynchronous execution is enabled.
dli.sql.job.timeout	-	Job running timeout interval, in seconds. If the job times out, it will be canceled.

More Common Functions of the SQL Editor

Switching to the SparkUI page to view the SQL statement execution process
The SQL editor allows you to switch to the SparkUI to view the SQL statement execution process.
- You can view only the latest 100 job records on DLI's SparkUI.
- If a job is running on the default queue or is a synchronization one, you cannot switch to the SparkUI to view the SQL statement execution process.
When you execute a job on a created queue, the cluster is restarted. It takes about 10 minutes. If you click SparkUI before the cluster is created, an empty projectID will be cached. The SparkUI page cannot be displayed. You are advised to use a dedicated queue so that the cluster will not be released. Alternatively, wait for a while after the job is submitted (the cluster is created), and then check SparkUI.
Archiving SQL run logs
On the Executed Queries (Last Day) tab of the SQL Editor page, click More and select View Log in the Operation column of the SQL job. The system automatically switches to the OBS path where logs are stored. You can download logs as needed.

The View Log button is not available for synchronization jobs and jobs running on the default queue.

SQL Editor shortcuts

**Table 3** Keyboard shortcuts
Shortcut	Description
Ctrl+Enter	Execute SQL statements. You can run SQL statements by pressing Ctrl+R or Ctrl + Enter on the keyboard.
Ctrl+F	Search for SQL statements. You can press Ctrl+F to search for a required SQL statement.
Shift+Alt+F	Format SQL statements. You can press Shift + Alt + F to format a SQL statement.
Ctrl+Q	Syntax verification. You can press Ctrl + Q to verify the syntax of SQL statements.
F11	Full screen. You can press F11 to display the SQL Job Editor window in full screen. Press F11 again to leave the full screen.