Developing a DLI SQL Job in DataArts Studio

Updated on 2025-02-21 GMT+08:00

Scenario

Huawei Cloud DataArts Studio provides a one-stop data governance platform that integrates with DLI for seamless data integration and development, enabling enterprises to manage and control their data effectively.

This section walks you through how to develop a DLI SQL job in DataArts Studio.

Development Process

Figure 1 Process of developing a DLI SQL job in DataArts Studio
  1. Prepare an environment: Prepare DLI and DataArts Studio resources required for job execution. See Prepare Environments.
  2. Create a database and table: Submit SQL scripts to create a database and table. See Step 1: Create a Database and Table.
  3. Import service data: Submit SQL scripts to import service data. See Step 2: Calculate and Process Service Data.
  4. Query and analyze data: Submit SQL scripts to analyze service data, for example, querying daily sales. See Step 3: Query and Analyze Sales Data.
  5. Orchestrate a job: Orchestrate data processing and analysis scripts into a pipeline. DataArts Studio executes all nodes based on the orchestrated pipeline sequence. See Step 4: Orchestrate a Job.
  6. Test the job: Check whether the job runs properly. See Step 5: Test Job Running.
  7. Configure job scheduling and monitoring: Set job scheduling attributes and monitoring rules. See Step 6: Set Periodic Job Scheduling and Related Operations.

Prepare Environments

  • Prepare a DLI resource environment.
    • Configure a DLI job bucket.

      Before using DLI, you need to configure a DLI job bucket. The bucket is used to store temporary data generated during DLI job running, such as job logs and results.

      For details, see Configuring a DLI Job Bucket.

    • Create an elastic resource pool and create a SQL queue within it.

      An elastic resource pool offers compute resources (CPU and memory) required for running DLI jobs, which can adapt to the changing demands of services.

      You can create multiple queues within an elastic resource pool. These queues are associated with specific jobs and data processing tasks and serve as the basic unit of resource allocation and usage within the pool. In other words, a queue represents the specific compute resources used to execute jobs.

      Queues within an elastic resource pool share the pool's resources when executing jobs. Setting a proper queue allocation policy improves queue utilization.

      For details, see Creating an Elastic Resource Pool and Creating Queues Within It.

  • Prepare a DataArts Studio resource environment.
    • Buy a DataArts Studio instance.

      You must buy a DataArts Studio instance before you can submit a DLI job through DataArts Studio.

      For details, see Buying a DataArts Studio Basic Package.

    • Access the DataArts Studio instance's workspace.
      1. After buying a DataArts Studio instance, click Access.
        Figure 2 Accessing a DataArts Studio instance
      2. Click the Workspaces tab to access the data development page.

        By default, a workspace named default is created for the user who purchased the DataArts Studio instance, and the user is assigned the administrator role. You can use the default workspace or create a new one.

        For how to create a workspace, see Creating and Managing a Workspace.

        Figure 3 Accessing the DataArts Studio instance's workspace
        Figure 4 Accessing DataArts Studio's data development page

Step 1: Create a Database and Table

  1. Develop SQL scripts for creating databases and tables.

    Databases and tables are the basis for developing SQL jobs. Before running a job, you need to define databases and tables based on your service scenarios.

    This part describes how to develop a SQL script to create databases and tables.

    1. In the left navigation pane of DataArts Factory, choose Data Development > Develop Script.
    2. In the Create Script area on the right, click + DLI SQL.
      Figure 5 Creating a DLI SQL script
    3. On the script editing page, enter sample code for creating a database and table.
      ```SQL
      -- Create a database.
      CREATE DATABASE IF NOT EXISTS supermarket_db;

      -- Create a product dimension table.
      CREATE TABLE IF NOT EXISTS supermarket_db.products (
          product_id INT,       -- Product ID
          product_name STRING,  -- Product name
          category STRING,      -- Product category
          price DECIMAL(10,2)   -- Unit price
      ) USING parquet;

      -- Create a transaction table.
      CREATE TABLE IF NOT EXISTS supermarket_db.transactions (
          transaction_id INT,   -- Transaction ID
          product_id INT,       -- Product ID
          quantity INT,         -- Quantity
          dt STRING             -- Date
      ) USING parquet PARTITIONED BY (dt);

      -- Create a sales analysis table.
      CREATE TABLE IF NOT EXISTS supermarket_db.analyze (
          transaction_id INT,   -- Transaction ID
          product_id INT,       -- Product ID
          product_name STRING,  -- Product name
          quantity INT,         -- Quantity
          dt STRING             -- Date
      ) USING parquet PARTITIONED BY (dt);
      ```
      
    4. Click Save to save the SQL script. In this example, the script is named create_tables.
    5. Click Submit to run the script and create the database and tables. (An optional verification query is sketched below.)
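
      After the script runs, you can optionally verify the result before moving on. The sketch below uses standard Spark SQL statements, on which DLI SQL is based, and assumes the names from the script above.

      ```SQL
      -- List the tables created in the example database.
      SHOW TABLES IN supermarket_db;

      -- Inspect the schema and partitioning of the transaction table.
      DESCRIBE supermarket_db.transactions;
      ```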

  2. Create a SQL job running script.

    1. In the left navigation pane of DataArts Factory, choose Data Development > Develop Job.
      Figure 6 Creating a job
    2. Click Create Job. In the dialog box that appears, edit job information. In this example, the SQL job is named job_create_tables.
      Figure 7 Editing job information
    3. On the job development page, drag the DLI SQL node to the canvas and click the node to edit its properties.

      For more property parameters, see Parameters of DLI SQL nodes.

      Figure 8 Editing the properties of a DLI SQL node
    4. Once the properties are edited, click Save to save the configuration.

  3. Configure job scheduling.

    As the database and table only need to be created once, only one-time scheduling is configured in this example.

    1. Left-click the blank area of the canvas.
    2. Click Scheduling Setup and select Run once. (The job will be scheduled only once and will not be automatically scheduled later.)
      Figure 9 Configuring job scheduling
    3. After configuring the scheduling, click Execute.

      Click Go to O&M Center to view the job status.

Step 2: Calculate and Process Service Data

  1. Develop a SQL script for importing service data.

    This part describes how to submit a SQL script to import service data.

    1. In the left navigation pane of DataArts Factory, choose Data Development > Develop Script.
    2. In the Create Script area on the right, click + DLI SQL.
      Figure 10 Creating a DLI SQL script
    3. On the script editing page, enter sample code for importing and processing service data.
      ```SQL
      -- Data in actual services typically comes from other data sources. This example simplifies the import logic and simulates the insertion of product data.
      INSERT INTO supermarket_db.products (product_id, product_name, category, price) VALUES
      (1001, 'Shampoo', 'Daily necessities', 39.90),
      (1002, 'Toothpaste', 'Daily necessities', 15.90),
      (1003, 'Instant noodles', 'Food', 4.50),
      (1004, 'Coke', 'Beverage', 3.50);

      -- Data in actual services typically comes from other data sources. This example simplifies the import logic and simulates the insertion of transaction records.
      INSERT INTO supermarket_db.transactions (transaction_id, product_id, quantity, dt) VALUES
      (1, 1001, 50, '2024-11-01'),  -- 50 bottles of shampoo were sold.
      (2, 1002, 100, '2024-11-01'), -- 100 tubes of toothpaste were sold.
      (3, 1003, 30, '2024-11-02'),  -- 30 packs of instant noodles were sold.
      (4, 1004, 24, '2024-11-02');  -- 24 bottles of Coke were sold.

      -- Simulate supermarket business analysis: query the transaction records of a certain day and write them to the analysis table.
      INSERT INTO supermarket_db.analyze
      SELECT t.transaction_id, t.product_id, p.product_name, t.quantity, t.dt
      FROM supermarket_db.transactions t
      JOIN supermarket_db.products p ON t.product_id = p.product_id
      WHERE t.dt = '2024-11-01';
      ```
      
    4. Click Save to save the SQL script. In this example, the script is named job_process_data.
    5. Click Submit to execute the script. (An optional check of the imported data is sketched below.)
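
      To optionally spot-check the imported data before wiring the script into a job, you can run queries such as the following. This is a minimal sketch in standard Spark SQL, assuming the example tables and partitions created above.

      ```SQL
      -- List the date partitions written to the transaction table.
      SHOW PARTITIONS supermarket_db.transactions;

      -- Spot-check the rows imported for November 1, 2024.
      SELECT * FROM supermarket_db.transactions WHERE dt = '2024-11-01';
      ```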

  2. Create a SQL job.

    1. In the left navigation pane of DataArts Factory, choose Data Development > Develop Job.
      Figure 11 Creating a job
    2. Click Create Job. In the dialog box that appears, edit job information. In this example, the SQL job is named job_process_data.
      Figure 12 Editing job information
    3. On the job development page, drag the DLI SQL node to the canvas and click the node to edit its properties.
      • SQL or Script: Select SQL script in this example. Then, select the script created in the previous step as the SQL script.
      • Database Name: Select the database configured in the SQL script.
      • Queue Name: Select the SQL queue created in Create an elastic resource pool and create a SQL queue within it.
      • Environment variable: The DLI environment variable is optional.

        The description of the parameter added in this example is as follows:

        spark.sql.optimizer.dynamicPartitionPruning.enabled = true

        • This parameter controls whether dynamic partition pruning is enabled. Dynamic partition pruning reduces the amount of data that needs to be scanned and improves query performance.
        • When set to true, dynamic partition pruning is enabled: Spark automatically detects and skips partitions that do not satisfy the WHERE clause conditions during the query. This is useful for tables with a large number of partitions.
        • If a query contains many nested left join operations and the table has many partitions, parsing the partitions may consume a large amount of memory. As a result, the driver node can run short of memory and experience frequent full GCs.
        • To avoid such issues, you can disable dynamic partition pruning by setting this parameter to false.

        However, disabling this optimization may reduce query performance, because Spark no longer automatically prunes partitions that do not match the filter conditions. (A hedged example of toggling this setting is sketched after this list.)

      For more property parameters, see Parameters of DLI SQL nodes.

      Figure 13 Editing the properties of a DLI SQL node
    4. Once the properties are edited, click Save to save the configuration.
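
      As a hedged illustration of the dynamic partition pruning setting discussed above, the snippet below toggles it with a Spark SQL SET statement around a join on the example tables. Whether DLI honors an inline SET for this parameter may depend on the engine version; configuring it as a node environment variable, as described above, is the documented route.

      ```SQL
      -- Disable dynamic partition pruning for the statements that follow, for example to
      -- work around driver memory pressure caused by heavily nested joins over many partitions.
      SET spark.sql.optimizer.dynamicPartitionPruning.enabled=false;

      -- Run the query that was hitting memory issues.
      SELECT t.transaction_id, p.product_name, t.quantity, t.dt
      FROM supermarket_db.transactions t
      JOIN supermarket_db.products p ON t.product_id = p.product_id
      WHERE t.dt = '2024-11-01';

      -- Re-enable the optimization if pruning benefits other queries in the same script.
      SET spark.sql.optimizer.dynamicPartitionPruning.enabled=true;
      ```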

Step 3: Query and Analyze Sales Data

Develop a SQL script for analyzing and processing data.

This part describes how to submit a SQL script to analyze data. An optional aggregate variant is sketched after the steps below.

  1. In the left navigation pane of DataArts Factory, choose Data Development > Develop Script.
  2. In the Create Script area on the right, click + DLI SQL.
    Figure 14 Creating a DLI SQL script
  3. On the script editing page, enter sample code for analyzing data.
    ```SQL
    -- Query daily sales.
    SELECT transaction_id, product_id, product_name, quantity, dt
    FROM supermarket_db.analyze
    WHERE dt = '2024-11-01';
    ```
    
  4. Click Save to save the SQL script. In this example, the script is named select_analyze_data.
  5. Click Submit to execute the script.
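
If you also want aggregated figures rather than row-level details, a variant of the analysis query can sum sales per product and day. The sketch below is not part of the documented walkthrough; it assumes the example tables defined earlier.

```SQL
-- Aggregate daily sales volume and revenue per product.
SELECT a.dt,
       a.product_name,
       SUM(a.quantity)           AS total_quantity,
       SUM(a.quantity * p.price) AS total_revenue
FROM supermarket_db.analyze a
JOIN supermarket_db.products p ON a.product_id = p.product_id
GROUP BY a.dt, a.product_name
ORDER BY a.dt, total_revenue DESC;
```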

Step 4: Orchestrate a Job

  1. Create a DLI SQL node named select_analyze_data in the job_process_data job. Then, click the node to edit its properties.

    For more property parameters, see Parameters of DLI SQL nodes.

    Figure 15 Editing the properties of a DLI SQL node
  2. Once the properties are edited, click Save to save the configuration.
  3. Orchestrate the two nodes into a pipeline. DataArts Studio executes all nodes based on the orchestrated pipeline sequence. Then, click Save and Submit in the upper left corner.

Step 5: Test Job Running

After orchestrating the job, click Test to test it.

After testing the job, open the select_analyze_data SQL script file and click Execute to query and analyze sales details.

If the query results meet your expectations, proceed to Step 6: Set Periodic Job Scheduling.

Figure 16 Running the select_analyze_data script

Step 6: Set Periodic Job Scheduling

  1. In the left navigation pane of DataArts Factory, choose Data Development > Develop Job.
  2. Double-click job_process_data.
  3. Click the Scheduling Setup tab on the right.
  4. Select Run periodically and set scheduling properties.

    In this example, the scheduling policy takes effect at 10:15:00 on November 22, 2024, and the first scheduled run is at 10:20:00 on the same day. With a daily scheduling interval, the job then runs automatically at 10:20:00 each day, executing the nodes in the orchestrated pipeline sequence.

    Figure 17 Configuring job scheduling
  5. Click Save, Submit, and Execute in sequence to complete periodic scheduling configuration.

For more job scheduling settings, see Setting Up Scheduling for a Job.

Related Operations

  • Configure job monitoring.

    DataArts Studio monitors the status of batch jobs.

    This type of job is a pipeline that consists of one or more nodes and is scheduled as a whole.

    In the left navigation pane of DataArts Factory, choose Monitoring > Job Monitoring. On the displayed Batch Jobs tab, view the scheduling status, scheduling interval, and scheduling start time of batch jobs.

    For details, see Monitoring a Batch Job.

    Figure 18 Configuring job monitoring
  • Configure instance monitoring.

    Each time a job is executed, a job instance record is generated.

    In the left navigation pane of DataArts Factory, choose Monitoring > Monitor Instance. On the instance monitoring page that appears, you can view job instance information and perform more operations on the instances as needed.

    For more information, see Instance Monitoring.

FAQ

  • If a DataArts Studio job fails, and the logs provided by DataArts Studio are not detailed enough, what should I do? Where can I find more specific logs?
    You can locate the DLI job ID through the logs provided by DataArts Studio, and then find the specific job in the DLI console using the DLI job ID.
    Figure 19 Monitoring log file

    Once you find the specific job in the DLI console, click archived logs to view detailed logs.

    Figure 20 Entering the job ID

    You can also search for the job in the DLI console using the node name or job name provided by DataArts Studio.

    Figure 21 Node name or job name
  • What should I do if I encounter permission errors when running complex DLI jobs?

    DLI needs to work with other cloud services. You must grant DLI basic operation permissions for these services so that DLI can access them and perform resource O&M operations on your behalf.

    For more information, see Configuring DLI Agency Permissions.
