Developing a DLI SQL Job in DataArts Studio

Updated on 2025-02-21 GMT+08:00

Scenario

Huawei Cloud DataArts Studio provides a one-stop data governance platform that integrates with DLI for seamless data integration and development, enabling enterprises to manage and control their data effectively.

This section walks you through how to develop a DLI SQL job in DataArts Studio.

Development Process

Figure 1 Process of developing a DLI SQL job in DataArts Studio
  1. Prepare an environment: Prepare DLI and DataArts Studio resources required for job execution. See Prepare Environments.
  2. Create a database and table: Submit SQL scripts to create a database and table. See Step 1: Create a Database and Table.
  3. Import service data: Submit SQL scripts to import service data. See Step 2: Calculate and Process Service Data.
  4. Query and analyze data: Submit SQL scripts to analyze service data, for example, querying daily sales. See Step 3: Query and Analyze Sales Data.
  5. Orchestrate a job: Orchestrate data processing and analysis scripts into a pipeline. DataArts Studio executes all nodes based on the orchestrated pipeline sequence. See Step 4: Orchestrate a Job.
  6. Test the job: Check whether the job runs properly. See Step 5: Test Job Running.
  7. Configure job scheduling and monitoring: Set job scheduling attributes and monitoring rules. See Step 6: Set Periodic Job Scheduling and Related Operations.

Prepare Environments

  • Prepare a DLI resource environment.
    • Configure a DLI job bucket.

      Before using DLI, you need to configure a DLI job bucket. The bucket is used to store temporary data generated during DLI job running, such as job logs and results.

      For details, see Configuring a DLI Job Bucket.

    • Create an elastic resource pool and create a SQL queue within it.

      An elastic resource pool offers compute resources (CPU and memory) required for running DLI jobs, which can adapt to the changing demands of services.

      You can create multiple queues within an elastic resource pool. These queues are associated with specific jobs and data processing tasks and serve as the basic unit of resource allocation and usage within the pool. In other words, a queue represents the specific compute resources used to execute jobs.

      Queues within an elastic resource pool share the pool's resources when executing jobs. Setting a proper queue allocation policy improves queue utilization.

      For details, see Creating an Elastic Resource Pool and Creating Queues Within It.

  • Prepare a DataArts Studio resource environment.
    • Buy a DataArts Studio instance.

      You must buy a DataArts Studio instance before you can submit a DLI job through DataArts Studio.

      For details, see Buying a DataArts Studio Basic Package.

    • Access the DataArts Studio instance's workspace.
      1. After buying a DataArts Studio instance, click Access.
        Figure 2 Accessing a DataArts Studio instance
      2. Click the Workspaces tab to access the data development page.

        By default, a workspace named default is created for the user who purchased the DataArts Studio instance, and the user is assigned the administrator role. You can use the default workspace or create a new one.

        For how to create a workspace, see Creating and Managing a Workspace.

        Figure 3 Accessing the DataArts Studio instance's workspace
        Figure 4 Accessing DataArts Studio's data development page

Step 1: Create a Database and Table

  1. Develop SQL scripts for creating databases and tables.

    Databases and tables are the basis for developing SQL jobs. Before running a job, you need to define databases and tables based on your service scenarios.

    This part describes how to develop a SQL script to create databases and tables.

    1. In the left navigation pane of DataArts Factory, choose Data Development > Develop Script.
    2. In the Create Script area on the right, click + DLI SQL.
      Figure 5 Creating a DLI SQL script
    3. On the script editing page, enter sample code for creating a database and table.
      ```SQL
      -- Create a database.
      CREATE DATABASE IF NOT EXISTS supermarket_db;

      -- Create a product dimension table.
      CREATE TABLE IF NOT EXISTS supermarket_db.products (
          product_id INT,       -- Product ID
          product_name STRING,  -- Product name
          category STRING,      -- Product category
          price DECIMAL(10,2)   -- Unit price
      ) USING parquet;

      -- Create a transaction table.
      CREATE TABLE IF NOT EXISTS supermarket_db.transactions (
          transaction_id INT,   -- Transaction ID
          product_id INT,       -- Product ID
          quantity INT,         -- Quantity
          dt STRING             -- Date
      ) USING parquet PARTITIONED BY (dt);

      -- Create a sales analysis table.
      CREATE TABLE IF NOT EXISTS supermarket_db.analyze (
          transaction_id INT,   -- Transaction ID
          product_id INT,       -- Product ID
          product_name STRING,  -- Product name
          quantity INT,         -- Quantity
          dt STRING             -- Date
      ) USING parquet PARTITIONED BY (dt);
      ```
      
    4. Click Save to save the SQL script. In this example, the script is named create_tables.
    5. Click Submit to run the script and create the database and tables. (An optional verification query is sketched below.)
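
      After the script runs, you can optionally verify the result before moving on. The sketch below uses standard Spark SQL statements, on which DLI SQL is based, and assumes the names from the script above.

      ```SQL
      -- List the tables created in the example database.
      SHOW TABLES IN supermarket_db;

      -- Inspect the schema and partitioning of the transaction table.
      DESCRIBE supermarket_db.transactions;
      ```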

  2. Create a SQL job running script.

    1. In the left navigation pane of DataArts Factory, choose Data Development > Develop Job.
      Figure 6 Creating a job
    2. Click Create Job. In the dialog box that appears, edit job information. In this example, the SQL job is named job_create_tables.
      Figure 7 Editing job information
    3. On the job development page, drag the DLI SQL node to the canvas and click the node to edit its properties.

      For more property parameters, see Parameters of DLI SQL nodes.

      Figure 8 Editing the properties of a DLI SQL node
    4. Once the properties are edited, click Save to save the configuration.

  3. Configure job scheduling.

    As the database and table only need to be created once, only one-time scheduling is configured in this example.

    1. Left-click the blank area of the canvas.
    2. Click Scheduling Setup and select Run once. (The job will be scheduled only once and will not be automatically scheduled later.)
      Figure 9 Configuring job scheduling
    3. After configuring the scheduling, click Execute.

      Click Go to O&M Center to view the job status.

Step 2: Calculate and Process Service Data

  1. Develop a SQL script for importing service data.

    This part describes how to submit a SQL script to import service data.

    1. In the left navigation pane of DataArts Factory, choose Data Development > Develop Script.
    2. In the Create Script area on the right, click + DLI SQL.
      Figure 10 Creating a DLI SQL script
    3. On the script editing page, enter sample code for importing and processing service data.
      ```SQL
      -- Data in actual services typically comes from other data sources. This example simplifies the import logic and simulates the insertion of product data.
      INSERT INTO supermarket_db.products (product_id, product_name, category, price) VALUES
      (1001, 'Shampoo', 'Daily necessities', 39.90),
      (1002, 'Toothpaste', 'Daily necessities', 15.90),
      (1003, 'Instant noodles', 'Food', 4.50),
      (1004, 'Coke', 'Beverage', 3.50);

      -- Data in actual services typically comes from other data sources. This example simplifies the import logic and simulates the insertion of transaction records.
      INSERT INTO supermarket_db.transactions (transaction_id, product_id, quantity, dt) VALUES
      (1, 1001, 50, '2024-11-01'),  -- 50 bottles of shampoo were sold.
      (2, 1002, 100, '2024-11-01'), -- 100 tubes of toothpaste were sold.
      (3, 1003, 30, '2024-11-02'),  -- 30 packs of instant noodles were sold.
      (4, 1004, 24, '2024-11-02');  -- 24 bottles of Coke were sold.

      -- Simulate supermarket business analysis: query the transaction records of a certain day and write them to the analysis table.
      INSERT INTO supermarket_db.analyze
      SELECT t.transaction_id, t.product_id, p.product_name, t.quantity, t.dt
      FROM supermarket_db.transactions t
      JOIN supermarket_db.products p ON t.product_id = p.product_id
      WHERE t.dt = '2024-11-01';
      ```
      
    4. Click Save to save the SQL script. In this example, the script is named job_process_data.
    5. Click Submit to execute the script. (An optional check of the imported data is sketched below.)
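
      To optionally spot-check the imported data before wiring the script into a job, you can run queries such as the following. This is a minimal sketch in standard Spark SQL, assuming the example tables and partitions created above.

      ```SQL
      -- List the date partitions written to the transaction table.
      SHOW PARTITIONS supermarket_db.transactions;

      -- Spot-check the rows imported for November 1, 2024.
      SELECT * FROM supermarket_db.transactions WHERE dt = '2024-11-01';
      ```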

  2. Create a SQL job.

    1. In the left navigation pane of DataArts Factory, choose Data Development > Develop Job.
      Figure 11 Creating a job
    2. Click Create Job. In the dialog box that appears, edit job information. In this example, the SQL job is named job_process_data.
      Figure 12 Editing job information
    3. On the job development page, drag the DLI SQL node to the canvas and click the node to edit its properties.
      • SQL or Script: Select SQL script in this example. Then, select the script created in the previous step as the SQL script.
      • Database Name: Select the database configured in the SQL script.
      • Queue Name: Select the SQL queue created in Create an elastic resource pool and create a SQL queue within it.
      • Environment variable: The DLI environment variable is optional.

        The description of the parameter added in this example is as follows:

        spark.sql.optimizer.dynamicPartitionPruning.enabled = true

        • This parameter controls whether dynamic partition pruning is enabled. Dynamic partition pruning reduces the amount of data that needs to be scanned and improves query performance.
        • When set to true, dynamic partition pruning is enabled: Spark automatically detects and skips partitions that do not satisfy the WHERE clause conditions during the query. This is useful for tables with a large number of partitions.
        • If a query contains many nested left join operations and the table has many partitions, parsing the partitions may consume a large amount of memory. As a result, the driver node can run short of memory and experience frequent full GCs.
        • To avoid such issues, you can disable dynamic partition pruning by setting this parameter to false.

        However, disabling this optimization may reduce query performance, because Spark no longer automatically prunes partitions that do not match the filter conditions. (A hedged example of toggling this setting is sketched after this list.)

      For more property parameters, see Parameters of DLI SQL nodes.

      Figure 13 Editing the properties of a DLI SQL node
    4. Once the properties are edited, click Save to save the configuration.
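
      As a hedged illustration of the dynamic partition pruning setting discussed above, the snippet below toggles it with a Spark SQL SET statement around a join on the example tables. Whether DLI honors an inline SET for this parameter may depend on the engine version; configuring it as a node environment variable, as described above, is the documented route.

      ```SQL
      -- Disable dynamic partition pruning for the statements that follow, for example to
      -- work around driver memory pressure caused by heavily nested joins over many partitions.
      SET spark.sql.optimizer.dynamicPartitionPruning.enabled=false;

      -- Run the query that was hitting memory issues.
      SELECT t.transaction_id, p.product_name, t.quantity, t.dt
      FROM supermarket_db.transactions t
      JOIN supermarket_db.products p ON t.product_id = p.product_id
      WHERE t.dt = '2024-11-01';

      -- Re-enable the optimization if pruning benefits other queries in the same script.
      SET spark.sql.optimizer.dynamicPartitionPruning.enabled=true;
      ```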

Step 3: Query and Analyze Sales Data

Develop a SQL script for analyzing and processing data.

This part describes how to submit a SQL script to analyze data. An optional aggregate variant is sketched after the steps below.

  1. In the left navigation pane of DataArts Factory, choose Data Development > Develop Script.
  2. In the Create Script area on the right, click + DLI SQL.
    Figure 14 Creating a DLI SQL script
  3. On the script editing page, enter sample code for analyzing data.
    ```SQL
    -- Query daily sales.
    SELECT transaction_id, product_id, product_name, quantity, dt
    FROM supermarket_db.analyze
    WHERE dt = '2024-11-01';
    ```
    
  4. Click Save to save the SQL script. In this example, the script is named select_analyze_data.
  5. Click Submit to execute the script.
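
If you also want aggregated figures rather than row-level details, a variant of the analysis query can sum sales per product and day. The sketch below is not part of the documented walkthrough; it assumes the example tables defined earlier.

```SQL
-- Aggregate daily sales volume and revenue per product.
SELECT a.dt,
       a.product_name,
       SUM(a.quantity)           AS total_quantity,
       SUM(a.quantity * p.price) AS total_revenue
FROM supermarket_db.analyze a
JOIN supermarket_db.products p ON a.product_id = p.product_id
GROUP BY a.dt, a.product_name
ORDER BY a.dt, total_revenue DESC;
```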

Step 4: Orchestrate a Job

  1. Create a DLI SQL node named select_analyze_data in the job_process_data job. Then, click the node to edit its properties.

    For more property parameters, see Parameters of DLI SQL nodes.

    Figure 15 Editing the properties of a DLI SQL node
  2. Once the properties are edited, click Save to save the configuration.
  3. Orchestrate the two nodes into a pipeline. DataArts Studio executes all nodes based on the orchestrated pipeline sequence. Then, click Save and Submit in the upper left corner.

Step 5: Test Job Running

After orchestrating the job, click Test to test it.

After testing the job, open the select_analyze_data SQL script file and click Execute to query and analyze sales details.

If the query results meet your expectations, proceed to Step 6: Set Periodic Job Scheduling.

Figure 16 Running the select_analyze_data script

Step 6: Set Periodic Job Scheduling

  1. In the left navigation pane of DataArts Factory, choose Data Development > Develop Job.
  2. Double-click job_process_data.
  3. Click the Scheduling Setup tab on the right.
  4. Select Run periodically and set scheduling properties.

    In this example, the scheduling policy takes effect at 10:15:00 on November 22, 2024, and the first scheduled run is at 10:20:00 on the same day. With a daily scheduling interval, the job then runs automatically at 10:20:00 each day, executing the nodes in the orchestrated pipeline sequence.

    Figure 17 Configuring job scheduling
  5. Click Save, Submit, and Execute in sequence to complete periodic scheduling configuration.

For more job scheduling settings, see Setting Up Scheduling for a Job.

Related Operations

  • Configure job monitoring.

    DataArts Studio monitors the status of batch jobs.

    This type of job is a pipeline that consists of one or more nodes and is scheduled as a whole.

    In the left navigation pane of DataArts Factory, choose Monitoring > Job Monitoring. On the displayed Batch Jobs tab, view the scheduling status, scheduling interval, and scheduling start time of batch jobs.

    For details, see Monitoring a Batch Job.

    Figure 18 Configuring job monitoring
  • Configure instance monitoring.

    Each time a job is executed, a job instance record is generated.

    In the left navigation pane of DataArts Factory, choose Monitoring > Monitor Instance. On the instance monitoring page that appears, you can view job instance information and perform more operations on the instances as needed.

    For more information, see Instance Monitoring.

FAQ

  • If a DataArts Studio job fails, and the logs provided by DataArts Studio are not detailed enough, what should I do? Where can I find more specific logs?
    You can locate the DLI job ID through the logs provided by DataArts Studio, and then find the specific job in the DLI console using the DLI job ID.
    Figure 19 Monitoring log file

    Once you find the specific job in the DLI console, click archived logs to view detailed logs.

    Figure 20 Entering the job ID

    You can also search for the job in the DLI console using the node name or job name provided by DataArts Studio.

    Figure 21 Node name or job name
  • What should I do if I encounter permission errors when running complex DLI jobs?

    DLI needs to work with other cloud services. You must grant DLI basic operation permissions for these services so that DLI can access them and perform resource O&M operations on your behalf.

    For more information, see Configuring DLI Agency Permissions.
