Using DLI to Submit a SQL Job to Query OBS Data

Updated on 2025-01-10 GMT+08:00

Scenario

DLI can query data stored in OBS. This section describes how to use DLI to submit a SQL job to query OBS data.

To illustrate, create a file named sampledata.csv and upload it to an OBS bucket. Then, create an elastic resource pool, add queues to it, and use DLI to create a database and table. Finally, use DLI's SQL editor to run a query that returns up to 1,000 records from the table.

Procedure

Table 1 shows the process for submitting a SQL job to query OBS data.

Complete the preparations in Preparations before performing the following operations.

Table 1 Procedure for using DLI to submit a SQL job to query OBS data

  Procedure | Description
  Step 1: Upload Data to OBS | Upload data files to OBS.
  Step 2: Create an Elastic Resource Pool and Add Queues to the Pool | Create the compute resources required for submitting jobs.
  Step 3: Create a Database | DLI metadata forms the foundation for SQL job development. Before executing a job, you need to define databases and tables based on your business scenario.
  Step 4: Create a Table | Use the sample data stored in OBS to create a table in the db1 database.
  Step 5: Query Data | Use standard SQL statements to query and analyze the data.

Preparations

  • Register a Huawei ID and enable Huawei Cloud services. Make sure your account is not in arrears or frozen.
  • Configure an agency for DLI.
    DLI interacts with services such as Object Storage Service (OBS), Virtual Private Cloud (VPC), and Simple Message Notification (SMN). If this is your first time using DLI, you need to configure an agency so that DLI can access these dependent services.
    1. Log in to the DLI management console using your account. In the navigation pane on the left, choose Global Configuration > Service Authorization.
    2. On the agency settings page, select the agency permissions under Basic Usage, Datasource, and O&M and click Update.
    3. View and understand the notes for updating the agency and click OK. The DLI agency permissions are updated.
      Figure 1 Configuring an agency for DLI
    4. Once configured, you can check the agency dli_management_agency in the agency list on the IAM console.

Step 1: Upload Data to OBS

Upload data files to OBS.

  1. Log in to the OBS management console.
  2. Create a bucket. In this example, the bucket name is obs1.
    1. Click Create Bucket in the upper right corner.
    2. On the displayed Create Bucket page, specify Region and enter the Bucket Name. Retain the default values for other parameters or adjust them as needed.
      NOTE:

      Select a region that matches the location of the DLI console.

    3. Click Create Now.
  3. Click obs1 to access its Objects tab page.
  4. Click Upload Object. In the displayed dialog box, drag a desired file or folder, for example, sampledata.csv, to the Upload Object area. Then, click Upload.

    You can create a sampledata.txt file, copy the following comma-separated content into it, and save it as sampledata.csv.

    product_id,product_name
    113,office_13
    22,book_2
    29,book_9

    After the file is uploaded successfully, the file path is obs://obs1/sampledata.csv.

    For more operations on the OBS console, see the Object Storage Service User Guide.

Step 2: Create an Elastic Resource Pool and Add Queues to the Pool

In this example, the elastic resource pool dli_resource_pool and queue dli_queue_01 are created.
  1. Log in to the DLI management console.
  2. In the navigation pane on the left, choose Resources > Resource Pool.
  3. On the displayed page, click Buy Resource Pool in the upper right corner.
  4. On the displayed page, set the parameters.
    In this example, we will buy the resource pool in the CN East-Shanghai2 region. Table 2 describes the parameters.
    Table 2 Parameters

      Parameter | Description | Example Value
      Region | Select a region where you want to buy the elastic resource pool. | CN East-Shanghai2
      Project | Project uniquely preset by the system for each region. | Default
      Name | Name of the elastic resource pool. | dli_resource_pool
      Specifications | Specifications of the elastic resource pool. | Standard
      CU Range | The maximum and minimum CUs allowed for the elastic resource pool. | 64-64
      CIDR Block | CIDR block the elastic resource pool belongs to. If you use an enhanced datasource connection, this CIDR block cannot overlap that of the data source. Once set, this CIDR block cannot be changed. | 172.16.0.0/19
      Enterprise Project | Select an enterprise project for the elastic resource pool. | default

  5. Click Buy.
  6. Click Submit.
  7. In the elastic resource pool list, locate the pool you just created and click Add Queue in the Operation column.
  8. Set the basic parameters listed below.
    Table 3 Basic parameters for adding a queue

      Parameter | Description | Example Value
      Name | Name of the queue to add. | dli_queue_01
      Type | Type of the queue. To execute SQL jobs, select For SQL. To execute Flink or Spark jobs, select For general purpose. | _
      Engine | SQL queue engine. The options are Spark and HetuEngine. | _
      Enterprise Project | Select an enterprise project. | default

  9. Click Next and configure scaling policies for the queue.

    Click Create to add a scaling policy with varying priority, period, minimum CUs, and maximum CUs.

    Figure 2 shows the scaling policy configured in this example.
    Figure 2 Configuring a scaling policy when adding a queue
    Table 4 Scaling policy parameters

      Parameter | Description | Example Value
      Priority | Priority of the scaling policy in the current elastic resource pool. A larger value indicates a higher priority. In this example, only one scaling policy is configured, so its priority is set to 1 by default. | 1
      Period | The first scaling policy is the default policy, and its Period parameter configuration cannot be deleted or modified. The period for the scaling policy is from 00 to 24. | 00–24
      Min CU | Minimum number of CUs allowed by the scaling policy. | 16
      Max CU | Maximum number of CUs allowed by the scaling policy. | 64

  10. Click OK.

Step 3: Create a Database

Before querying data, create a database, for example, db1.

NOTE:

The default database is a built-in database. You cannot create a database named default.

  1. In the left navigation pane of the DLI management console, choose SQL Editor.
  2. In the editing window on the right of the SQL Editor page, enter the following SQL statement and click Execute. Read and agree to the privacy agreement, and click OK.
    create database db1;

    After the database is successfully created, click the refresh button in the middle pane to refresh the database list. The new database db1 is displayed in the list.

    NOTE:

    When you execute a query on the DLI management console for the first time, you need to read the privacy agreement. You can perform operations only after you agree to the agreement. For later queries, you will not need to read the privacy agreement again.
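If you run this example more than once, recreating db1 fails because the database already exists. DLI SQL follows the common Spark SQL convention here, so a conditional variant (a minimal sketch; confirm the clause in the DLI SQL syntax reference for your engine version) avoids the error:

  create database if not exists db1;

Running show databases; afterwards confirms that db1 is listed.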

Step 4: Create a Table

After database db1 is created, create a table (for example, table1) in db1 based on the data in the sample file obs://obs1/sampledata.csv stored on OBS.

  1. In the SQL editing window of the SQL Editor page, select the queue dli_queue_01 created in Step 2 and the database db1.
  2. Enter the following SQL statement in the job editor window and click Execute:
    create table table1 (product_id int, product_name string) using csv options (path 'obs://obs1');

    When creating the table, you only need to specify the OBS directory where the data file is stored; you do not need to append the file name to the path.

    After the table is successfully created, click the Databases tab then db1. The created table table1 is displayed in the table list.
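Note that sampledata.csv starts with a header row (product_id,product_name). Tables created with USING csv are backed by Spark's CSV data source, which reads the header as an ordinary data row unless told otherwise. Assuming DLI passes reader options through as standard Spark SQL does (check the DLI SQL syntax reference for the options your engine version supports), a variant like the following skips the header row:

  create table table1 (product_id int, product_name string)
  using csv options (path 'obs://obs1', header 'true');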

Step 5: Query Data

After performing the preceding steps, you can start querying data.

  1. In the Table tab on the SQL Editor page, double-click the created table table1. The SQL statement is automatically displayed in the SQL job editing window in the right pane. Run the following statement to query up to 1,000 records from the table1 table:
    select * from db1.table1 limit 1000;
  2. Click Execute. The system starts the query.

    Once the SQL statement finishes executing, whether it succeeds or fails, you can view the query result on the View Result tab below the SQL job editing window.
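The editor is not limited to simple row retrieval; any standard SQL analysis can run the same way. As a small illustration that uses only the columns defined in Step 4, the following query counts the rows for each product_name:

  select product_name, count(*) as cnt
  from db1.table1
  group by product_name
  order by cnt desc;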
