Using DataArtsFabric SQL to Import and Query Data

Operation Scenarios

DataArtsFabric SQL is a cloud-native serverless version. It utilizes the resource pooling and massive storage capabilities provided by the cloud infrastructure, combined with parallel execution, metadata decoupling, and compute-storage persistent decoupling architecture, to achieve ultimate elasticity and lakehouse features.

This section describes how to quickly enable the DataArtsFabric SQL service and perform simple data queries.

Operation Process

**Table 1** Operation process
Step	Description
Prerequisites	Sign up for a HUAWEI ID and enable Huawei Cloud services, complete real-name authentication, top up your account, enable LakeFormation and OBS permissions, and confirm the agency.
Creating a SQL endpoint	Create a SQL endpoint. You can skip this step if you use a public endpoint.
Planning and Creating an OBS Parallel File System and Importing Data	Create an OBS parallel file system and folders for data storage and import sample data.
Planning and Creating a LakeFormation Instance, Catalog, Database, and Table	Create catalogs, databases, and tables on the LakeFormation page and specify the OBS parallel file system directory.
Querying Data	Query data on the SQL Editor page of DataArtsFabric SQL.

Prerequisites

You have signed up for a HUAWEI ID, completed real-name authentication, and checked your account is not in arrears or frozen.
You have enabled LakeFormation and OBS permissions and confirmed the agency.

Using DataArtsFabric SQL

Log in to the Huawei Cloud DataArts Fabric console and click Access Workspace.
In the navigation pane on the left, choose Development and Production > SQL Editor. Select a LakeFormation instance, a LakeFormation catalog, and a SQL endpoint to run the SQL statements.

Planning and Creating an OBS Parallel File System and Importing Data

DataArtsFabric SQL uses OBS to store data. You need to create a parallel file system and folders on the OBS console and import sample data.

Log in to the management console.
In the upper left corner of the page, click and choose Storage > Object Storage Service.
In the navigation pane on the left, choose Parallel File System > Create Parallel File System. On the displayed slide-out panel/dialog box, set parameters and click Create Now.
- Set File System Name as required, for example, to fabric-serverless.
- Set other parameters based on site requirements.
On the Parallel File System page, click the name of the created file system, for example, fabric-serverless.
Choose Files in the navigation pane on the left. On the displayed page, click Create Folder, enter a folder name, and click OK. Click the folder name and click Create Folder to create a subfolder.
Repeat this step to create paths for storing metadata in sequence. The following paths are examples:
- Catalog storage path: fabric-serverless/catalog1
- Database storage path: fabric-serverless/catalog1/database1
- Data table storage path: fabric-serverless/catalog1/database1/table1
Download the Parquet data sample file.
Upload the data file to the fabric-serverless/catalog1/database1/table1 directory.

Planning and Creating a LakeFormation Instance, Catalog, Database, and Table

DataArtsFabric SQL manages data sources using LakeFormation. You need to purchase a LakeFormation instance and configure its catalog, database, and table information.

Log in to the management console.
In the upper left corner of the page, choose Analytics > DataArts Lake Formation.
On the OverView page, purchase an instance.
In the upper left corner, select the instance to display its details.
Create a catalog.
1. In the navigation pane on the left, choose Metadata > Catalog.
2. Click Create, set the parameters below, and click Submit.
  - Catalog Name: catalog1
  - Select Location: Click , select a storage location, for example, obs://fabric-serverless/catalog1, and click OK.
  - Catalog Type: DEFAULT
  - Retain the default settings for other parameters.
3. After the catalog is created, you can view the catalog information on the Catalog page.
Create a database.
1. In the navigation pane on the left, choose Metadata > Database.
2. Select catalog1 from the drop-down list box next to Catalog in the upper right corner.
  If the database named default already exists, skip this step.
3. Click Create, set the parameters below, and click Submit.
  - Database Name: database1
  - Catalog: catalog1
  - Select Location: Click , select a location, for example, obs://fabric-serverless/catalog1/database1, and click OK.
  - Retain the default settings for other parameters.
4. After the database is created, you can view the database information on the Database page.

Create a table.

Choose Metadata > Table.

Click Create, set the parameters below, and click Submit.

Table Name: table1
Catalog: catalog1
Database: database1
Data Storage Location: Click , select a location for storing the table in the OBS parallel file system, for example, obs://fabric-serverless/catalog1/database1/table1, and click OK.
Data Source Format: Parquet.

Table Field: Click Add to set relevant fields. The following table lists the table fields corresponding to the sample data.

          
               Name           Type           Length/Setting
order_id        varchar           12
order_channel   varchar           32
order_time      timestamp
cust_code       varchar           6
pay_amount      double
real_pay        double

Set other parameters based on site requirements.

After the table is created, you can view the table information on the Table page.

Querying Data

Return to the DataArtsFabric management console.
In the navigation pane on the left, choose SQL Editor.
Choose Instance > Catalog.
Run the following SQL statement to query data:
1

SELECT * FROM database1.table1;