Using DataArtsFabric SQL to Import and Query Data
Operation Scenarios
DataArtsFabric SQL is a cloud-native serverless version. It utilizes the resource pooling and massive storage capabilities provided by the cloud infrastructure, combined with parallel execution, metadata decoupling, and compute-storage persistent decoupling architecture, to achieve ultimate elasticity and lakehouse features.
This section describes how to quickly enable the DataArtsFabric SQL service and perform simple data queries.
Operation Process
Step |
Description |
---|---|
Sign up for a HUAWEI ID and enable Huawei Cloud services, complete real-name authentication, top up your account, enable LakeFormation and OBS permissions, and confirm the agency. |
|
Creating a SQL endpoint |
Create a SQL endpoint. You can skip this step if you use a public endpoint. |
Planning and Creating an OBS Parallel File System and Importing Data |
Create an OBS parallel file system and folders for data storage and import sample data. |
Planning and Creating a LakeFormation Instance, Catalog, Database, and Table |
Create catalogs, databases, and tables on the LakeFormation page and specify the OBS parallel file system directory. |
Query data on the SQL Editor page of DataArtsFabric SQL. |
Prerequisites
- You have signed up for a HUAWEI ID, completed real-name authentication, and checked your account is not in arrears or frozen.
- You have enabled LakeFormation and OBS permissions and confirmed the agency.
Using DataArtsFabric SQL
- Log in to the Huawei Cloud DataArts Fabric console and click Access Workspace.
- In the navigation pane on the left, choose Development and Production > SQL Editor. Select a LakeFormation instance, a LakeFormation catalog, and a SQL endpoint to run the SQL statements.
Planning and Creating an OBS Parallel File System and Importing Data
DataArtsFabric SQL uses OBS to store data. You need to create a parallel file system and folders on the OBS console and import sample data.
- Log in to the management console.
- In the upper left corner of the page, click
and choose Storage > Object Storage Service.
- In the navigation pane on the left, choose Parallel File System > Create Parallel File System. On the displayed slide-out panel/dialog box, set parameters and click Create Now.
- Set File System Name as required, for example, to fabric-serverless.
- Set other parameters based on site requirements.
- On the Parallel File System page, click the name of the created file system, for example, fabric-serverless.
- Choose Files in the navigation pane on the left. On the displayed page, click Create Folder, enter a folder name, and click OK. Click the folder name and click Create Folder to create a subfolder.
- Repeat this step to create paths for storing metadata in sequence. The following paths are examples:
- Catalog storage path: fabric-serverless/catalog1
- Database storage path: fabric-serverless/catalog1/database1
- Data table storage path: fabric-serverless/catalog1/database1/table1
- Download the Parquet data sample file.
- Upload the data file to the fabric-serverless/catalog1/database1/table1 directory.
Planning and Creating a LakeFormation Instance, Catalog, Database, and Table
DataArtsFabric SQL manages data sources using LakeFormation. You need to purchase a LakeFormation instance and configure its catalog, database, and table information.
- Log in to the management console.
- In the upper left corner of the page, choose Analytics > DataArts Lake Formation.
- On the OverView page, purchase an instance.
- In the upper left corner, select the instance to display its details.
- Create a catalog.
- In the navigation pane on the left, choose Metadata > Catalog.
- Click Create, set the parameters below, and click Submit.
- Catalog Name: catalog1
- Select Location: Click
, select a storage location, for example, obs://fabric-serverless/catalog1, and click OK.
- Catalog Type: DEFAULT
- Retain the default settings for other parameters.
- After the catalog is created, you can view the catalog information on the Catalog page.
- Create a database.
- In the navigation pane on the left, choose Metadata > Database.
- Select catalog1 from the drop-down list box next to Catalog in the upper right corner.
If the database named default already exists, skip this step.
- Click Create, set the parameters below, and click Submit.
- Database Name: database1
- Catalog: catalog1
- Select Location: Click
, select a location, for example, obs://fabric-serverless/catalog1/database1, and click OK.
- Retain the default settings for other parameters.
- After the database is created, you can view the database information on the Database page.
- Create a table.
- Choose Metadata > Table.
- Click Create, set the parameters below, and click Submit.
- Table Name: table1
- Catalog: catalog1
- Database: database1
- Data Storage Location: Click
, select a location for storing the table in the OBS parallel file system, for example, obs://fabric-serverless/catalog1/database1/table1, and click OK.
- Data Source Format: Parquet.
- Table Field: Click Add to set relevant fields. The following table lists the table fields corresponding to the sample data.
1 2 3 4 5 6 7
Name Type Length/Setting order_id varchar 12 order_channel varchar 32 order_time timestamp cust_code varchar 6 pay_amount double real_pay double
- Set other parameters based on site requirements.
- After the table is created, you can view the table information on the Table page.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot