Creating a DataArts Lake Formation Instance and Planning Metadata
Scenario
This document provides step-by-step instructions for creating a DataArts Lake Formation (LakeFormation) instance from scratch and setting up catalogs along with internal databases, tables, and other metadata within the instance.
LakeFormation allows you to create, modify, view, and delete catalogs, databases, and data tables. It simplifies initializing and operating your data lake and manages all metadata in a LakeFormation instance centrally, which speeds up the planning and deployment of data lake services.
Procedure
Before you start, complete the operations described in Preparations. Then, follow these steps:
- Create a LakeFormation Instance: Create an exclusive LakeFormation instance.
- Create an OBS Path for Storing Metadata: Create a parallel file system and folders in OBS for storing metadata.
- Create a Catalog: Create a catalog named catalog1.
- Create a Database: Create a database named database1 in catalog catalog1.
- Create a Data Table: Create a data table named table_A in database database1.
Preparations
- Sign up for a HUAWEI ID, enable Huawei Cloud services, and complete real-name authentication.
If you have already enabled Huawei Cloud services and completed real-name authentication, skip this step.
- Prepare an IAM user who has permission to create LakeFormation instances. For details, see Creating an IAM User and Granting LakeFormation Permissions.
Step 1: Create a LakeFormation Instance
- Log in to the management console as the user prepared in Preparations.
- In the upper left corner, click the service list icon and choose Analytics > LakeFormation to access the LakeFormation console.
- On the displayed page, select I have read and agree with the LakeFormation Service Statement and click Authorize.
If authorization has been completed, skip this step.
- Click Buy Now or Buy Instance in the upper right corner of the Overview page.
If a LakeFormation instance exists on the page, Buy Instance is displayed. Otherwise, Buy Now is displayed.
- Set the parameters listed below.
Table 1 Parameters for creating a LakeFormation instance

| Parameter | Example Value | Description |
|---|---|---|
| Type | Exclusive | Instance type. The options are Shared and Exclusive. |
| Billing Mode | Pay-per-use | Billing mode of the instance. |
| Project | xxx | Project the instance belongs to. |
| Name | lakeformation-test | Name of the LakeFormation instance. |
| QPS | 10000 | Maximum number of requests per second. You do not need to set this parameter when Type is set to Shared. |
| Enterprise Project | xxx | Enterprise project the instance belongs to. If no enterprise project is available, click Create to create one. |
| Description | - | Description of the instance. |
| Label | - | Enter a tag key and value and click Add. |
- Click Buy Now, confirm the configuration, and pay.
- Click Back to Console. You can check information about the newly created LakeFormation instance on the console.
Pay attention to the quota notification when creating an instance. If the resource quota is insufficient, apply for sufficient resources as prompted and then create an instance.
Wait until the instance status changes to Running.
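If you script this wait instead of watching the console, the idea can be sketched as a simple polling loop. This is a minimal sketch only: `get_status` is a hypothetical callable that you would supply yourself (it is not a LakeFormation API), and the `Creating` status used in the demonstration is an assumed placeholder. Only the `Running` status comes from this guide.

```python
import time


def wait_until_running(get_status, timeout_s=600, interval_s=10, sleep=time.sleep):
    """Poll get_status() until it returns "Running" or the timeout elapses.

    get_status is a caller-supplied function (hypothetical here) that returns
    the current instance status as a string. Returns True on success and
    False if the timeout is reached first.
    """
    waited = 0
    while True:
        if get_status() == "Running":
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Demonstration with a stubbed status source and no real sleeping.
fake_statuses = iter(["Creating", "Creating", "Running"])
ready = wait_until_running(lambda: next(fake_statuses), sleep=lambda seconds: None)
print("Instance ready:", ready)
```

The `sleep` parameter is injectable so that the loop can be exercised without waiting in real time.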
Step 2: Create an OBS Path for Storing Metadata
- Log in to the LakeFormation console.
- Click the service list icon in the upper left corner of the page and choose Storage > Object Storage Service to access the Object Storage Service console.
- Click Parallel File Systems and click Create Parallel File System. On the displayed page, set the parameters, and click Create Now.
- File System Name: Set the name of the parallel file system as required, for example, lakeformation-test.
- Set other parameters based on the site requirements.
- On the Parallel File Systems page, click the name of the file system you created, for example, lakeformation-test.
- Click Files in the navigation pane, click Create Folder, enter a folder name, and click OK. Click the folder name and click Create Folder to create a subfolder.
Repeat this step to create paths for storing metadata in sequence. The following paths are examples:
- Catalog storage path: lakeformation-test/catalog1
- Database storage path: lakeformation-test/catalog1/database1
- Table storage path: lakeformation-test/catalog1/database1/table1 and lakeformation-test/catalog1/database1/table2
- Function storage path: lakeformation-test/catalog1/database1/udf1
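The folder hierarchy above can be planned before you click through the console. The following is a minimal sketch that only builds the list of folder paths in creation order (parents first). The function name `plan_metadata_paths` is illustrative, and the code does not call OBS.

```python
def plan_metadata_paths(file_system, catalog, database, tables, functions):
    """Return the folder paths to create, parent folders first."""
    catalog_path = f"{file_system}/{catalog}"
    database_path = f"{catalog_path}/{database}"
    paths = [catalog_path, database_path]
    # Tables and functions each get their own folder under the database folder.
    paths += [f"{database_path}/{name}" for name in tables]
    paths += [f"{database_path}/{name}" for name in functions]
    return paths


paths = plan_metadata_paths(
    "lakeformation-test", "catalog1", "database1",
    tables=["table1", "table2"], functions=["udf1"],
)
for path in paths:
    print(path)
```

Listing parents before children mirrors the console flow, where each subfolder is created inside an existing folder.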
Step 3: Create a Catalog
- Log in to the management console.
- In the upper left corner, click the service list icon and choose Analytics > LakeFormation to access the LakeFormation console.
- From the drop-down list box on the left, select the LakeFormation instance you have created, for example, lakeformation-test. Choose Metadata > Catalog in the navigation pane on the left.
- On the displayed Catalog page, click Create. Set parameters by referring to the table below, retain the default values for other parameters, and click Submit.
Table 2 Parameters for creating a catalog

| Parameter | Example Value | Description |
|---|---|---|
| Catalog Name | catalog1 | Name of the catalog to be created. The value can contain up to 256 characters. Only letters, numbers, and underscores (_) are allowed. |
| Catalog Type | DEFAULT | Select a catalog type. |
| Select Location | obs://lakeformation-test/catalog1 | (Optional) Location where catalog data is stored in OBS. Click the icon next to the field, select Parallel file system or Object storage bucket for Buckets, select a location, and click OK. The location must start with obs:// and must include a storage object, for example, obs://lakeformation-test/catalog1. If no appropriate OBS path is available, click go to OBS and create one by following Step 2: Create an OBS Path for Storing Metadata. To prevent data conflicts, the path cannot be a metadata storage path that is being used by another LakeFormation instance. You are advised to select a folder that is not used by other catalogs. |
| Description | xxx | Description of the catalog to be created. |
- After the catalog is created, you can check its information on the Catalog page.
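The naming and location rules from the catalog parameters can be checked locally before you submit the form. This is a minimal sketch under two assumptions that the guide does not state: "letters" means ASCII letters, and "must include a storage object" means the path has a bucket or file system name plus at least one further segment. The helper names are illustrative.

```python
import re

# Assumption: only ASCII letters, digits, and underscores are accepted.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_name(name, max_length):
    """Check a metadata name against the character and length rules."""
    return len(name) <= max_length and NAME_PATTERN.fullmatch(name) is not None


def is_valid_location(location):
    """Check that a location starts with obs:// and names a bucket plus an object."""
    prefix = "obs://"
    if not location.startswith(prefix):
        return False
    segments = [s for s in location[len(prefix):].split("/") if s]
    return len(segments) >= 2


print(is_valid_name("catalog1", 256))
print(is_valid_location("obs://lakeformation-test/catalog1"))
```

The same `is_valid_name` check applies to database names with a limit of 128 characters and to table names with a limit of 256 characters.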
Step 4: Create a Database
- Log in to the management console.
- In the upper left corner, click the service list icon and choose Analytics > LakeFormation to access the LakeFormation console.
- From the drop-down list box on the left, select the LakeFormation instance you have created, for example, lakeformation-test. Choose Metadata > Database in the navigation pane on the left.
- On the displayed Database page, select the catalog you have created from the Catalog drop-down list box in the upper right corner, for example, catalog1.
- Click Create. Set parameters by referring to the table below, retain the default values for other parameters, and click Submit.
Table 3 Parameters for creating a database

| Parameter | Example Value | Description |
|---|---|---|
| Database Name | database1 | Name of the database to be created. The value can contain up to 128 characters. Only letters, numbers, and underscores (_) are allowed. |
| Catalog | catalog1 | Catalog the database to be created belongs to. |
| Select Location | obs://lakeformation-test/catalog1/database1 | Location where database information is stored in OBS. Click the icon next to the field, select Parallel file system or Object storage bucket for Buckets, select a location, and click OK. The location must start with obs:// and must include a storage object, for example, obs://lakeformation-test/catalog1/database1. If no appropriate OBS path is available, click go to OBS and create one by following Step 2: Create an OBS Path for Storing Metadata. The path must differ from the storage path of the associated catalog (that is, the Select Location parameter configured during catalog creation). To prevent data conflicts, the path cannot be a metadata storage path that is being used by another LakeFormation instance. If Database Storage Locations is set for the catalog the database belongs to, set this parameter to a subpath of Database Storage Locations or Select Location of the catalog. |
| Description | xxx | Description of the database to be created. |
- After the database is created, you can check its information on the Database page.
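The location rules for a database relative to its catalog can be sketched as a small check. This is illustrative only: the function names are made up for this example, and "subpath" is assumed to mean a path strictly below the parent, which the guide does not spell out.

```python
def is_subpath(child, parent):
    """True if child lies strictly below parent (assumed meaning of "subpath")."""
    return child.rstrip("/").startswith(parent.rstrip("/") + "/")


def check_database_location(db_location, catalog_location, database_storage_locations=()):
    """Return a list of rule violations; an empty list means no problem was found."""
    problems = []
    if db_location.rstrip("/") == catalog_location.rstrip("/"):
        problems.append("database location must differ from the catalog location")
    if database_storage_locations:
        # When the catalog restricts database locations, the database must sit
        # under one of them or under the catalog's own Select Location.
        parents = list(database_storage_locations) + [catalog_location]
        if not any(is_subpath(db_location, parent) for parent in parents):
            problems.append("database location is not under an allowed catalog path")
    return problems


print(check_database_location(
    "obs://lakeformation-test/catalog1/database1",
    "obs://lakeformation-test/catalog1",
))
```

Comparing against `parent + "/"` avoids treating `catalog10` as a subpath of `catalog1`.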
Step 5: Create a Data Table
- Log in to the management console.
- In the upper left corner, click the service list icon and choose Analytics > LakeFormation to access the LakeFormation console.
- From the drop-down list box on the left, select the LakeFormation instance you have created, for example, lakeformation-test. Choose Metadata > Table in the navigation pane on the left. In the upper right corner of the displayed Table page, select the catalog and database you have created, for example, catalog1 and database1, from the Catalog and Database drop-down list boxes, respectively.
- Click Create. Set parameters by referring to the table below, retain the default values for other parameters, and click Submit.
Table 4 Basic information parameters

| Parameter | Example Value | Description |
|---|---|---|
| Data Table | table_A | Name of the metadata table to be created. The value can contain up to 256 characters. Only letters, numbers, and underscores (_) are allowed. |
| Catalog | catalog1 | Catalog the table to be created belongs to. |
| Database | database1 | Database the table to be created belongs to. |
| Table Type | MANAGED_TABLE | Type of the table to be created. The options are: MANAGED_TABLE (managed table; if a managed table or partition is deleted, the data and metadata associated with the table or partition will be deleted), EXTERNAL_TABLE (external table; use an external table when a file already exists or is located in a remote location), VIRTUAL_VIEW (virtual view; it does not store actual data and does not occupy physical space), and MATERIALIZED_VIEW (materialized view; it stores actual data and occupies physical space). |
| Data Storage Location | obs://lakeformation-test/catalog1/database1/table1 | OBS file directory the table is mapped to. Click the icon next to the field, select the OBS location where the table is stored, and click OK. This parameter is optional. If it is not set, the table storage path is Upper-layer database storage path/Table name. The location must start with obs:// and must include a storage object, for example, obs://lakeformation-test/catalog1/database1/table1. If no appropriate parallel file system is available, click go to OBS and create one by following Step 2: Create an OBS Path for Storing Metadata. The path must differ from the storage paths of its associated catalog and database. To prevent data conflicts, the path cannot be a metadata storage path that is being used by another LakeFormation instance. If Data Table Storage Locations is set for the database the table belongs to, set this parameter to a subpath of Select Location or Data Table Storage Locations of the database. |
| Compress Data | Selected | Whether to compress the data table. Compressing tables allows data within them to be stored in a compressed format, enhancing performance and saving storage space. |
| Data Source Format | Parquet | Data source format of the table to be created. |
| Separator | - | This parameter is available and mandatory when Data Source Format is set to Csv. |
| Description | xxx | Description of the table to be created. The value can contain 0 to 4,000 bytes. |
- After the table is created, you can check its information on the Table page.
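Three of the table rules lend themselves to a quick local check: the default storage path when Data Storage Location is left empty, the 4,000-byte limit on Description, and the mandatory Separator for the Csv format. The sketch below is illustrative. The function names are made up, and it assumes the byte limit is measured on UTF-8 encoded text, which the guide does not state.

```python
def resolve_table_location(database_location, table_name, table_location=None):
    """Return the explicit location, or database path/table name if none is set."""
    if table_location:
        return table_location
    return f"{database_location.rstrip('/')}/{table_name}"


def check_table_settings(description, data_format, separator=None):
    """Return a list of rule violations for the description and separator."""
    problems = []
    # Assumption: the 4,000-byte limit applies to the UTF-8 encoding.
    if len(description.encode("utf-8")) > 4000:
        problems.append("description exceeds 4,000 bytes")
    if data_format.lower() == "csv" and not separator:
        problems.append("separator is mandatory for the Csv format")
    return problems


location = resolve_table_location(
    "obs://lakeformation-test/catalog1/database1", "table_A"
)
print(location)
print(check_table_settings("xxx", "Parquet"))
```

Measuring the encoded length rather than the character count matters for non-ASCII descriptions, where one character can occupy several bytes.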