Using the Metadata Discovery Function

Scenario

If data is stored in an OBS parallel file system but has no associated metadata in LakeFormation, you can use the metadata discovery function to construct metadata for that data so that SQL engines or user applications can compute on and analyze it.

The metadata discovery feature is currently in open beta testing (OBT) and is free of charge. After it is officially launched, fees will be charged based on the resources that metadata discovery tasks consume.

Prerequisites

Authorization has been enabled as described in Granting the Job Management Permission.

The data to be discovered has been uploaded to the OBS parallel file system. That is, the data has been uploaded from S3 or HDFS to the planned path of the OBS parallel file system in the region where the LakeFormation instance is located.
The catalog and database for metadata discovery have been created.
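
When preparing the data, it can help to check which file formats the planned path contains, because the Discovery File Type parameter in Table 1 depends on the file name extensions found there. The following is a minimal Python sketch of that decision, not part of LakeFormation. The helper name, the extension mapping, and the sample object keys are assumptions made for illustration, and obtaining the list of object keys from OBS is left out.

```python
from collections import Counter
from pathlib import PurePosixPath

# File name extensions mapped to the Discovery File Type options in Table 1.
# The mapping itself is an assumption made for this sketch.
EXTENSION_TO_TYPE = {
    ".parquet": "Parquet",
    ".orc": "ORC",
    ".json": "JSON",
    ".avro": "Avro",
    ".csv": "CSV",
}


def suggest_discovery_file_type(object_keys):
    """Suggest a Discovery File Type for a list of object keys (hypothetical helper)."""
    extensions = Counter(PurePosixPath(key).suffix.lower() for key in object_keys)
    if not extensions:
        return "No files found"
    if "" in extensions:
        # Files without an extension: the type has to be chosen by hand.
        return "Choose the type manually (some files have no extension)"
    known = {EXTENSION_TO_TYPE.get(ext) for ext in extensions}
    if len(extensions) == 1 and None not in known:
        # Every file shares one recognized extension.
        return known.pop()
    # Several different extensions are present.
    return "Automatic discovery"


# Hypothetical object keys under the planned path.
print(suggest_discovery_file_type(["sales/part-0.orc", "sales/part-1.orc"]))  # ORC
```

The sketch only inspects names. It does not open the files, so a file whose extension does not match its content would be misjudged.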

Procedure

Log in to the LakeFormation console.
In the upper left corner, open the service list and choose Analytics > LakeFormation to access the LakeFormation console.
Select the LakeFormation instance to be operated from the drop-down list on the left and choose Tasks > Metadata Discovery in the navigation pane.

Click Create Discovery Task, set related parameters, and click Submit.

**Table 1** Creating a discovery task
Parameter	Description
Task Name	Name of the metadata discovery task.
Description	Description of the created metadata discovery task.
Data Storage Location	Location in the OBS parallel file system of the data whose metadata is to be discovered. Browse to and select a location, and click OK.
Discovery File Type	Type of the files to be discovered. Options: Automatic discovery (covers Parquet, ORC, JSON, Avro, and CSV); Parquet; ORC; JSON; CSV (if you select this type, also configure parameters such as Delimiter, Escape Character, Quotation Character, and Use first row as column name); Avro. NOTE: If all files in the data storage location have the same file name extension, select the matching discovery file type. If the location contains several different extensions, select Automatic discovery. If the files have no extension, select the type that matches their format. Automatic discovery defaults to identifying Parquet files and may not recognize files of other formats.
Log Path	Storage location of the logs generated when the metadata discovery task is executed. Click to select a path. The path must exist in OBS. Select an existing path rather than entering a custom one; with a custom path, the discovery task fails.
Target Catalog	Name of the catalog to which the metadata to be discovered belongs.
Target Database	Name of the database to which the metadata to be discovered belongs.
Conflict Resolution	Method used to resolve duplicate metadata names during metadata discovery. Options: Create and update metadata; Create metadata only.
Default Owner	Default owner of metadata after a metadata discovery task is executed. To avoid authorization failure, ensure that the selected entity's name does not contain hyphens (-).
File Sampling Rate	(Optional) File sampling rate. When the sampling rate is 0, all partitions after the current partition table are skipped if an empty file is found. This shortens the run time but reduces accuracy.
Rediscovery Method	Policy applied when the metadata discovery task is executed again. Full discovery: all files in the data storage location are discovered again. Incremental discovery: only the files added to the data storage location after the start of the last successfully executed task are discovered.
Entity Type	(Optional) Authorization entity. You can select user groups, roles, or users. By default, a selected entity is granted read permission on the data storage location. To also grant write permission, select Write Permission. To avoid authorization failure, ensure that the selected entity's name does not contain hyphens (-).
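
Several rows of Table 1 carry constraints that are easy to overlook when filling in the form. As an illustration only, the following Python sketch records a few planned values and checks them against the rules stated in the table. The class, field names, and sample values are hypothetical and are not a LakeFormation API. Only the Delimiter is checked for CSV to keep the sketch short.

```python
from dataclasses import dataclass
from typing import List, Optional

# Discovery File Type options as listed in Table 1.
FILE_TYPES = {"Automatic discovery", "Parquet", "ORC", "JSON", "CSV", "Avro"}


@dataclass
class DiscoveryTaskDraft:
    """Hypothetical note of the values to enter; not a LakeFormation API object."""
    task_name: str
    discovery_file_type: str
    default_owner: str
    entity_name: Optional[str] = None    # Entity Type is optional
    csv_delimiter: Optional[str] = None  # only needed for CSV


def check_draft(draft: DiscoveryTaskDraft) -> List[str]:
    """Return the Table 1 rules that the draft violates (empty list if none)."""
    problems = []
    if draft.discovery_file_type not in FILE_TYPES:
        problems.append("Discovery File Type is not one of the listed options")
    if draft.discovery_file_type == "CSV" and not draft.csv_delimiter:
        problems.append("CSV also needs a Delimiter")
    if "-" in draft.default_owner:
        problems.append("Default Owner name contains a hyphen")
    if draft.entity_name and "-" in draft.entity_name:
        problems.append("Entity name contains a hyphen")
    return problems


# Sample values invented for this sketch.
draft = DiscoveryTaskDraft(
    task_name="sales_discovery",
    discovery_file_type="CSV",
    default_owner="data-admin",
)
for problem in check_draft(draft):
    print(problem)
```

A check of this kind cannot confirm that the log path exists in OBS. That still has to be verified in the console.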

Click Run in the Operation column to run the discovery task.
- Click Stop to stop a running task.
- Click View Log to view the logs generated during task running.
  By default, the latest 50 lines of logs are displayed.
  
  You can click the hyperlink at the bottom of the log to view the complete log. For details, see section "Downloading an Object" in Object Storage Service 3.0 (OBS) 3.24.3h&s User Guide (for Huawei Cloud Stack 8.5.0), which is part of Object Storage Service 3.0 (OBS) 3.24.3h&s Usage Guide (for Huawei Cloud Stack 8.5.0).
- Click Edit or Delete in the Operation column to modify or delete a task.
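
The console view shows only the latest 50 lines of the log. If the complete log has been downloaded, a similar view can be produced locally. The following Python sketch shows one way to do so. The function name and the generated sample lines are assumptions, and this is not necessarily how the console builds its view.

```python
from collections import deque


def last_lines(lines, count=50):
    """Return the final `count` items of any iterable of lines, such as an open file."""
    # A bounded deque discards older lines as newer ones arrive.
    return list(deque(lines, maxlen=count))


# Generated sample lines stand in for a downloaded log file.
sample = [f"line {n}\n" for n in range(1, 121)]
tail = last_lines(sample)
print(len(tail), tail[0].strip())  # 50 line 71
```

Because the function accepts any iterable, an open file handle can be passed directly and the whole log never has to be held in memory.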
After the discovery task is complete, choose Metadata > Table. In the upper right corner, select the target catalog and database from the Catalog and Database drop-down lists to view the discovered tables.
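
If the task is run again later, the Rediscovery Method in Table 1 determines which files are examined. The difference between the two options can be sketched in Python as follows. This is a simplified model under stated assumptions, not the service's implementation: the function name, the "full" and "incremental" labels, the sample object keys, and the idea that each file's added time is known are all hypothetical.

```python
from datetime import datetime


def files_to_discover(files, method, last_successful_start=None):
    """Pick the object keys that a rerun would examine.

    files maps an object key to the time it was added (assumed to be known).
    method is "full" or "incremental" (labels chosen for this sketch).
    """
    if method not in ("full", "incremental"):
        raise ValueError(f"unknown rediscovery method: {method}")
    if method == "full" or last_successful_start is None:
        # Full discovery, or no earlier successful run to compare against.
        return sorted(files)
    # Incremental: only files added after the last successful task started.
    return sorted(key for key, added in files.items() if added > last_successful_start)


# Sample data invented for this sketch.
files = {
    "sales/part-0.parquet": datetime(2024, 1, 1, 8, 0),
    "sales/part-1.parquet": datetime(2024, 1, 3, 9, 30),
}
last_start = datetime(2024, 1, 2, 0, 0)
print(files_to_discover(files, "full"))
print(files_to_discover(files, "incremental", last_start))
```

Treating a missing earlier run as a full discovery is a choice made for this sketch. The source does not state how the service behaves in that case.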
