Using the Metadata Discovery Function

Updated on 2025-01-10 GMT+08:00

Scenario

If data is stored in OBS parallel file systems but has no associated metadata in LakeFormation, you can use the metadata discovery function to construct metadata for that data, so that SQL engines and user applications can compute on and analyze it.

NOTICE:

The metadata discovery function is currently in open beta testing (OBT) and is free of charge. After it is officially launched, fees will be charged based on the resources consumed by metadata discovery tasks.

Currently, metadata discovery supports only Spark on Hudi.

Prerequisites

  • The data to be discovered has been uploaded to the OBS parallel file system. That is, the data has been uploaded from S3 or HDFS to the planned path of the OBS parallel file system in the region where the LakeFormation instance is located.
  • The catalog and database for metadata discovery have been prepared and created.

Procedure

  1. Log in to the management console.
  2. Click the service list in the upper left corner and choose Analytics > LakeFormation to access the LakeFormation console.
  3. Select the target LakeFormation instance from the drop-down list on the left, and choose Tasks > Metadata Discovery in the navigation pane.
  4. Click Create Discovery Task, set related parameters, and click Submit.

    Table 1 Creating a discovery task

    Parameter

    Description

    Task Name

    Name of the metadata discovery task.

    Description

    Description of the created metadata discovery task.

    Data Storage Location

    Location where the discovery result table is stored in the OBS parallel file system.

    Select a location and click OK.

    Discovery File Type

    Type of the discovered file. The options include:

    • Automatic discovery (including Parquet, ORC, JSON, Avro, and CSV)
    • Parquet
    • ORC
    • JSON
    • CSV (If you select this type, you also need to configure parameters such as Delimiter, Escape Character, Quotation Character, and Use first row as column name.)
    • Avro
    NOTE:
    • If the files in the data storage location share the same file name extension, select the matching discovery file type.
    • If files with different extensions are present, select Automatic discovery.
    • If files have no extension, select the appropriate type manually. Automatic discovery identifies extensionless files as Parquet by default and may not recognize other formats.

    Log Path

    Storage location of logs generated when a metadata discovery task is executed. Select a path that already exists in OBS.

    If the specified path does not exist in OBS, the discovery task will fail.

    Target Catalog

    Name of the catalog to which the metadata to be discovered belongs.

    Target Database

    Name of the database to which the metadata to be discovered belongs.

    Conflict Resolution

    Method used to resolve the issue of duplicate metadata names during metadata discovery.

    • Create and update metadata
    • Create metadata only

    Default Owner

    Default owner of metadata after a metadata discovery task is executed.

    To avoid authorization failure, ensure that the selected entity's name does not contain hyphens (-).

    File Sampling Rate

    (Optional) Rate at which files are sampled during discovery.

    When the sampling rate is 0, all partitions after the current one in a partitioned table are skipped if an empty file is found. This shortens the run time but reduces accuracy.

    Rediscovery Method

    Policy applied when the discovery operation is performed again.

    • Full discovery: When you perform the discovery operation again, all files in the data storage location are discovered.
    • Incremental discovery: When you perform the discovery operation again, only files added to the data storage location after the last successfully executed task started are discovered.

    Execution Policy

    Execution policy of the discovery task.

    • Manual: The task is triggered manually.

      If you select this mode, click Run in the Operation column to run the task after it is created.

    • Scheduled: The task is executed automatically on a schedule.

      After selecting this mode, select an execution period (monthly, weekly, daily, or hourly) and set related parameters as required.

    Entity Type

    (Optional) By default, selecting an entity assigns it read permission on the data storage location.

    • You can select a user group, role, IAM user, or agency as the authorization entity.

      To avoid authorization failure, ensure that the selected entity's name does not contain hyphens (-).

    • If you want to grant the write permission as well, select Write Permission.

    Event Notification Policy

    (Optional) Once this option is configured, a notification (via SMS or email) will be sent when a specific event (such as task success or failure) occurs.

    • Event Notification: Whether to enable event notifications.
    • Event Notification Topic: Select the topic to notify. You can configure topics using Simple Message Notification (SMN) on the management console.
    • Event: Task status that triggers a notification. The value can be Task succeeded or Task failed.

  5. Click Run in the Operation column to run the discovery task.

    • Click Stop to stop a running task.
    • Click View Log to view the logs generated during task running.

      By default, the latest 50 lines of logs are displayed.

      You can click the hyperlink at the bottom of the log to view the complete log. For details about the configuration, see section Downloading an Object.

    • Click Edit or Delete in the Operation column to modify or delete a task.

  6. After the discovery task is complete, choose Metadata > Table. In the upper right corner, select the target catalog and database from the Catalog and Database drop-down lists to view the discovered tables.
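The CSV parameters in Table 1 (Delimiter, Escape Character, Quotation Character, and Use first row as column name) behave like standard CSV reader options. The plain-Python sketch below illustrates what they control; it is an analogy using the standard csv module, not LakeFormation's parser:

```python
import csv
import io

# Hypothetical sample data matching the CSV options in Table 1:
# Delimiter=",", Quotation Character='"', Escape Character="\\",
# and "Use first row as column name" enabled.
raw = 'id,name\n1,"Alice, A."\n2,Bob\n'

reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar='"', escapechar="\\")
rows = list(reader)

# With "Use first row as column name", the first row becomes the schema.
header, records = rows[0], rows[1:]
table = [dict(zip(header, r)) for r in records]
print(table)
# [{'id': '1', 'name': 'Alice, A.'}, {'id': '2', 'name': 'Bob'}]
```

Note how the quotation character keeps the embedded delimiter in "Alice, A." from splitting the field; choosing these options to match your files determines whether discovered columns line up correctly.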
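The Conflict Resolution options in Table 1 can be modeled as a merge keyed on metadata name. The function below is a hypothetical sketch of that behavior (the names and structure are illustrative, not LakeFormation code):

```python
def resolve(existing: dict, discovered: dict, mode: str) -> dict:
    """Merge discovered metadata into existing metadata by name.

    'create_and_update': new entries are created, name clashes are overwritten.
    'create_only':       new entries are created, name clashes keep the
                         existing entry unchanged.
    """
    merged = dict(existing)
    for name, meta in discovered.items():
        if name not in merged or mode == "create_and_update":
            merged[name] = meta
    return merged

existing = {"orders": {"format": "parquet", "cols": 3}}
found    = {"orders": {"format": "parquet", "cols": 5},
            "users":  {"format": "csv", "cols": 2}}

print(resolve(existing, found, "create_and_update")["orders"]["cols"])  # 5
print(resolve(existing, found, "create_only")["orders"]["cols"])        # 3
```

In both modes the new table (users) is created; the modes differ only in what happens to the table whose name already exists (orders).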
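Conceptually, the Rediscovery Method options in Table 1 differ only in whether files are filtered by when they were added. A minimal illustrative model follows; the ObsFile type and timestamps are assumptions for the sketch, not LakeFormation internals:

```python
from typing import NamedTuple, Optional

class ObsFile(NamedTuple):
    path: str
    modified_at: float  # epoch seconds (hypothetical timestamp field)

def files_to_discover(files, last_success_start: Optional[float] = None):
    """Full discovery scans every file; incremental discovery keeps only
    files added after the last successfully executed task started."""
    if last_success_start is None:  # full discovery
        return list(files)
    return [f for f in files if f.modified_at > last_success_start]

files = [ObsFile("obs://bucket/a.parquet", 100.0),
         ObsFile("obs://bucket/b.parquet", 200.0)]

print(len(files_to_discover(files)))         # full discovery: 2
print(len(files_to_discover(files, 150.0)))  # incremental, last run at 150.0: 1
```

This is why incremental discovery is faster on large data storage locations: files already covered by a previous successful run are skipped rather than rescanned.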
