
Migrating Metadata to LakeFormation Using Metadata Migration

Scenario

Migrate external metadata to LakeFormation and store the data in OBS for unified management.

To prevent path conflicts during Hive metadata migration, you are advised to set the Hive catalog path to the default database path.

Prerequisites

  • A catalog for storing migration metadata has been created for the current instance.
  • The target user has the permission to perform operations on OBS and the catalog for storing migration metadata.
  • You have created an OBS parallel file system for storing migrated data.
  • The name of the table owner can contain 1 to 49 characters, including only letters, digits, and underscores (_). Other characters, such as hyphens (-), are not allowed. (An example check is provided after this list.)
  • If metadata in multiple MRS clusters needs to be migrated to the same LakeFormation instance, the database names of the MRS clusters must be unique.
  • If multiple migrations are required, table column changes must remain compatible across migrations: the column order and column types must stay consistent.
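
The table owner naming rule above can be checked before migration. The following is a minimal sketch (in Python, not part of LakeFormation) that validates owner names against the 1-to-49-character rule of letters, digits, and underscores; the sample owner names are hypothetical.

    import re

    # Owner names must be 1 to 49 characters consisting only of letters, digits, and underscores.
    OWNER_NAME_PATTERN = re.compile(r"^[A-Za-z0-9_]{1,49}$")

    def is_valid_owner(name):
        # Return True if the owner name satisfies the migration prerequisite.
        return bool(OWNER_NAME_PATTERN.match(name))

    # Hypothetical owner names collected from the source metastore.
    for owner in ["hive_admin", "etl-user", "a" * 50]:
        print(owner, "->", "ok" if is_valid_owner(owner) else "invalid")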

Procedure

  1. Log in to the LakeFormation console.
  2. Select the target LakeFormation instance from the drop-down list on the left and choose Jobs > Job Authorization in the navigation pane.

    Click Authorize to grant the job management permissions of LakeFormation to the current user. If authorization has been completed, skip this step.

    To cancel the permission, click Cancel Authorization.

    After the authorization is approved, LakeFormation automatically creates an agency named lakeformation_job_trust. Do not delete the agency during job running.

  3. In the navigation pane, choose Jobs > Metadata Migration.
  4. Click Create Migration Job, set related parameters, and click Submit.

    Table 1 Creating a metadata migration job

    Parameter

    Description

    Job Name

    Name of the metadata migration job.

    Description

    Description of the created migration job.

    Data Source

    Type of the data source to be migrated. The following types are supported:

    • DLF
    • MRS RDS for MySQL
    • OpenSource HiveMetastore for MySQL
    • MRS RDS for PostgreSQL
    • MRS LOCAL GaussDB

    JDBC URL

    JDBC URL of the metadata to be migrated. Set this parameter when Data Source is not set to DLF. (A connectivity check sketch is provided after this table.)

    Example:

    • JDBC URL of the MySQL data source type: jdbc:mysql://IP address:Port number/Database name?useSSL=false&permitMysqlScheme
    • JDBC URL of the PostgreSQL data source type: jdbc:postgresql://IP address:Port number/Database name?socketTimeout=600

      socketTimeout indicates the socket timeout interval for the connection between the migration client and the database.

    • When the network connection uses an EIP, the IP address in the URL is the EIP bound to the data source.

    Username/Password

    Username and password used to access the data source. Username and Password are not displayed when Data Source is set to DLF.

    If the user has a password, Password is mandatory. Otherwise, leave this parameter blank.

    Access Point

    Access point of the metadata service to be migrated.

    This parameter is displayed when Data Source is set to DLF.

    Access Key/Secret Key

    Contact the DLF O&M personnel to obtain the AK/SK information. This parameter is displayed when Data Source is set to DLF.

    Source Catalog

    Name of the catalog to which the metadata to be migrated belongs.

    Target Catalog

    Name of the catalog to which metadata is migrated in LakeFormation.

    Conflict Resolution

    Policy for resolving conflicts during migration.

    Currently, only Create and Update Metadata is supported.

    Log Path

    Storage location of logs generated during migration. Click to select a path.

    The path must already exist in OBS. If you enter a custom path that does not exist, the migration job will fail.

    Force Table Creation

    Selecting this option will bypass OBS path restrictions when creating an internal table.

    Metadata Filtering Policy

    Metadata filtering policy during migration. The options are:

    • Metadata type
    • Custom rule

    Default Owner

    Default owner of metadata after migration. This parameter is displayed when Data Source is set to DLF.

    • If the configured default owner does not have the corresponding metadata operation permissions, the migrated metadata cannot be added, deleted, modified, or queried. In this case, you can grant permissions to the owner or migrate permissions.
    • If all metadata can be used properly before the migration, you do not need to set this parameter.

    Filtering Policy Storage Location

    Storage location of the custom metadata filtering policy file in the OBS parallel file system.

    Set this parameter when Metadata Filtering Policy is set to Custom rule.

    Filtering Policy File Name

    Name of the user-defined metadata filtering policy file.

    Set this parameter when Metadata Filtering Policy is set to Custom rule.

    Metadata Objects to Migrate

    Metadata objects to be migrated. Set this parameter when Metadata Filtering Policy is set to Metadata type. The options are as follows:

    • All: Databases, functions, data tables, and partitions are migrated. Select All to migrate all metadata for the first migration job.
    • Database: Databases are migrated.
    • Function: Functions are migrated. Ensure that the function class name exists if you select Function.
    • Table: Tables are migrated.
    • Partition: Partitions are migrated.

    Ensure that the upper-level directory of the selected metadata exists if All is not selected. For example, if only Table is selected, make sure that the target catalog includes the database containing the table (for example, DB_1). Otherwise, no tables will be successfully migrated.

    Add Location Rule

    • If the prefix of the metadata storage path is not obs://, click Add Location Rule to replace the prefix with obs:// and ensure that the corresponding OBS storage path exists.

      For example, if the current metadata storage path is file:/a/b, set Original Path to file:/ and New Path to obs://, and ensure that the obs://a/b path exists in the OBS parallel file system. The new metadata storage path is then obs://a/b.

    • You can create multiple rules at the same time. If rules conflict, the rule at the top of the page prevails. (See the rule-ordering sketch after this table.)

    Execution Policy

    Select the execution policy of the current migration job.

    • Manual: The migration job is manually triggered.

      If you select this mode, you need to click Run in the Operation column to run the migration job after the job is created.

    • Scheduled: The migration job is automatically executed per schedule.

      After selecting this mode, you can select the scheduled execution period (monthly, weekly, daily, or hourly) and set related parameters as required.

    Network Connection

    Select a network connection scheme.

    Select EIP.

    If EIP is selected, you also need to select Security Group ID, which is the ID of the security group of the VPC associated with the data source.

    Event Notification Policy

    (Currently, this function is in the OBT phase.)

    (Optional) Once this option is configured, a notification (via SMS or email) will be sent when a specific event (such as job success or failure) occurs.

    • Event Notification: Whether to enable event notifications.
    • Event Notification Topic: Select the topic to be notified. You can configure the topic using Simple Message Notification (SMN) on the management console.
    • Event: Job status that triggers a notification. The value can be either Job succeeded or Job failed.
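
    Before submitting the job, you can verify that the host, port, database name, and credentials entered for JDBC URL and Username/Password are usable from a machine that can reach the data source. The following is a minimal sketch, not part of LakeFormation; it assumes the pymysql and psycopg2 Python client libraries are installed, and all connection values are placeholders to be replaced with your own.

      import pymysql     # MySQL-type sources (pip install pymysql)
      import psycopg2    # PostgreSQL-type sources (pip install psycopg2-binary)

      def check_mysql(host, port, db, user, password):
          # Mirrors jdbc:mysql://host:port/db?useSSL=false&permitMysqlScheme
          conn = pymysql.connect(host=host, port=port, database=db,
                                 user=user, password=password, connect_timeout=10)
          conn.close()
          print("MySQL metastore reachable and credentials accepted")

      def check_postgresql(host, port, db, user, password):
          # Mirrors jdbc:postgresql://host:port/db?socketTimeout=600
          conn = psycopg2.connect(host=host, port=port, dbname=db,
                                  user=user, password=password, connect_timeout=10)
          conn.close()
          print("PostgreSQL metastore reachable and credentials accepted")

      # Placeholder EIP, port, database, and account of the data source.
      check_mysql("192.0.2.10", 3306, "hivemeta", "hive", "password")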
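
    The rules configured under Add Location Rule are applied in page order, and the topmost matching rule takes effect. The following sketch only illustrates that first-match-wins behavior; the rules and paths in it are hypothetical examples.

      # Ordered (original prefix, new prefix) pairs; the first matching rule wins,
      # mirroring the "rule on the top of the page prevails" behavior.
      LOCATION_RULES = [
          ("file:/", "obs://"),
          ("hdfs://old-cluster/", "obs://migrated-bucket/"),
      ]

      def rewrite_location(path):
          for original, new in LOCATION_RULES:
              if path.startswith(original):
                  return new + path[len(original):]
          return path  # No rule matched; the path is left unchanged.

      print(rewrite_location("file:/a/b"))                 # -> obs://a/b
      print(rewrite_location("hdfs://old-cluster/db/t1"))  # -> obs://migrated-bucket/db/t1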

  5. Click Start in the Operation column to run the migration job. If the execution policy is set to Scheduled, you do not need to manually execute the job.

    • Before running a migration job, you need to authorize the job by referring to step 2.
    • After the migration job starts, if new metadata is added to the source database, the new metadata will not be migrated. You need to run the migration job again to migrate the new metadata. You can also use the metadata discovery function to migrate new metadata. For details, see Migrating Metadata to LakeFormation Using Metadata Discovery.
    • If the job fails to be executed, you can click Start in the Operation column to retry after the fault is rectified.

    After the migration, you can choose Metadata in the navigation pane and click the name of the target metadata object to view it. For example, choose Metadata > Database to view the migrated databases.

    Click Edit or Delete in the Operation column to modify or delete a job.

  6. Click View Log in the Operation column to view the logs generated during job running. You can click Click here to view complete log to view the complete log.

    • View Job instead of View Log may be displayed on the page. In this case, perform the following operations to view logs:
      1. Click View Job in the Operation column to view the job execution status.
      2. In the displayed dialog box, click Click here to view complete log to view the logs generated during job running.
    • The following table lists some error messages in logs and their causes.
      Table 2 Common errors in logs

      Error Message

      Cause

      field 'storageDescriptor.location' must match '^(obs|har)://.+/.+$'

      An incorrect location rule is configured. (The metadata storage path must start with obs://.)

      Invalid input parameter

      The input parameter of the metadata is invalid or LakeFormation does not support such metadata.

      Incorrect type of column xxx.

      The column type is invalid or LakeFormation is incompatible with the column type.

      No permission to perform this operation on resources.

      The default owner is incorrectly configured or the owner does not have the metadata operation permission.

      Error creating transactional connection factory

      The LakeFormation server cannot connect to the data source. Troubleshoot as follows (a port reachability sketch is provided after this table):

      1. Check whether the username, password, AK, and SK of the data source are correct.
      2. Check whether the database entered in JDBC URL is correct.
      3. Check whether the IP address entered in JDBC URL is correct.

        If the data source type is MRS local metadata, an active/standby DBServer switchover may occur. In this case, you need to bind the EIP to the active node again.

      4. Check whether the security group allows access over the database connection port.
        • For jobs using an EIP connection mode, 0.0.0.0/0 needs to be allowed in the data source's security group rules before execution.
        • When opting for the VPC peer connection mode, the data source's security group must permit access from the VPC peering connection's peer IP address.

      The entered VPC network segment conflicts with the LakeFormation network segment.

      When the VPC peer connection mode is chosen, a conflict occurs between the VPC network segment of the data source and that of the LakeFormation server. In this case, you can choose to use EIP for migration.

      The log is not found.

      Check whether the log path exists.

      • If the log path already exists, contact LakeFormation O&M personnel for assistance.
      • If the log path does not exist, modify the log path in the job configuration to ensure that the log path exists in OBS.

      The path should be a sub path of the catalog storage location or database location list

      The path must be a subpath of the catalog storage location or database storage location list.

      Incorrect Partition Value

      The entered partition value is incorrect. Check whether the number and type of the entered partition key list of the table match those of the entered partition value list.

      Database does not exist

      The database does not exist. Verify whether the database is present.

      Location doesn't exist in the OBS Parallel File Systems

      The path does not exist in the OBS parallel file system.

      Folder obs://xxxx/yyyy/ is not empty in the OBS

      When a table is created, the target OBS directory must be empty. You can bypass this restriction by selecting Force Table Creation.
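
    For the "Error creating transactional connection factory" case above, a quick way to separate credential problems from network or security group problems is to test whether the database port is reachable at all. The following sketch uses only the Python standard library; the EIP and port are placeholders to be replaced with the values of your data source.

      import socket

      def port_reachable(host, port, timeout=10.0):
          # Return True if a TCP connection to host:port succeeds within the timeout.
          try:
              with socket.create_connection((host, port), timeout=timeout):
                  return True
          except OSError:
              return False

      # Placeholder EIP and port of the source database.
      if port_reachable("192.0.2.10", 3306):
          print("Port reachable: recheck the username, password, and database name in the JDBC URL.")
      else:
          print("Port unreachable: check the EIP binding and the security group rules.")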