Updated on 2024-07-22 GMT+08:00

Migrating Metadata

Scenario

Migrate external metadata to LakeFormation and store the data in OBS for unified management.

Prerequisites

  • A catalog for storing migration metadata has been created for the current instance.
  • The target user has the permission to perform operations on OBS and the catalog for storing migration metadata.
  • You have created an OBS parallel file system for storing migrated data.
  • The name of the table owner can contain 1 to 49 characters, including only letters, digits, and underscores (_). The value cannot contain other characters such as hyphens (-).
  • If metadata in multiple MRS clusters needs to be migrated to the same LakeFormation instance, the database names of the MRS clusters must be unique.
  • If metadata is migrated multiple times, table column changes between migrations must remain compatible: the column order and column types must stay consistent.
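
The table owner naming rule above can be checked before a task is created. The following is a minimal sketch, not part of LakeFormation: the function name is_valid_owner_name is hypothetical, and it assumes that "letters" means ASCII letters.

```python
import re

# Rule from the prerequisites: 1 to 49 characters, only letters, digits,
# and underscores. Restricting "letters" to ASCII is an assumption.
OWNER_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,49}")


def is_valid_owner_name(name: str) -> bool:
    """Return True if the name satisfies the table owner naming rule."""
    return OWNER_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # Sample names are illustrative only.
    for candidate in ["etl_user_01", "etl-user", "", "a" * 50]:
        print(repr(candidate), is_valid_owner_name(candidate))
```

Names that contain hyphens, are empty, or exceed 49 characters are rejected by this check.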

Procedure

  1. Log in to the LakeFormation console.
  2. In the upper left corner, open the service list and choose Analytics > LakeFormation to access the LakeFormation console.
  3. Select the LakeFormation instance to be operated from the drop-down list on the left and choose Tasks > Metadata Migration in the navigation pane.
  4. Click Create Migration Task, set related parameters, and click Submit.

    Table 1 Creating a metadata migration task

    Parameter

    Description

    Task Name

    Name of the metadata migration task.

    Description

    Description of the created migration task.

    Data Source

    Type of the data source to be migrated. The options are as follows:

    • DLF
    • MRS RDS for MySQL
    • OpenSource HiveMetastore for MySQL
    • MRS RDS FOR PostgreSQL
    • MRS LOCAL GaussDB

    JDBC URL

    JDBC URL of the metadata to be migrated. Set this parameter when Data Source is not set to DLF.

    NOTE:

    Some examples are as follows:

    • JDBC URL of the MySQL data source type: jdbc:mysql://IP address:Port number/Database name?useSSL=false&permitMysqlScheme
    • JDBC URL of the PostgreSQL data source type: jdbc:postgresql://IP address:Port number/Database name?socketTimeout=600

      socketTimeout indicates the socket timeout interval for the connection between the migration client and the database.

    • If the network connection uses an EIP, the IP address in the URL must be the EIP bound to the data source.

    In addition, you need to set the following parameters:

    • Username: username for accessing the data source.
    • Password: password for accessing the data source.

      If the user has a password, this parameter is mandatory. Otherwise, leave this parameter blank.

    Access Point

    Access point of the metadata service to be migrated.

    This parameter is displayed when Data Source is set to DLF. In addition, you need to set the following parameters:

    • Access Key: Obtain the AK from DLF O&M personnel.
    • Secret Key: Obtain the SK from DLF O&M personnel.

    Source Catalog

    Name of the catalog to which the metadata to be migrated belongs.

    Target Catalog

    Name of the catalog to which metadata is migrated in LakeFormation.

    Conflict Resolution

    Policy for resolving conflicts during migration.

    Currently, only Update old metadata is supported.

    Default Owner

    Default owner of metadata after migration. This parameter is displayed when Data Source is set to DLF.

    • If the configured default owner does not have the corresponding metadata operation permission, the migrated metadata cannot be added, deleted, modified, or queried. In this case, you can grant permissions to the owner or migrate permissions.
    • If all metadata can be used properly before the migration, you do not need to set this parameter.

    Log Path

    Storage location of logs generated during migration. Click to select a path.

    The path must already exist in OBS. If you enter a custom path that does not exist, the migration task will fail.

    Force Table Creation

    Selecting this option will bypass OBS path restrictions when creating an internal table.

    Metadata Objects to Migrate

    Select the metadata objects to be migrated. The available options are:

    • All: databases, functions, data tables, and partitions.
    • Database: databases.
    • Function: functions.
    • Table: tables.
    • Partition: partitions.
      NOTE:
      • Select All to migrate all metadata for the first migration task.
      • Ensure that the upper-level directory of the selected metadata exists if All is not selected. For example, you need to ensure that the target catalog contains the database (for example, DB_1) where the tables are located if you plan to set this parameter to Table. Otherwise, the table migration will fail.
      • If you plan to set this parameter to Function, ensure that the function class name exists. Otherwise, the function migration may fail.

    Add Location Rule

    • If the prefix of the metadata storage path is not obs://, click Add Location Rule to replace the prefix with obs:// and ensure that the corresponding OBS storage path exists.

      For example, if the current metadata storage path is file:/a/b, set Original Path to file:/ and New Path to obs://. The new metadata storage path is then obs://a/b. Ensure that the obs://a/b path exists in the OBS parallel file system.

    • You can create multiple rules at the same time. If a rule conflict occurs, the rule on the top of the page prevails.

    Network Connection

    Select a network connection scheme.

    You are advised to select EIP.

    If EIP is selected, you also need to select Security Group ID, which is the ID of the security group of the VPC associated with the data source.

    Event notification policy

    (Currently, this function is in the open beta test (OBT) phase.)

    (Optional) Once this option is configured, a notification (via SMS or email) will be sent when a specific event (such as task success or failure) occurs.

    • Event Notification: If this function is enabled, event notifications will be activated.
    • Event Notification Topic: Select the topic to be notified. You can configure the topic using Simple Message Notification (SMN) on the management console.
    • Event: Task status that triggers a notification. The value can be either Task succeeded or Task failed.
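
The behavior of Add Location Rule described in Table 1 can be sketched as a prefix replacement in which the first matching rule wins. This is an illustrative sketch only: the function name apply_location_rules, the sample paths, and the assumption that rules are matched as literal prefixes are not taken from LakeFormation itself.

```python
from typing import List, Tuple

# Each rule is (original_path_prefix, new_path_prefix), listed in the same
# order as on the page, so the rule at the top comes first.
LocationRule = Tuple[str, str]


def apply_location_rules(path: str, rules: List[LocationRule]) -> str:
    """Rewrite the prefix of a metadata storage path.

    Assumption: a rule applies when the path starts with its original
    prefix, and on a conflict the earliest (topmost) rule prevails.
    """
    for original_prefix, new_prefix in rules:
        if path.startswith(original_prefix):
            return new_prefix + path[len(original_prefix):]
    # No rule matched: the path is left unchanged.
    return path


if __name__ == "__main__":
    # Example from Table 1: file:/a/b becomes obs://a/b.
    print(apply_location_rules("file:/a/b", [("file:/", "obs://")]))

    # Hypothetical conflicting rules: both match, the topmost one is used.
    rules = [
        ("hdfs://hacluster/", "obs://bucket-a/"),
        ("hdfs://", "obs://bucket-b/"),
    ]
    print(apply_location_rules("hdfs://hacluster/warehouse/t1", rules))
```

Running the rules locally against a sample of source paths is one way to confirm that every rewritten path starts with obs:// before the task is submitted.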

  5. Click Start in the Operation column to run the migration task.

    • Before running a migration task, you need to authorize the task by referring to Granting the Job Management Permission.
    • After the migration task starts, if new metadata is added to the source database, the new metadata will not be migrated. You need to run the migration task again to migrate the new metadata. You can also use the metadata discovery function to migrate new metadata. For details, see Using the Metadata Discovery Function.
    • If the task fails to be executed, you can click Start in the Operation column to retry after rectifying the fault.

    To view a metadata object after the migration, click Metadata in the navigation pane and click the name of the target metadata object. For example, choose Metadata > Database to view the migrated database.

    Click Edit or Delete in the Operation column to modify or delete a task.

  6. Click View Log in the Operation column to view the logs generated during task running.

    By default, the latest 50 lines of logs are displayed.

    You can click the hyperlink at the bottom of the log to view the complete log. For details about how to download the log file, see section "Downloading an Object" in Object Storage Service 3.0 (OBS) 3.24.3h&s User Guide (for Huawei Cloud Stack 8.3.1) in Object Storage Service 3.0 (OBS) 3.24.3h&s Usage Guide (for Huawei Cloud Stack 8.3.1).

    The following table lists some error messages in logs and their causes.

    Error Message

    Cause

    field 'storageDescriptor.location' must match '^(obs|har)://.+/.+$'

    An incorrect location rule is configured. (The metadata storage path should start with obs://.)

    Invalid input parameter

    The input parameter of the metadata is invalid or LakeFormation does not support such metadata.

    Incorrect type of column xxx.

    The column type is invalid or LakeFormation is incompatible with the column type.

    No permission to perform this operation on resources.

    The default owner is incorrectly configured or the owner does not have the metadata operation permission.

    Error creating transactional connection factory

    The LakeFormation server cannot connect to the data source. Troubleshoot as follows:

    1. Check whether the username, password, AK, and SK of the data source are correct.
    2. Check whether the database entered in JDBC URL is correct.
    3. Check whether the IP address entered in JDBC URL is correct.

      If the data source type is MRS local metadata, an active/standby DBServer switchover may occur. In this case, you need to bind the EIP to the active node again.

    4. Check whether the security group allows access to the database connection port.
      • For tasks using an EIP connection mode, 0.0.0.0/0 needs to be allowed in the data source's security group rules before execution.
      • When opting for the VPC peer connection mode, the data source's security group must permit access from the VPC peering connection's peer IP address.

    The entered VPC network segment conflicts with the LakeFormation network segment.

    When the VPC peer connection mode is chosen, a conflict occurs between the VPC network segment of the data source and that of the LakeFormation server. In this case, you can choose to use EIP for migration.

    The log is not found.

    Check whether the log path exists.

    • If the log path already exists, contact LakeFormation O&M personnel for assistance.
    • If the log path does not exist, modify the log path in the task configuration to ensure that the log path exists in OBS.

    The path should be a sub path of the catalog storage location or database location list

    The path must be a subpath of the catalog storage location or database storage location list.

    Incorrect Partition Value

    The entered partition value is incorrect. Check whether the number and type of the entered partition key list of the table match those of the entered partition value list.

    Database does not exist

    The database does not exist. Verify whether the database is present.

    Location doesn't exist in the OBS Parallel File Systems

    The path does not exist in the OBS parallel file system.
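
The first error message in the table above quotes the pattern that a storage location must match. As a minimal sketch, the same pattern can be applied locally to spot paths that would trigger this error. The function name location_is_accepted is hypothetical, and whether the service evaluates the pattern in exactly this way is an assumption.

```python
import re

# Pattern quoted in the error message for 'storageDescriptor.location'.
LOCATION_PATTERN = re.compile(r"^(obs|har)://.+/.+$")


def location_is_accepted(location: str) -> bool:
    """Return True if the location matches the pattern from the error message."""
    return LOCATION_PATTERN.match(location) is not None


if __name__ == "__main__":
    # Sample paths are illustrative only.
    for location in ["obs://a/b", "file:/a/b", "obs://bucket", "hdfs://x/y"]:
        print(location, location_is_accepted(location))
```

A path that fails this check usually indicates a missing or incorrect location rule, as described in the cause for that error message.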