Migrating Metadata to LakeFormation Using Metadata Migration
Scenario
Migrate external metadata to LakeFormation and store the data in OBS for unified management.
To prevent path conflicts during Hive metadata migration, you are advised to set the Hive catalog path to the default database path.
Prerequisites
- A catalog for storing migration metadata has been created for the current instance.
- The target user has the permission to perform operations on OBS and the catalog for storing migration metadata.
- You have created an OBS parallel file system for storing migrated data.
- The name of the table owner can contain 1 to 49 characters, including only letters, digits, and underscores (_). Other characters, such as hyphens (-), are not allowed.
- If metadata in multiple MRS clusters needs to be migrated to the same LakeFormation instance, the database names of the MRS clusters must be unique.
- If multiple migrations are required, table column changes between migrations must remain compatible: the column order and column types must stay consistent.
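The owner-name constraint above can be checked programmatically before migration. A minimal sketch in Python (the regex and function name are illustrative, not part of LakeFormation):

```python
import re

# Owner names must be 1 to 49 characters: letters, digits, and
# underscores only (hyphens and other special characters are rejected).
OWNER_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,49}")

def is_valid_owner_name(name: str) -> bool:
    """Return True if the table owner name satisfies the migration prerequisite."""
    return OWNER_NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_owner_name("hive_user_01")` returns `True`, while `is_valid_owner_name("etl-owner")` returns `False` because of the hyphen.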
Procedure
- Log in to the LakeFormation console.
- Select the LakeFormation instance to be operated from the drop-down list on the left and choose Jobs > Job Authorization in the navigation pane.
Click Authorize to grant LakeFormation job management permissions to the current user. If authorization has already been completed, skip this step.
To cancel the permission, click Cancel Authorization.
After the authorization is approved, LakeFormation automatically creates an agency named lakeformation_job_trust. Do not delete the agency during job running.
- In the navigation pane, choose Jobs > Metadata Migration.
- Click Create Migration Job, set related parameters, and click Submit.
Table 1 Parameters for creating a metadata migration job
Job Name
Name of the metadata migration job.
Description
Description of the created migration job.
Data Source
Type of the data source to be migrated. The following types are supported:
- DLF
- MRS RDS for MySQL
- OpenSource HiveMetastore for MySQL
- MRS RDS for PostgreSQL
- MRS LOCAL GaussDB
JDBC URL
JDBC URL of the metadata to be migrated. Set this parameter when Data Source is not set to DLF.
Example:
- JDBC URL of the MySQL data source type: jdbc:mysql://IP address:Port number/Database name?useSSL=false&permitMysqlScheme
- JDBC URL of the PostgreSQL data source type: jdbc:postgresql://IP address:Port number/Database name?socketTimeout=600
socketTimeout indicates the socket timeout interval for the connection between the migration client and the database.
- If the network connection uses an EIP, the IP address in the URL is the EIP bound to the data source.
Username/Password
Username and password used to access the data source. Username and Password are not displayed when Data Source is set to DLF.
If the user has a password, Password is mandatory. Otherwise, leave this parameter blank.
Access Point
Access point of the metadata service to be migrated.
This parameter is displayed when Data Source is set to DLF.
Access Key/Secret Key
Contact the DLF O&M personnel to obtain the AK/SK information. This parameter is displayed when Data Source is set to DLF.
Source Catalog
Name of the catalog to which the metadata to be migrated belongs.
Target Catalog
Name of the catalog to which metadata is migrated in LakeFormation.
Conflict Resolution
Policy for resolving conflicts during migration.
Currently, only Create and Update Metadata is supported.
Log Path
Storage location of logs generated during migration. Click the path selection icon to select a path.
The path must already exist in OBS. If you enter a path that does not exist, the migration job will fail.
Force Table Creation
Selecting this option will bypass OBS path restrictions when creating an internal table.
Metadata Filtering Policy
Metadata filtering policy during migration. The options are:
- Metadata type
- Custom rule
Default Owner
Default owner of metadata after migration. This parameter is displayed when Data Source is set to DLF.
- If the configured default owner does not have the corresponding metadata operation permissions, the migrated metadata cannot be added, deleted, modified, or queried. In this case, you can grant permissions to the owner or migrate permissions.
- If all metadata can be used properly before the migration, you do not need to set this parameter.
Filtering Policy Storage Location
Storage location of the custom metadata filtering policy file in the OBS parallel file system.
Set this parameter when Metadata Filtering Policy is set to Custom rule.
Filtering Policy File Name
Name of the user-defined metadata filtering policy file.
Set this parameter when Metadata Filtering Policy is set to Custom rule.
Metadata Objects to Migrate
Metadata objects to be migrated. Set this parameter when Metadata Filtering Policy is set to Metadata type. The options are as follows:
- All: Databases, functions, data tables, and partitions are migrated. Select All to migrate all metadata for the first migration job.
- Database: Databases are migrated.
- Function: Functions are migrated. Ensure that the function class name exists if you select Function.
- Table: Tables are migrated.
- Partition: Partitions are migrated.
If All is not selected, ensure that the upper-level metadata of the selected objects already exists. For example, if only Table is selected, make sure that the target catalog already contains the database to which the tables belong (for example, DB_1). Otherwise, no tables will be migrated.
Add Location Rule
- If the prefix of the metadata storage path is not obs://, click Add Location Rule to replace the prefix with obs:// and ensure that the corresponding OBS storage path exists.
For example, if the current metadata storage path is file:/a/b, set Original Path to file:/ and New Path to obs://. The new metadata storage path is then obs://a/b; ensure that this path exists in the OBS parallel file system.
- You can create multiple rules at the same time. If rules conflict, the rule at the top of the page prevails.
Execution Policy
Select the execution policy of the current migration job.
- Manual: The migration job is manually triggered.
If you select this mode, you need to click Run in the Operation column to run the migration job after the job is created.
- Scheduled: The migration job is automatically executed per schedule.
After selecting this mode, you can select the scheduled execution period (monthly, weekly, daily, or hourly) and set related parameters as required.
Network Connection
Select a network connection scheme. Currently, EIP is available.
If EIP is selected, you must also set Security Group ID, which is the ID of the security group of the VPC associated with the data source.
Event Notification Policy
(Currently, this function is in the OBT phase.)
(Optional) If this option is configured, a notification (via SMS or email) is sent when a specified event (such as job success or failure) occurs.
- Event Notification: Whether to enable event notifications.
- Event Notification Topic: Select the topic to be notified. You can configure the topic using Simple Message Notification (SMN) on the management console.
- Event: Job status that triggers a notification. The value can be either Job succeeded or Job failed.
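The JDBC URL formats described in Table 1 can be assembled and sanity-checked before the job is created. A minimal sketch assuming Python (the helper names are illustrative, not part of any LakeFormation API):

```python
# Build JDBC URLs in the formats expected by the migration job.
# When EIP connectivity is used, the host is the EIP bound to the data source.

def mysql_jdbc_url(host: str, port: int, database: str) -> str:
    """JDBC URL for the MySQL-based data source types."""
    return f"jdbc:mysql://{host}:{port}/{database}?useSSL=false&permitMysqlScheme"

def postgresql_jdbc_url(host: str, port: int, database: str,
                        socket_timeout: int = 600) -> str:
    """JDBC URL for the PostgreSQL data source type. socketTimeout is the
    socket timeout for the connection between the migration client and
    the database."""
    return f"jdbc:postgresql://{host}:{port}/{database}?socketTimeout={socket_timeout}"
```

For example, `mysql_jdbc_url("192.168.0.10", 3306, "hivemeta")` yields a URL matching the MySQL format shown in Table 1.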
- Click Start in the Operation column to run the migration job. If the execution policy is set to Scheduled, you do not need to manually execute the job.
- Before running a migration job, you need to authorize the job by referring to 2.
- After the migration job starts, if new metadata is added to the source database, the new metadata will not be migrated. You need to run the migration job again to migrate the new metadata. You can also use the metadata discovery function to migrate new metadata. For details, see Migrating Metadata to LakeFormation Using Metadata Discovery.
- If the job fails to be executed, you can click Start in the Operation column to retry after the fault is rectified.
You can click Metadata in the navigation pane and then click the name of the target metadata object to view it after the migration. For example, choose Metadata > Database to view the migrated databases.
Click Edit or Delete in the Operation column to modify or delete a job.
- Click View Log in the Operation column to view the logs generated during job running. You can click Click here to view complete log to view the complete log.
- View Job instead of View Log may be displayed on the page. In this case, perform the following operations to view logs:
- Click View Log in the Operation column to view the job execution status.
- In the displayed dialog box, click Click here to view complete log to view the logs generated during job running.
- The following table lists some error messages in logs and their causes.
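The Add Location Rule behavior described in Table 1 (path prefixes rewritten to obs://, with the rule at the top of the page taking precedence on conflict) can be sketched as follows, assuming Python with illustrative names:

```python
# Apply location rules in page order: the first rule whose original-path
# prefix matches wins, mirroring "the rule at the top of the page prevails".

def apply_location_rules(path: str, rules: list[tuple[str, str]]) -> str:
    """rules is an ordered list of (original_prefix, new_prefix) pairs."""
    for original, new in rules:
        if path.startswith(original):
            return new + path[len(original):]
    return path  # no rule matched; the path is left unchanged
```

For example, `apply_location_rules("file:/a/b", [("file:/", "obs://")])` returns `"obs://a/b"`; as noted above, the resulting OBS path must already exist in the parallel file system.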