Updated on 2024-02-02 GMT+08:00

How to Use LakeFormation

This LakeFormation tutorial describes how to create a LakeFormation instance and interconnect it with MRS clusters to implement unified data lake metadata and permission management.

General Use Procedure

The following figure shows the procedure of interconnecting MRS with LakeFormation.

Figure 1 Procedure of using LakeFormation

Restrictions and Constraints

The restrictions and constraints of interconnecting LakeFormation with MRS clusters are as follows.

  • Before interconnecting LakeFormation with MRS clusters, pay attention to the following restrictions:
    • MRS clusters and LakeFormation instances must belong to the same cloud account and region.
    • The access client created by LakeFormation must reside in the same VPC as the MRS clusters.
    • MRS clusters can interconnect only with the Hive Catalog of a LakeFormation instance.
    • For existing MRS clusters, you need to migrate the metadata database and permission policies to the LakeFormation instance, and then configure the interconnection.
  • After MRS is interconnected with LakeFormation, MRS components are subject to the following constraints:
    • Hive does not support temporary tables.
    • Hive does not support cross-cluster column encryption.
    • Hive WebHCat cannot interconnect with LakeFormation.
    • Hive cannot create an internal table if the table directory is not empty.
    • Before creating a Hudi table, you need to add the path authorization of the Hudi table directory on LakeFormation to grant OBS read and write permissions.
    • Fields in a Hudi table cannot be edited on the LakeFormation console. You can only add, delete, or modify table fields on the Hudi client.
    • When Flink reads and writes Hive tables, Hive table synchronization supports only hive_sync.mode=jdbc. The hms mode is not supported.
    • When Spark accesses the default database on the client, it requires permission on the corresponding OBS path. If the Spark user lacks this permission, the client displays a message indicating that the permission is missing. This does not affect database creation, which still succeeds.
  • After MRS is interconnected with LakeFormation, the permission policy restrictions are as follows:
    • In LakeFormation authorization, only LakeFormation roles can be used as authorization entities. Users or user groups cannot be used as authorization entities.
    • The PolicySync process does not modify the default policy of the RangerAdmin Hive module in the cluster. The default policy still takes effect.
    • After the PolicySync process is started, it compares the cluster's permission policies with those of the LakeFormation instance and deletes the non-default policies that do not exist in LakeFormation. To avoid losing policies, migrate the permission policies to the LakeFormation instance first.
    • For the Hive module on the RangerAdmin web UI, do not add or delete non-default policies. Grant permissions on the Data Permissions page of the LakeFormation instance instead.
    • After the interconnection between the MRS cluster and LakeFormation is canceled, the non-default policies of RangerAdmin will not be cleared. You need to manually clear them.
    • Hive does not support SQL statements for granting permissions. You need to grant permissions on the Data Permissions page.
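
The PolicySync rules in the list above (default policies are left untouched, and non-default policies missing from LakeFormation are deleted) can be illustrated with a minimal Python sketch. This is not the PolicySync implementation: the Policy class, the policies_removed_by_sync function, the sample policy names, and the idea of matching policies by name are all assumptions made for illustration. A check of this kind may help identify which existing policies should be migrated before the interconnection is configured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified stand-in for a RangerAdmin Hive policy."""
    name: str
    is_default: bool = False


def policies_removed_by_sync(ranger_policies, lakeformation_policy_names):
    """Return the names of cluster policies that a sync would delete.

    Models the documented rule only: default policies are never modified,
    and a non-default policy is deleted when LakeFormation has no matching
    policy. Matching by name is an assumption of this sketch.
    """
    known = set(lakeformation_policy_names)
    return sorted(
        p.name
        for p in ranger_policies
        if not p.is_default and p.name not in known
    )


if __name__ == "__main__":
    # Sample data is invented for illustration.
    ranger = [
        Policy("default_hive_policy", is_default=True),
        Policy("sales_read"),
        Policy("hr_write"),
    ]
    lakeformation = ["sales_read"]
    # hr_write exists only in the cluster, so it would need migrating first.
    print(policies_removed_by_sync(ranger, lakeformation))  # ['hr_write']
```

Any policy name returned by such a check is a candidate for migration to the LakeFormation instance before PolicySync is started.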