LakeFormation
Huawei Cloud LakeFormation is an enterprise-class, one-stop data lake construction service. It provides unified metadata management, fine-grained permission control, and compatibility with open-source ecosystems through decoupled storage and compute. These capabilities help enterprises build and operate data lakes efficiently.
DataArts Migration can efficiently migrate data from and to LakeFormation.
How It Works
DataArts Migration integrates with LakeFormation by writing native OBS files directly. Hive tables stored in PARQUET or ORC format can be processed, whether partitioned or non-partitioned. Writing files directly delivers high write performance.
Preparation and Constraints
- Network requirements
The LakeFormation data source must be able to communicate with CDM so that data can be transmitted. For details, see Enabling Network Connectivity.
- Required permissions
- LakeFormation metadata read and write permissions: DataArts Migration reads and writes LakeFormation data. The LakeFormation CommonOperations or LakeFormation FullAccess system policy must be assigned to DataArts Migration. For details, see LakeFormation Permissions.
- OBS write permission: DataArts Migration reads files from and writes files to OBS. The OBS OperateAccess or OBS Administrator system policy can be assigned to DataArts Migration.
- Table format restrictions
Currently, DataArts Migration can write data only to Hive tables in LakeFormation.
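The permission requirements above can be expressed as a small pre-flight check. The following is a minimal sketch, not part of any Huawei Cloud SDK: the policy names come from the list above, while `REQUIRED_POLICY_GROUPS` and `missing_permissions` are hypothetical names introduced for illustration. It assumes you have already obtained the list of system policies assigned to DataArts Migration by some other means.

```python
# Hypothetical helper: policy names are taken from the requirements above,
# everything else is an illustrative assumption.
REQUIRED_POLICY_GROUPS = {
    "LakeFormation metadata": {
        "LakeFormation CommonOperations",
        "LakeFormation FullAccess",
    },
    "OBS": {
        "OBS OperateAccess",
        "OBS Administrator",
    },
}


def missing_permissions(assigned_policies):
    """Return the requirement groups for which none of the accepted policies is assigned."""
    assigned = set(assigned_policies)
    return [
        group
        for group, accepted in REQUIRED_POLICY_GROUPS.items()
        if not (assigned & accepted)  # no overlap means the requirement is unmet
    ]
```

An empty result means that at least one accepted policy is present for each requirement. A non-empty result names the requirement that still needs a policy.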
Supported Data Types
The following table lists the types of LakeFormation data that can be written.
| Data Type | LakeFormation Data Type | Write |
|---|---|---|
| Numeric | TINYINT | √ |
| Numeric | SMALLINT | √ |
| Numeric | INT | √ |
| Numeric | BIGINT | √ |
| Numeric | FLOAT | √ |
| Numeric | DOUBLE | √ |
| Numeric | DECIMAL | √ |
| Boolean | BOOLEAN | √ |
| Character | CHAR | √ |
| Character | VARCHAR | √ |
| Character | STRING | √ |
| Date/Time | DATE | √ |
| Date/Time | TIMESTAMP | √ |
| Binary | BYTEA | √ |
| Complex type | ARRAY | √ |
| Complex type | MAP | √ |
| Complex type | UNIONTYPE | x |
| Complex type | STRUCT | x |
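Before configuring a job, it can help to check a destination schema against the table above. The following is a minimal sketch under stated assumptions: `WRITABLE_TYPES` mirrors the rows marked √, and `base_type` and `unwritable_columns` are hypothetical helpers, not part of DataArts Migration. Only the top-level type name of each column is inspected, so a nested type such as a STRUCT inside an ARRAY is not detected.

```python
import re

# Types marked as writable in the table above.
WRITABLE_TYPES = {
    "TINYINT", "SMALLINT", "INT", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL",
    "BOOLEAN", "CHAR", "VARCHAR", "STRING", "DATE", "TIMESTAMP", "BYTEA",
    "ARRAY", "MAP",
}


def base_type(type_string):
    """Extract the leading type name, for example 'decimal(10,2)' gives 'DECIMAL'."""
    match = re.match(r"\s*([A-Za-z_]+)", type_string)
    if match is None:
        raise ValueError(f"cannot parse type: {type_string!r}")
    return match.group(1).upper()


def unwritable_columns(schema):
    """Return the columns whose top-level type is not listed as writable.

    schema maps a column name to its type string.
    """
    return {
        name: type_string
        for name, type_string in schema.items()
        if base_type(type_string) not in WRITABLE_TYPES
    }
```

For example, a schema containing `"addr": "struct<city:string>"` would be reported, because STRUCT is marked x in the table.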
Supported File Storage Formats
The following table lists the LakeFormation file storage formats and whether each format can be written.
| Data Source Storage Format | Write |
|---|---|
| PARQUET | √ |
| ORC | √ |
| AVRO | x |
| JSON | x |
| XML | x |
| CSV | x |
| TEXT | x |
| RC | x |
| SEQUENCE | x |
Supported Migration Scenarios
DataArts Migration supports the following modes for synchronizing on-premises data:
- Single table synchronization
DataArts Migration synchronizes a single table or file when ingesting data into a data lake or migrating data to the cloud.
- Database and table shard synchronization
DataArts Migration synchronizes data from multiple databases and tables when ingesting data into a data lake or migrating data to the cloud.
- Entire DB migration
DataArts Migration synchronizes the data of an entire on-premises database when ingesting data into a data lake or migrating data to the cloud.
Database and table shard synchronization and entire DB migration are available only in some regions. The following table lists the supported LakeFormation data migration scenarios.
| Supported Migration Scenario | Single Table Read | Single Table Write | Database/Table Shard Read | Database/Table Shard Write | Entire DB Read | Entire DB Write |
|---|---|---|---|---|---|---|
| Supported | x | √ | x | √ | x | x |
Core Capabilities
- Connection configuration
| Configuration Item | Supported | Description |
|---|---|---|
| AK/SK authentication | √ | AK/SK authentication is used to access LakeFormation. |
| Agency authentication | √ | An IAM agency authorizes roles to access the service. |
- Write capabilities
| Configuration Item | Supported | Description |
|---|---|---|
| Write Mode | LOAD, LOAD OVERWRITE | Two write modes are supported: LOAD and LOAD OVERWRITE. LOAD adds data to a destination table and is applicable to writing incremental data. LOAD OVERWRITE overwrites the data in the destination table or partition. |
| Dirty Data Processing | x | Abnormal data can be written to the dirty data bucket to prevent job failures caused by a small amount of abnormal data. This function is not supported currently. |
| Concurrent Write | √ | Concurrent write can fully utilize cluster resources to improve the data write speed. |
| Table Creation in Editing State | √ | A destination table can be created during the configuration of a job that migrates data from a semi-structured or structured data source to LakeFormation. |
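The difference between the two write modes can be illustrated with a small in-memory model. This is a sketch only and does not call DataArts Migration or LakeFormation: `apply_write` is a hypothetical function, a table is modeled as a dictionary from partition key to a list of rows, and it is assumed that LOAD OVERWRITE replaces only the partitions present in the incoming data. The actual overwrite scope of the service may differ.

```python
def apply_write(table, incoming, mode):
    """Return a new table after applying incoming data with the given write mode.

    table and incoming map a partition key to a list of rows.
    Assumption: LOAD OVERWRITE replaces only partitions present in incoming.
    """
    if mode not in ("LOAD", "LOAD OVERWRITE"):
        raise ValueError(f"unsupported write mode: {mode}")

    # Copy the existing data so the original table is left untouched.
    result = {partition: list(rows) for partition, rows in table.items()}

    for partition, rows in incoming.items():
        if mode == "LOAD":
            # Append: existing rows are kept, new rows are added.
            result.setdefault(partition, []).extend(rows)
        else:
            # Overwrite: existing rows in this partition are discarded.
            result[partition] = list(rows)
    return result
```

With an existing partition `dt=2024-01-02` holding one row, LOAD of one new row into that partition leaves two rows, while LOAD OVERWRITE leaves only the new row. Partitions that receive no incoming data are unchanged in both modes under this model.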
Creating a Data Source
Create a data source in Management Center. For details, see Configuring Data Connection Parameters.
Creating an Offline Data Migration Job
Create a LakeFormation migration job in DataArts Factory. For details, see Creating an Offline Processing Migration Job.