Function Overview
Data Lake Insight (DLI) is a serverless data processing and analysis service fully compatible with Apache Spark, Apache Flink, and HetuEngine ecosystems. By utilizing standard SQL, Spark, and Flink programs, businesses can easily perform joint computation analysis across multiple data sources, as well as explore and uncover the value of their data.
Available in all regions
DLI has a comprehensive access control mechanism built in, and also supports fine-grained authorization through Identity and Access Management (IAM): you can manage DLI access control by creating policies in IAM. Both access control mechanisms can be used together without conflict.
Before submitting a job using DLI, you need to prepare the necessary compute resources.
· Queues in an elastic resource pool: An elastic resource pool offers the compute resources (CPU and memory) required for running DLI jobs and can adapt to changing service demands. Queues within an elastic resource pool can be shared to execute jobs; by properly setting the queue allocation policy, you can improve queue utilization.
· Default queue: This queue is typically used by users who are new to DLI.
Available in all regions
An elastic resource pool provides the necessary compute resources (CPU and memory) for DLI job execution. It has powerful computing capabilities, high availability, and flexible resource management, making it ideal for large-scale computing tasks and business scenarios requiring long-term resource planning. Additionally, it can adapt to changing service demands for compute resources.
Available in all regions
You can create multiple queues within an elastic resource pool. These queues are associated with specific jobs and data processing tasks and serve as the basic unit for resource allocation and usage within the pool; that is, a queue is the specific compute resource used to execute jobs. Queues within an elastic resource pool can be shared to execute jobs; by properly setting the queue allocation policy, you can improve queue utilization.
Available in all regions
The default queue is preset in DLI and allocates resources as needed. If you are unsure how much queue capacity you need, or you are not yet able to create queues of your own, you can use the default queue to run your jobs. The default queue is typically used by users who are new to DLI; however, because it is shared, it can cause resource contention and may not guarantee the resources your jobs need.
Available in all regions
DLI metadata is the basis for developing SQL and Spark jobs. Before executing a job, you need to define databases and tables based on your business scenario.
Apart from DLI metadata, DLI can also connect to LakeFormation for unified metadata management. LakeFormation seamlessly integrates various compute engines and big data cloud services, enabling efficient and convenient data lake construction and operations.
Available in all regions
DLI metadata is the basis for developing SQL and Spark jobs. Before executing a job, you need to define databases and tables based on your business scenario.
· Data catalog: A data catalog is a metadata management object that can contain multiple databases. You can create and manage multiple catalogs in DLI to isolate different metadata.
· Database: A database is a repository of data organized, stored, and managed on computer storage devices according to data structures. Databases are typically used to store, retrieve, and manage structured data, consisting of multiple data tables that are interrelated through keys and indexes.
· Table: Tables are among the most important components of a database, consisting of rows and columns. Each row represents a data item, while each column represents a property or feature of the data. Tables are used to organize and store specific types of data, making it possible to query and analyze the data effectively. A database is a framework, and tables are its essential content; a database contains one or more tables.
· Metadata: Metadata is data that defines and describes data. It primarily describes information about the data itself, such as its source, size, format, or other characteristics. In the database field, metadata is used to interpret the contents of a data warehouse. When creating a table, its metadata is defined by three attributes: column name, type, and column description, as shown in the example after this list.
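For example, a minimal sketch of defining this hierarchy in SQL (the database, table, and column names are illustrative, not DLI presets):

-- Create a database to group related tables.
CREATE DATABASE IF NOT EXISTS sales_db;

-- Create a table; its metadata is the column names, types, and descriptions.
CREATE TABLE IF NOT EXISTS sales_db.orders (
  order_id BIGINT COMMENT 'Unique order identifier',
  product  STRING COMMENT 'Product name',
  amount   DOUBLE COMMENT 'Order amount'
);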
Available in all regions
DLI SQL jobs, also known as DLI Spark SQL jobs, allow you to execute data queries and other operations using SQL statements in the SQL editor. They support SQL:2003 and are compatible with Spark SQL. For detailed syntax descriptions, refer to Data Lake Insight Spark SQL Syntax Reference.
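As a sketch of what such a job might look like, the following query could be submitted in the SQL editor; it assumes the illustrative sales_db.orders table from the metadata example above:

-- Aggregate order amounts by product (illustrative table and columns).
SELECT product, SUM(amount) AS total_amount
FROM sales_db.orders
GROUP BY product
ORDER BY total_amount DESC;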
Available in all regions
The DLI team has extensively optimized and adapted open-source Spark to provide batch processing capabilities, while remaining compatible with the Apache Spark ecosystem and APIs.
Additionally, DLI supports accessing its metadata using Spark jobs. For more information, refer to Data Lake Insight Developer Guide.
Available in all regions
DLI Flink jobs are specifically designed for real-time data stream processing, making them ideal for scenarios that require low latency and quick response. They can also be connected to multiple cloud services, creating a rich streaming data ecosystem, and are well suited for real-time monitoring and online analysis.
· Flink OpenSource job: DLI provides standard connectors and various APIs to facilitate quick integration with other data systems; a sample source table declaration follows this list.
· Flink Jar job: allows you to submit Flink jobs compiled into JAR files, providing greater flexibility and customization capabilities. It is suitable for complex data processing scenarios that require user-defined functions (UDFs) or specific library integration. The Flink ecosystem can be utilized to implement advanced stream processing logic and status management.
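As a sketch of the connector-based integration available to Flink OpenSource jobs, a streaming source table might be declared in Flink SQL as follows. The Kafka topic, broker address, and fields are illustrative assumptions; see the DLI Flink documentation for the connectors actually supported:

-- Declare a Kafka-backed source table with the open-source Flink Kafka connector.
CREATE TABLE click_events (
  user_id STRING,
  url     STRING,
  ts      TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'clicks',
  'properties.bootstrap.servers' = 'kafka-broker:9092',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);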
Available in all regions
Before conducting cross-source analysis with DLI, you need to establish a datasource connection to connect the network between DLI and the data source.
DLI's enhanced datasource connections use VPC peering connections to directly connect DLI queues to VPC networks of destination data sources. This enables data exchange through a point-to-point approach, providing more flexible use cases and stronger performance than basic datasource connections.
Note: Datasource connections cannot be created for the default queue. The VPC Administrator permission is required to use the VPC, subnet, route, and VPC peering connection for DLI datasource connections. You can set these permissions by referring to "Service Authorization".
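Once the networks are connected, external data can typically be exposed to SQL through a table that maps to the remote source. The sketch below uses open-source Spark SQL JDBC syntax; DLI's actual cross-source table syntax may differ, so refer to the Data Lake Insight Spark SQL Syntax Reference. The URL, table, and credentials are illustrative:

-- Map an external MySQL/RDS table into SQL via JDBC (illustrative values).
CREATE TABLE rds_orders
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:mysql://192.168.0.10:3306/sales',
  dbtable 'sales.orders',
  user 'rds_user',
  password '********'
);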
Available in all regions
To perform cross-source analysis, DLI requires agency permissions to access other cloud services. This means allowing DLI to act on behalf of users or services in other cloud services, enabling it to read/write data and execute specific operations during job execution. DLI agency ensures secure and efficient access to other cloud services in cross-source analysis scenarios.
Available in all regions
DLI supports clusters deployed in containers. In a container cluster, components related to Spark jobs and Flink jobs run in containers. You can download custom images provided by DLI to change the container running environments of Spark jobs and Flink jobs. For example, you can add a Python package or C library related to machine learning to a custom image to help you extend functions.
Available in all regions