Function Overview
Data Lake Insight (DLI) is a serverless data processing and analysis service fully compatible with Apache Spark, Apache Flink, and HetuEngine ecosystems. By utilizing standard SQL, Spark, and Flink programs, businesses can easily perform joint computation analysis across multiple data sources, as well as explore and uncover the value of their data.
Available in all regions
DLI has a comprehensive access control mechanism built in, and also supports fine-grained authorization through Identity and Access Management (IAM): you can manage DLI access control by creating policies in IAM. Both access control mechanisms can be used together without conflict.
Before submitting a job using DLI, you need to prepare the necessary compute resources.
· Queues in an elastic resource pool: An elastic resource pool offers the compute resources (CPU and memory) required for running DLI jobs and can adapt to changing service demands. Queues within an elastic resource pool can be shared to execute jobs; by properly setting the queue allocation policy, you can improve queue utilization.
· Default queue: This queue is typically used by users who are new to DLI.
Available in all regions
An elastic resource pool provides the necessary compute resources (CPU and memory) for DLI job execution. It has powerful computing capabilities, high availability, and flexible resource management, making it ideal for large-scale computing tasks and business scenarios requiring long-term resource planning. Additionally, it can adapt to changing service demands for compute resources.
Available in all regions
You can create multiple queues within an elastic resource pool. These queues are associated with specific jobs and data processing tasks and serve as the basic unit for resource allocation and usage within the pool; that is, a queue is the specific compute resource used to execute jobs. Queues within an elastic resource pool can be shared to execute jobs; by properly setting the queue allocation policy, you can improve queue utilization.
Available in all regions
The default queue is preset in DLI and allocates resources as needed. If you are unsure how much queue capacity you need, or you are not yet able to create queues of your own, you can use the default queue to run your jobs. The default queue is typically used by users who are new to DLI; however, because it is shared, it can cause resource contention and may not guarantee the resources your jobs need.
Available in all regions
DLI metadata is the basis for developing SQL and Spark jobs. Before executing a job, you need to define databases and tables based on your business scenario.
Apart from DLI metadata, DLI can also connect to LakeFormation for unified metadata management. LakeFormation seamlessly integrates various compute engines and big data cloud services, enabling efficient and convenient data lake construction and operations.
Available in all regions
DLI metadata is the basis for developing SQL and Spark jobs. Before executing a job, you need to define databases and tables based on your business scenario.
· Data catalog: A data catalog is a metadata management object that can contain multiple databases. You can create and manage multiple catalogs in DLI to isolate different metadata.
· Database: A database is a repository of data organized, stored, and managed on computer storage devices according to data structures. Databases are typically used to store, retrieve, and manage structured data, consisting of multiple data tables that are interrelated through keys and indexes.
· Table: Tables are among the most important components of a database, consisting of rows and columns. Each row represents a data item, while each column represents a property or feature of the data. Tables are used to organize and store specific types of data, making it possible to query and analyze the data effectively. A database is a framework, and tables are its essential content; a database contains one or more tables.
· Metadata: Metadata is data that defines and describes data. It primarily describes information about the data itself, such as its source, size, format, or other characteristics. In the database field, metadata is used to interpret the contents of a data warehouse. When creating a table, its metadata is defined by three attributes: column name, type, and column description, as shown in the example after this list.
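For example, a minimal sketch of defining this hierarchy in SQL (the database, table, and column names are illustrative, not DLI presets):

-- Create a database to group related tables.
CREATE DATABASE IF NOT EXISTS sales_db;

-- Create a table; its metadata is the column names, types, and descriptions.
CREATE TABLE IF NOT EXISTS sales_db.orders (
  order_id BIGINT COMMENT 'Unique order identifier',
  product  STRING COMMENT 'Product name',
  amount   DOUBLE COMMENT 'Order amount'
);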
Available in all regions
DLI SQL jobs, also known as DLI Spark SQL jobs, allow you to execute data queries and other operations using SQL statements in the SQL editor. They support SQL:2003 and are compatible with Spark SQL. For detailed syntax descriptions, refer to Data Lake Insight Spark SQL Syntax Reference.
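As a sketch of what such a job might look like, the following query could be submitted in the SQL editor; it assumes the illustrative sales_db.orders table from the metadata example above:

-- Aggregate order amounts by product (illustrative table and columns).
SELECT product, SUM(amount) AS total_amount
FROM sales_db.orders
GROUP BY product
ORDER BY total_amount DESC;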
Available in all regions
The DLI team has extensively optimized and adapted open-source Spark to provide batch processing capabilities, while remaining compatible with the Apache Spark ecosystem and APIs.
Additionally, DLI supports accessing its metadata using Spark jobs. For more information, refer to Data Lake Insight Developer Guide.
Available in all regions
DLI Flink jobs are specifically designed for real-time data stream processing, making them ideal for scenarios that require low latency and quick response. They can also be connected to multiple cloud services, creating a rich streaming data ecosystem, and are well suited for real-time monitoring and online analysis.
· Flink OpenSource job: DLI provides standard connectors and various APIs to facilitate quick integration with other data systems; a sample source table declaration follows this list.
· Flink Jar job: allows you to submit Flink jobs compiled into JAR files, providing greater flexibility and customization capabilities. It is suitable for complex data processing scenarios that require user-defined functions (UDFs) or specific library integration. The Flink ecosystem can be utilized to implement advanced stream processing logic and status management.
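As a sketch of the connector-based integration available to Flink OpenSource jobs, a streaming source table might be declared in Flink SQL as follows. The Kafka topic, broker address, and fields are illustrative assumptions; see the DLI Flink documentation for the connectors actually supported:

-- Declare a Kafka-backed source table with the open-source Flink Kafka connector.
CREATE TABLE click_events (
  user_id STRING,
  url     STRING,
  ts      TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'clicks',
  'properties.bootstrap.servers' = 'kafka-broker:9092',
  'scan.startup.mode' = 'latest-offset',
  'format' = 'json'
);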
Available in all regions
Before conducting cross-source analysis with DLI, you need to establish a datasource connection to connect the network between DLI and the data source.
DLI's enhanced datasource connections use VPC peering connections to directly connect DLI queues to VPC networks of destination data sources. This enables data exchange through a point-to-point approach, providing more flexible use cases and stronger performance than basic datasource connections.
Note: Datasource connections cannot be created for the default queue. The VPC Administrator permission is required to use the VPC, subnet, route, and VPC peering connection for DLI datasource connections. You can set these permissions by referring to "Service Authorization".
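Once the networks are connected, external data can typically be exposed to SQL through a table that maps to the remote source. The sketch below uses open-source Spark SQL JDBC syntax; DLI's actual cross-source table syntax may differ, so refer to the Data Lake Insight Spark SQL Syntax Reference. The URL, table, and credentials are illustrative:

-- Map an external MySQL/RDS table into SQL via JDBC (illustrative values).
CREATE TABLE rds_orders
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:mysql://192.168.0.10:3306/sales',
  dbtable 'sales.orders',
  user 'rds_user',
  password '********'
);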
Available in all regions
To perform cross-source analysis, DLI requires agency permissions to access other cloud services. This means allowing DLI to act on behalf of users or services in other cloud services, enabling it to read/write data and execute specific operations during job execution. DLI agency ensures secure and efficient access to other cloud services in cross-source analysis scenarios.
Available in all regions
DLI supports clusters deployed in containers. In a container cluster, components related to Spark jobs and Flink jobs run in containers. You can download custom images provided by DLI to change the container running environments of Spark jobs and Flink jobs. For example, you can add a Python package or C library related to machine learning to a custom image to help you extend functions.
Available in all regions