Function Overview
-
Data Lake Insight
-
Data Lake Insight (DLI) is a serverless data processing and analysis service that is fully compatible with the Apache Spark, Trino, and Apache Flink ecosystems. Enterprises can use standard SQL, Spark programs, and Flink programs to perform joint computing and analysis across multiple data sources and unlock the value of their data.
Regions: regions displayed on the console
-
-
Permissions Management
-
DLI has a comprehensive permission control mechanism and supports fine-grained authentication through Identity and Access Management (IAM). You can create policies in IAM to manage DLI permissions, and you can use DLI's own permission control mechanism and IAM together.
Regions: regions displayed on the console
-
-
Elastic Scaling
-
DLI provides elastic queue scaling. You can adjust queue specifications based on your service load or usage period to meet service requirements and reduce costs.
Regions: regions displayed on the console
-
Elastic Scaling
-
DLI supports on-demand queue scaling. After creating a pay-per-use queue with specific specifications, you can scale it elastically as required. Currently, only on-demand (pay-per-use) queues can be scaled in or out. After scaling, the queue is still billed on a pay-per-use basis, that is, by CU-hour.
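For example, if a pay-per-use queue is scaled out from 16 CUs to 32 CUs and runs at that size for two hours, those two hours are billed as 32 CUs × 2 hours = 64 CU-hours (the figures are illustrative; actual prices depend on the region's price list).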
Description:
Elastic scaling can be performed on a newly created pay-per-use queue only after jobs have run in that queue.
Regions: regions displayed on the console
-
-
Scheduling CU Changes
-
DLI can scale queues on a schedule. After creating a queue, you can configure scheduled scaling tasks as required.
Set the automatic scaling times for a queue based on your service requirements; the system then triggers queue scaling periodically.
- After a pay-per-use queue is scaled out or in, it is still billed by CU-hour. Currently, scheduled elastic scaling tasks can be performed only for pay-per-use queues with more than 64 CUs; that is, the minimum queue size is 64 CUs.
- Yearly/monthly queues support only scheduled scaling tasks. Capacity scaled out beyond the prepaid yearly/monthly resources is billed on a pay-per-use basis by CU-hour. Currently, scheduled scaling tasks can be performed only for yearly/monthly queues with more than 64 CUs.
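For example, you might schedule a queue to scale out from 64 CUs to 128 CUs at 08:00 on workdays and to scale back in to 64 CUs at 20:00, so that capacity follows the daytime workload (the times and sizes are illustrative).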
Description:
Scheduled scaling can be performed on a newly created queue only after jobs have run in that queue.
Regions: regions displayed on the console
-
-
Dual-AZ
-
An availability zone (AZ) contains one or multiple physical data centers. Each AZ has independent cooling, fire extinguishing, moisture-proof, and electricity facilities. Within an AZ, computing, network, storage, and other resources are logically divided into multiple clusters. AZs within a region are interconnected using high-speed optical fibers to support cross-AZ high-availability systems. For more information, see Regions and AZs.
DLI cross-AZ queues provide cross-AZ disaster recovery (DR) capabilities, improving computing reliability: you can continue using DLI even when a single AZ is unavailable. This mode suits scenarios with high queue-reliability requirements.
A DLI cross-AZ queue creates the same computing resources in two different AZs. For example, if a user requires 1,400 CUs of computing resources, the user can select 1,400 CUs and enable Cross-AZ when creating the queue. DLI then creates dedicated 1,400-CU computing resources in each of two AZs. If one AZ becomes unavailable, the other AZ continues to process the user's computing tasks.
Description:
Currently, only SQL queues are supported.
Currently, only yearly/monthly queues and on-demand dedicated queues can be used for cross-AZ active-active. Common on-demand queues and default queues are not supported.
If Cross-AZ is selected during queue purchase, the billing rate is twice that of the single-AZ mode.
Regions: regions displayed on the console
-
-
DLI SQL Job
-
DLI SQL jobs are DLI Spark SQL jobs. You can use SQL statements in the SQL editor to perform operations such as data queries. DLI SQL supports SQL:2003 and is compatible with Spark SQL. For details about the syntax, see the Data Lake Insight Spark SQL Syntax Reference.
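For illustration, the following is a minimal sketch of the kind of standard SQL you can run in the SQL editor; the database, table, and column names are hypothetical.
-- Create a hypothetical database and table, then run an aggregation query.
CREATE DATABASE IF NOT EXISTS demo_db;
CREATE TABLE IF NOT EXISTS demo_db.orders (
  order_id BIGINT,
  customer STRING,
  amount DOUBLE,
  order_date DATE
);
-- Top 10 customers by total order amount since the start of 2024.
SELECT customer, SUM(amount) AS total_amount
FROM demo_db.orders
WHERE order_date >= DATE '2024-01-01'
GROUP BY customer
ORDER BY total_amount DESC
LIMIT 10;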
Regions: regions displayed on the console
-
-
DLI Spark Job
-
DLI is built on open-source Spark with extensive performance optimizations and service-oriented enhancements. It is compatible with the Apache Spark ecosystem and interfaces and executes batch processing jobs.
You can also use Spark jobs to access DLI metadata. For details, see the Data Lake Insight Development Guide.
Regions: regions displayed on the console
-
-
DLI Flink Job
-
DLI Flink jobs support online Flink SQL analysis and can connect across data sources to multiple cloud services, forming a rich stream ecosystem.
Currently, the following Flink job type is available:
- Flink Jar job: a custom JAR package job based on the Flink APIs. It can run on a dedicated queue.
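As an illustration of the online Flink SQL analysis mentioned above, the following is a minimal sketch in open-source Flink SQL; the table name, topic, and broker address are hypothetical, and the exact connector options available in DLI may differ.
-- A hypothetical Kafka-backed source table for click events.
CREATE TABLE clicks (
  user_id STRING,
  url STRING,
  ts TIMESTAMP(3),
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'clicks',
  'properties.bootstrap.servers' = 'broker:9092',
  'format' = 'json'
);
-- Count clicks per user in one-minute tumbling windows.
SELECT
  user_id,
  TUMBLE_START(ts, INTERVAL '1' MINUTE) AS window_start,
  COUNT(*) AS click_count
FROM clicks
GROUP BY user_id, TUMBLE(ts, INTERVAL '1' MINUTE);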
Regions: regions displayed on the console
-
-
Datasource Connection
-
Before using DLI to perform cross-source analysis, you need to create a datasource connection to set up the network between DLI and the data source.
The DLI enhanced datasource connection uses VPC peering at the underlying layer to directly connect the network between the DLI queue and the VPC of the destination data source. Enhanced datasource connections implement point-to-point data exchange, providing more flexible usage scenarios and better performance than typical cross-source connections.
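For example, to let a job on a dedicated queue read a table in an RDS for MySQL instance, you would first create an enhanced datasource connection between the queue and the VPC and subnet where the RDS instance resides, so that the job can reach the database over the private network.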
Description:
The default queue does not support the creation of datasource connections.
Regions: regions displayed on the console
-
-
Cross-Source Analysis
-
The enhanced cross-source function supports all cross-source services implemented by DLI and allows you to access self-built data sources through UDFs, Spark jobs, and Flink jobs. Enhanced cross-source scenarios support only yearly/monthly queues and on-demand dedicated queues.
Currently, DLI supports cross-source access to the following data sources: CloudTable HBase, CloudTable OpenTSDB, CSS, DCS Redis, DDS Mongo, DIS, DMS, DWS, MRS HBase, MRS Kafka, MRS OpenTSDB, OBS, RDS MySQL, RDS PostgreSQL, and SMN. For details, see Data Sources Supported by DLI.
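For illustration, a table over a remote RDS MySQL database might be declared with Spark SQL's generic JDBC datasource syntax along the following lines; the connection details are hypothetical, and the exact syntax and option names DLI requires are described in Data Sources Supported by DLI.
-- A hypothetical cross-source table backed by an RDS MySQL database.
CREATE TABLE customers_rds
USING org.apache.spark.sql.jdbc
OPTIONS (
  url 'jdbc:mysql://192.168.0.10:3306/salesdb',
  dbtable 'customers',
  user 'db_user',
  password '********'
);
-- Query the remote table through the datasource connection.
SELECT COUNT(*) FROM customers_rds;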
Description:
To access a cross-source table, you need to use the queue for which a cross-source connection has been created.
Cross-source tables do not support the preview function.
Regions: regions displayed on the console
-
-
Custom Image
-
DLI supports clusters deployed in containers. In a container cluster, components related to Spark jobs and Flink jobs run in containers. You can build custom images based on the base images provided by DLI to change the container runtime environment of Spark jobs and Flink jobs, for example, by adding a machine-learning Python package or C library to a custom image to extend functionality.
Usage Restrictions
- A DLI container-based queue must be used, and it must be a dedicated queue.
- The base image provided by DLI must be used.
- You cannot modify the DLI components or directories in the base image.
- Only JAR jobs (Spark and Flink Jar jobs) are supported.
Regions: regions displayed on the console
-