What Is Data Lake Insight

Updated on 2025-01-09 GMT+08:00

DLI Introduction

Data Lake Insight (DLI) is a serverless data processing and analysis service fully compatible with Apache Spark, HetuEngine, and Apache Flink ecosystems. It frees you from managing any servers.

DLI supports standard SQL and is compatible with Spark SQL and Flink SQL. It also supports multiple access modes and mainstream data formats, so you can query data in these formats with standard SQL or with Spark and Flink applications, without ETL. DLI also runs SQL statements and Spark applications against heterogeneous data sources, including RDS, GaussDB(DWS), CSS, OBS, custom databases on ECSs, and offline databases.

Functions

You can query and analyze heterogeneous data sources such as CloudTable, RDS, and GaussDB(DWS) on the cloud using access methods such as the visualized interface, RESTful APIs, JDBC, and Beeline. DLI is compatible with mainstream data formats, including CSV, JSON, Parquet, and ORC.
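
The following is a minimal sketch of this kind of standard SQL analysis, written against the open-source Apache Spark API that DLI is compatible with. The OBS path, table name, and columns are illustrative placeholders only, not values taken from this document.

    # Minimal sketch: query Parquet data with standard SQL using the open-source
    # Apache Spark API (an ecosystem DLI is compatible with). The OBS path, view
    # name, and columns are illustrative placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("dli-style-sql-demo").getOrCreate()

    # Register a Parquet data set as a temporary view, then query it with SQL.
    spark.read.parquet("obs://example-bucket/sales/").createOrReplaceTempView("sales")

    result = spark.sql("""
        SELECT region, SUM(amount) AS total_amount
        FROM sales
        GROUP BY region
        ORDER BY total_amount DESC
    """)
    result.show()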

  • Basic functions
    • SQL jobs: You can use standard SQL statements to query data. For details, see Spark SQL Syntax Reference.
    • Flink jobs: Flink SQL provides online analysis capabilities, including window and join aggregations, so you can express service logic in SQL and implement services quickly. For details, see Flink OpenSource SQL Syntax Reference.
    • Spark jobs: Fully managed Spark computing is provided. You can submit computing tasks through interactive sessions or in batches to analyze data in fully managed Spark queues. For details, see SQL Syntax Constraints and Definitions.
  • Federated analysis of heterogeneous data sources
    • Spark datasource connection: Data sources such as CloudTable, GaussDB(DWS), RDS, and CSS can be accessed through DLI. For details, see Enhanced Datasource Connections.
    • Flink jobs support interconnection with multiple cloud services, forming a rich stream ecosystem that consists of a cloud service ecosystem and an open-source ecosystem.
      • Cloud service ecosystem: DLI can interconnect with other cloud services in Flink SQL, so you can directly use SQL to read data from and write data to services such as DIS, OBS, CloudTable, MRS, RDS, SMN, and DCS.
      • Open-source ecosystem: By establishing network connections with other VPCs through enhanced datasource connections, you can access all data sources and sinks supported by Flink and Spark, such as Kafka, HBase, and Elasticsearch, from tenant-authorized DLI queues (see the sketch below).

      For details, see Flink Jobs.
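
      The following is a minimal sketch of this pattern, written with the open-source Apache Flink (PyFlink) Table API that DLI Flink jobs are compatible with. The Kafka topic, broker address, and fields are illustrative placeholders, not values from this document.

        # Minimal sketch of Flink SQL stream processing using the open-source
        # PyFlink Table API (DLI Flink jobs are compatible with this ecosystem).
        # The Kafka topic, broker address, and fields are illustrative placeholders.
        from pyflink.table import EnvironmentSettings, TableEnvironment

        t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

        # Source: a Kafka topic read with the open-source Kafka connector.
        t_env.execute_sql("""
            CREATE TABLE clicks (
                user_id STRING,
                url     STRING,
                ts      TIMESTAMP(3),
                WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
            ) WITH (
                'connector' = 'kafka',
                'topic' = 'clicks',
                'properties.bootstrap.servers' = 'broker:9092',
                'properties.group.id' = 'demo',
                'scan.startup.mode' = 'latest-offset',
                'format' = 'json'
            )
        """)

        # Sink: print results; in practice this could be another connected service.
        t_env.execute_sql("""
            CREATE TABLE click_counts (
                url STRING,
                cnt BIGINT
            ) WITH ('connector' = 'print')
        """)

        # Service logic expressed in SQL: count clicks per URL.
        t_env.execute_sql("""
            INSERT INTO click_counts
            SELECT url, COUNT(*) AS cnt FROM clicks GROUP BY url
        """).wait()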

  • Storage-compute decoupling

    DLI is interconnected with OBS for data analysis. In this architecture, where storage and compute are decoupled, the two types of resources are billed separately, helping you reduce costs and improve resource utilization.

    When you create an OBS bucket on the DLI console, you can choose single-AZ or multi-AZ storage as its data redundancy policy. The differences between the two storage policies are as follows:

    • Multi-AZ storage means that data is stored in multiple AZs within the same region, improving data reliability. If one AZ becomes unavailable, data can still be accessed from the other AZs. Multi-AZ storage is ideal for scenarios that demand high reliability and is the recommended policy.
    • Single-AZ storage means that data is stored in a single AZ, at a lower cost.
  • Elastic resource pool

    Elastic resource pools support the CCE cluster architecture for heterogeneous resources so you can centrally manage and allocate them. For details, see Elastic Resource Pool.

    Figure 1 Elastic resource pool architecture

    Elastic resource pools have the following advantages:

    • Unified management
      • You can manage multiple internal clusters and schedule jobs among them, managing compute resources of millions of cores.
      • Elastic resource pools can be deployed across multiple AZs to support high availability.
    • Tenant resource isolation

      Resources of different queues are isolated to reduce the impact on each other.

    • Shared access and flexibility
      • Minute-level scaling helps you handle request peaks.
      • Queue priorities and CU quotas can be set for different time periods to improve resource utilization.
    • Job-level isolation (supported in later versions)

      SQL jobs can run on independent Spark instances, reducing mutual impacts between jobs.

    • Automatic scaling (supported in later versions)

      The queue quota is updated in real time based on workload and priority.

    Using elastic resource pools has the following advantages.

    • Efficiency
      • Without an elastic resource pool: You need to set scaling tasks repeatedly to improve resource utilization.
      • With an elastic resource pool: Dynamic scaling can be done in seconds.
    • Resource utilization
      • Without an elastic resource pool: Resources cannot be shared among queues. For example, if one queue has idle CUs while another is heavily loaded, the idle resources cannot be shared and you can only scale up the loaded queue. In addition, when you configure a data source, you must allocate a different network segment to each queue, which requires a large number of VPC network segments.
      • With an elastic resource pool: Queues added to the same elastic resource pool can share compute resources, and multiple general-purpose queues in the same pool can be added to one network segment, simplifying data source configuration.
    • Resource allocation
      • Without an elastic resource pool: If resources are insufficient for the scale-out tasks of multiple queues, some queues will fail to scale out.
      • With an elastic resource pool: You can set a priority for each queue in the pool based on peak hours to ensure proper resource allocation.

  • BI tool

    DLI can interconnect with Yonghong BI for data analysis.

DLI Core Engine: Spark+Flink+Trino+HetuEngine

  • Spark is a unified analysis engine that is ideal for large-scale data processing, focusing on query, compute, and analysis. DLI optimizes performance and reconstructs services based on open-source Spark. It is compatible with the Apache Spark ecosystem and interfaces, and improves performance by 2.5x compared with open-source Spark, enabling you to query and analyze EB-scale data within hours.
  • Flink is a distributed compute engine that is ideal for batch processing, that is, processing static and historical data sets, as well as stream processing, that is, processing real-time data streams and generating results in real time. DLI enhances features and security based on open-source Flink and provides the Stream SQL features required for data processing.
  • HetuEngine is an open-source SQL query engine for interactive query and analysis. It excels at processing large-scale data queries quickly, efficiently, and with low latency.

Serverless Architecture

DLI is a serverless big data query and analysis service. It has the following advantages:

  • Pay-per-use: You pay only for what you use (scanned data volume or CUH packages). When no jobs are running, you will not be billed (see the example after this list).
  • Auto scaling: DLI ensures you always have enough capacity on hand to deal with any traffic spikes.
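
For example, assuming CUH here denotes compute unit hours (compute units multiplied by running time), a 16-CU queue that runs jobs for 2 hours consumes 16 × 2 = 32 CUH, while an idle period with no running jobs adds nothing to the bill. This is an illustrative calculation only; see the official pricing details for the actual billing rules.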

Accessing DLI

A web-based service management platform is provided. You can access DLI using the management console or HTTPS-based APIs, or connect to the DLI server through the JDBC client.

  • Using the management console

    You can submit SQL, Spark, or Flink jobs on the DLI management console.

    Log in to the management console. Choose EI Enterprise Intelligence > Data Lake Insight from the service list.
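
  • Using APIs (illustrative sketch)

    As noted above, DLI can also be accessed through HTTPS-based APIs. The following minimal sketch only illustrates the general pattern of calling an HTTPS endpoint with an IAM token from Python; the endpoint, request path, and payload fields are placeholders rather than the actual DLI API, which is described in the API reference.

        # Minimal sketch of HTTPS-based API access with an IAM token. The endpoint,
        # request path, and payload fields are illustrative placeholders; see the
        # DLI API reference for the actual interfaces.
        import requests

        iam_token = "<token issued by IAM>"            # placeholder credential
        endpoint = "https://<dli-endpoint>"            # placeholder endpoint for your region
        path = "/v1.0/<project_id>/example-sql-jobs"   # placeholder path, not the real API

        response = requests.post(
            endpoint + path,
            headers={"X-Auth-Token": iam_token, "Content-Type": "application/json"},
            json={"sql": "SELECT COUNT(*) FROM example_table"},  # placeholder body
            timeout=30,
        )
        response.raise_for_status()
        print(response.json())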
