Updated on 2024-08-20 GMT+08:00

Advantages

Full SQL Compatibility

You do not need a background in big data to use DLI for data analysis; knowing SQL is enough. The SQL syntax is fully compatible with standard ANSI SQL 2003.
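
As a rough illustration of the kind of standard SQL this refers to, the sketch below runs a query that uses a window function, a feature introduced in the SQL:2003 standard. It uses Python's built-in sqlite3 module (SQLite 3.25 or later) purely as a local stand-in engine, not DLI itself; the `orders` table, its columns, and the helper name are hypothetical.

```python
import sqlite3


def top_order_per_customer(rows):
    """Return each customer's largest order using a SQL:2003 window function.

    `rows` is a list of (customer, amount) tuples. The `orders` table is a
    hypothetical example; SQLite is only a local stand-in for a SQL engine.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    query = """
        SELECT customer, amount
        FROM (
            SELECT customer, amount,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer ORDER BY amount DESC
                   ) AS rn
            FROM orders
        ) AS ranked
        WHERE rn = 1
        ORDER BY customer
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("alice", 120.0), ("alice", 80.0), ("bob", 45.5)]
    print(top_order_per_customer(sample))
```

The same SELECT statement is plain standard SQL, so no engine-specific API knowledge is needed to read or write it.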

Decoupled Storage and Compute

DLI decouples compute from storage. This architecture lets you configure storage and compute resources independently and on demand, improving resource utilization and reducing costs.

Enterprise Multi-Tenancy

You can manage compute- or resource-related permissions by project or user, and implement fine-grained control to isolate data for each task.

Serverless DLI

DLI is fully compatible with the Apache Spark and Apache Flink ecosystems and APIs. It is a serverless big data computing and analysis service that integrates real-time, offline, and interactive analysis. Offline applications can be seamlessly migrated to the cloud, reducing the migration workload. DLI provides a highly scalable framework that integrates batch and stream processing, allowing you to handle data analysis requests with ease. With a deeply optimized kernel and architecture, DLI delivers a 100-fold performance improvement compared with the MapReduce model. Your analysis is backed by an industry-vetted 99.95% SLA.

Figure 1 DLI serverless architecture

DLI has the following advantages over self-built Hadoop clusters:

Table 1 Advantages comparison

| Advantage | Dimension | Data Lake Insight | Self-built Hadoop |
| --- | --- | --- | --- |
| Low cost | Capital cost | Billing is based on the actual amount of data scanned or the CUH used, saving up to 50% in costs. | Long-term resource occupation, causing severe resource waste and high costs |
| Low cost | Elastic scalability | Container-based Kubernetes, intelligent elastic scaling | Not supported. |
| O&M free | O&M cost | Out-of-the-box, serverless architecture | Strong technical capabilities are required for configuration and O&M |
| O&M free | High availability | Cross-AZ DR | N/A |
| Easy to use | Learning cost | Low. The optimization parameters are standardized based on 10 years' experience in thousands of projects. In addition, DLI provides a GUI for intelligent optimization. | High. Hundreds of tuning parameters need to be learned. |
| Easy to use | Supported data sources | Cloud: OBS, RDS, GaussDB(DWS), CSS, MongoDB, and Redis; On-premises: self-built databases, MongoDB, and Redis | Cloud: OBS; On-premises: HDFS |
| Easy to use | Ecosystem compatibility | DLV, Yonghong BI, and Fanruan BI | Big data ecosystem tool |
| Easy to use | Custom image | Supported. Dependencies can be added as required to meet service diversity requirements. | Not supported. |
| Easy to use | Workflow scheduling | Scheduling between Data Lake Factory (DLF) and DataArts Studio | Self-built scheduling tools, such as Airflow |
| Easy to use | Multiple enterprise-level tenants | Table-based permission management, providing column-level permission granularity. | File-based permission management |
| High performance | Performance | Higher performance with in-depth software and hardware optimization | Performance is the same as that of Hadoop open-source versions |

Cross-Source Analysis

Analyze your data across databases without migrating it. A unified view gives you a comprehensive understanding of your data and helps you innovate faster. There are no restrictions on data formats, cloud data sources, or whether the database is hosted online or offline.
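
The idea of querying several sources in place can be sketched on a small scale as follows. This is only a local analogy, assuming two separate SQLite database files stand in for two independent data sources; it does not show DLI's own cross-source connection mechanism, and the database names, tables, and helper function are hypothetical.

```python
import os
import sqlite3
import tempfile


def cross_source_totals(users, payments):
    """Join tables that live in two separate database files, in place.

    `users` is a list of (id, name) tuples and `payments` a list of
    (user_id, amount) tuples. Both databases are hypothetical stand-ins
    for independent data sources.
    """
    with tempfile.TemporaryDirectory() as tmp:
        crm_path = os.path.join(tmp, "crm.db")
        billing_path = os.path.join(tmp, "billing.db")

        # Source 1: a hypothetical CRM database.
        crm = sqlite3.connect(crm_path)
        crm.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        crm.executemany("INSERT INTO users VALUES (?, ?)", users)
        crm.commit()
        crm.close()

        # Source 2: a hypothetical billing database.
        billing = sqlite3.connect(billing_path)
        billing.execute("CREATE TABLE payments (user_id INTEGER, amount REAL)")
        billing.executemany("INSERT INTO payments VALUES (?, ?)", payments)
        billing.commit()
        billing.close()

        # Query both sources through one connection without copying rows
        # from one database into the other.
        conn = sqlite3.connect(crm_path)
        conn.execute("ATTACH DATABASE ? AS billing", (billing_path,))
        rows = conn.execute(
            """
            SELECT u.name, SUM(p.amount) AS total
            FROM users AS u
            JOIN billing.payments AS p ON p.user_id = u.id
            GROUP BY u.name
            ORDER BY u.name
            """
        ).fetchall()
        conn.close()
        return rows


if __name__ == "__main__":
    print(cross_source_totals(
        [(1, "ann"), (2, "ben")],
        [(1, 10.0), (1, 5.0), (2, 7.5)],
    ))
```

The join runs against both files where they are stored; neither table is exported or reloaded first, which is the property the section describes.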