Updated on 2024-08-28 GMT+08:00

HetuEngine Basic Principles

HetuEngine Description

HetuEngine is an in-house, high-performance engine for interactive SQL analysis and data virtualization. It integrates with the big data ecosystem to run interactive queries over massive amounts of data within seconds. It also supports unified cross-source and cross-domain data access, enabling one-stop converged SQL analysis within a data lake, between lakes, and between lakehouses.

HetuEngine Architecture

HetuEngine consists of different modules. Figure 1 shows the architecture.

Figure 1 HetuEngine architecture

Table 1 Module description

| Module | Concept | Description |
|---|---|---|
| Cloud service layer | HetuEngine CLI/JDBC | HetuEngine client, through which query requests are submitted and the results are returned and displayed. |
| Cloud service layer | HSBroker | Service management component of HetuEngine. It manages and verifies compute instances, monitors health status, and performs automatic maintenance. |
| Cloud service layer | HSConsole | Provides visualized operation GUIs and RESTful APIs for data source information management, compute instance management, and automatic task query. |
| Cloud service layer | HSFabric | Provides high-performance and secure data transfer across domains (data centers). |
| Engine layer | Coordinator | Management node of HetuEngine compute instances. It receives and parses SQL statements, generates and optimizes execution plans, assigns tasks, and schedules resources. |
| Engine layer | Worker | Work node of HetuEngine compute instances. It provides capabilities such as parallel data pulling from data sources and distributed SQL computing. |
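
The division of labor between the Coordinator and Workers can be illustrated with a toy model. The sketch below is not HetuEngine code: the function names (plan_splits, worker_task, coordinator_avg) are hypothetical, and threads stand in for Worker nodes. It only shows the general pattern of a coordinator splitting work, dispatching tasks in parallel, and merging partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model only: these names are illustrative and are not HetuEngine APIs.

def plan_splits(rows, num_workers):
    """Coordinator step: divide the input into one split per worker."""
    return [rows[i::num_workers] for i in range(num_workers)]

def worker_task(split):
    """Worker step: compute a partial aggregate (sum, count) over one split."""
    return sum(split), len(split)

def coordinator_avg(rows, num_workers=3):
    """Plan splits, run worker tasks in parallel, then merge the partial results."""
    splits = plan_splits(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(worker_task, splits))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    # An average over zero rows is undefined, so return None in that case.
    return total / count if count else None

if __name__ == "__main__":
    print(coordinator_avg([10, 20, 30, 40, 50, 60]))
```

Computing the average as a merged (sum, count) pair rather than averaging per-worker averages keeps the result correct when splits have different sizes.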

HetuEngine Application Scenarios

HetuEngine supports fast joint queries across sources (multiple data sources, such as Hive, HBase, GaussDB(DWS), and ClickHouse) and across domains (multiple regions or data centers). It is especially suited to fast interactive queries on Hive and Hudi data in Hadoop clusters (MRS).

Using the HetuEngine Cross-Source Function

Enterprises usually store massive amounts of data in various databases and data warehouses for management and information collection. However, diversified data sources, hybrid dataset structures, and scattered data storage raise the development cost of cross-source queries and prolong query duration.

HetuEngine provides unified standard SQL statements to implement cross-source collaborative analysis, simplifying cross-source analysis operations.
Figure 2 HetuEngine cross-source function
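
As a rough illustration of what a single cross-source statement might look like, the sketch below composes a join between tables in two different data sources. It assumes tables are addressed as catalog.schema.table, a convention common in federated SQL engines. The catalog names (hive, dws), the schemas, tables, and columns, and the helper functions are all hypothetical and would need to match the data sources actually registered in your environment.

```python
def qualified(catalog, schema, table):
    """Build a fully qualified table name (assumed catalog.schema.table form)."""
    return f"{catalog}.{schema}.{table}"

def build_cross_source_join(left, right, key, columns):
    """Compose one SQL statement that joins tables from two data sources."""
    select_list = ", ".join(columns)
    return (
        f"SELECT {select_list} "
        f"FROM {qualified(*left)} AS l "
        f"JOIN {qualified(*right)} AS r ON l.{key} = r.{key}"
    )

# Hypothetical example: orders stored in Hive, customers stored in GaussDB(DWS).
sql = build_cross_source_join(
    left=("hive", "sales", "orders"),
    right=("dws", "public", "customers"),
    key="customer_id",
    columns=["l.order_id", "r.customer_name"],
)

if __name__ == "__main__":
    print(sql)
```

The resulting string could then be submitted through whichever client you use (for example, the CLI or a JDBC connection); the submission step is omitted here because it depends on the deployment.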

Using the HetuEngine Cross-Domain Function

HetuEngine provides unified standard SQL for efficient access to multiple data sources distributed across regions (or data centers). It shields differences in data structure, storage, and region, and decouples data from applications.
Figure 3 HetuEngine cross-region functions