Basic Principles

Updated on 2024-11-29 GMT+08:00

Introduction to Doris

Doris is a high-performance, real-time analytical database built on a massively parallel processing (MPP) architecture and known for its speed and ease of use. It returns query results on massive datasets with sub-second latency and supports both high-concurrency point queries and high-throughput complex analysis. This makes Apache Doris well suited to report analysis, ad hoc queries, unified data warehousing, and data lake query acceleration. On Doris, users can build applications such as user behavior analysis, A/B testing platforms, log retrieval and analysis, user profile analysis, and order analysis.

Doris Architecture

The following figure shows the overall architecture of Doris. Both frontend and backend nodes can be scaled out horizontally.

Figure 1 Doris architecture

Table 1 Description

| Parameter | Description |
| --- | --- |
| MySQL Tools | Doris is highly compatible with MySQL syntax and can be accessed from various MySQL client tools. It also supports standard SQL statements and integration with BI tools. |
| FE | Frontend nodes handle user access requests, parse and plan queries, and manage metadata and nodes. |
| BE | Backend nodes store data, execute query plans, and balance load among replicas. |
| Leader | The Leader is a role elected from the Follower group. |
| Follower | A metadata log entry must be written successfully to a majority of Follower nodes. |
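The majority-write rule for the metadata log can be illustrated with a minimal sketch. The function name `is_committed` and its arguments are hypothetical, chosen only for illustration; this is not how Doris itself implements metadata replication.

```python
def is_committed(acks: int, followers: int) -> bool:
    """Return True if a metadata log entry reached a majority of Followers.

    Hypothetical helper: `acks` is the number of Follower nodes that report
    a successful write, and `followers` is the size of the Follower group.
    """
    if followers <= 0:
        raise ValueError("followers must be positive")
    return acks > followers // 2


# With 3 Followers, 2 successful writes form a majority; 1 does not.
print(is_committed(2, 3))  # True
print(is_committed(1, 3))  # False
```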

Doris uses the MPP model with both inter-node and intra-node parallel execution, which suits distributed joins across multiple large tables.

Doris also supports a vectorized query engine, Adaptive Query Execution (AQE), a combination of cost-based and rule-based optimization (CBO and RBO), and hot data caching for queries.

Basic Concepts of Doris

In Doris, data is logically described in the form of tables.

  • Row&Column

    A table consists of rows and columns.

    • Row: a row of user data.
    • Column: describes different fields in a row of data.

    Columns are classified into two types: Key and Value. From a business perspective, Key and Value columns correspond to dimension columns and metric columns, respectively. In the Aggregate model, rows with identical Key column values are aggregated into one row, and the aggregation mode of each Value column is specified when the table is created.

  • Tablet&Partition

    In the Doris storage engine, user data is horizontally divided into data shards called tablets (also called data buckets). Each tablet contains several rows of data. Tablets do not overlap in the data they hold, and each is stored physically independently.

    Tablets are logically grouped into partitions. A tablet belongs to only one partition, whereas a partition contains multiple tablets. Because tablets are stored physically independently, partitions can also be regarded as physically independent. A tablet is the smallest physical storage unit for data operations such as movement and replication.

    Multiple partitions form a table. A partition can be regarded as the smallest logical management unit: data import and deletion can target a single partition.
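The relationship between a table, its partitions, and their tablets can be sketched as follows. The partitioning rule (one partition per month), the bucketing rule (CRC32 hash modulo a bucket count), and all names here are assumptions made for illustration, not Doris's actual placement logic.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical number of tablets (buckets) per partition


def partition_of(event_date: str) -> str:
    """Assumed partitioning rule: one partition per month, e.g. '2024-11'."""
    return event_date[:7]


def tablet_of(user_id: int) -> int:
    """Assumed bucketing rule: hash the bucketing column, modulo the bucket count."""
    return zlib.crc32(str(user_id).encode()) % NUM_BUCKETS


def place_rows(rows):
    """Map each (event_date, user_id, ...) row to exactly one (partition, tablet)."""
    layout = defaultdict(list)
    for row in rows:
        layout[(partition_of(row[0]), tablet_of(row[1]))].append(row)
    return dict(layout)


rows = [
    ("2024-10-03", 1001, "click"),
    ("2024-10-17", 1002, "view"),
    ("2024-11-05", 1001, "click"),
]
layout = place_rows(rows)
for (partition, tablet), members in sorted(layout.items()):
    print(partition, tablet, members)
```

Each row lands in exactly one tablet, and each tablet belongs to exactly one partition, so dropping the `2024-10` partition in this sketch would remove only the tablets under that key.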

  • Data Models

    Doris data models are classified into three types: Aggregate, Unique, and Duplicate.

    • Aggregate model

      When data is imported, rows with identical Key column values are aggregated into one row, and each Value column is aggregated according to its configured AggregationType. Currently, AggregationType supports the following four aggregation modes:

      • SUM: Accumulate the values in multiple rows.
      • REPLACE: The newly imported value replaces the previous value.
      • MAX: Keep the maximum value.
      • MIN: Keep the minimum value.
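The four aggregation modes above can be illustrated with a small in-memory simulation. This is a sketch of the semantics only; the schema (user_id, date, cost, city, max_dwell, min_dwell) and the function `import_rows` are hypothetical and do not reflect Doris internals.

```python
# Aggregation functions for the four modes listed above.
AGGREGATORS = {
    "SUM": lambda old, new: old + new,
    "REPLACE": lambda old, new: new,
    "MAX": max,
    "MIN": min,
}


def import_rows(table, rows, key_len, agg_types):
    """Merge rows into `table` (a dict keyed by the Key columns).

    The first `key_len` columns form the Key; each remaining Value column is
    combined using the mode at the same position in `agg_types`.
    """
    for row in rows:
        key, values = tuple(row[:key_len]), list(row[key_len:])
        if key in table:
            old = table[key]
            values = [AGGREGATORS[t](o, n) for t, o, n in zip(agg_types, old, values)]
        table[key] = values
    return table


# Hypothetical schema: Key = (user_id, date); Values = cost SUM, city REPLACE,
# max_dwell MAX, min_dwell MIN.
table = {}
import_rows(
    table,
    [
        (10000, "2024-11-01", 20, "Beijing", 10, 10),
        (10000, "2024-11-01", 15, "Shanghai", 2, 2),
        (10001, "2024-11-01", 30, "Shenzhen", 22, 22),
    ],
    key_len=2,
    agg_types=["SUM", "REPLACE", "MAX", "MIN"],
)
print(table[(10000, "2024-11-01")])  # [35, 'Shanghai', 10, 2]
```

The two rows for user 10000 share the same Key, so they collapse into one row, while user 10001 keeps its own row.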
    • Unique model

      In some multi-dimensional analysis scenarios, users need to guarantee the uniqueness of the primary key. The Unique data model is introduced for this purpose.

      • Merge-on-read

        The merge-on-read implementation of the Unique model is equivalent to the REPLACE mode of the Aggregate model. The internal implementation and data storage are the same.

      • Merge-on-write

        Unlike the Aggregate model, the merge-on-write implementation of the Unique model delivers query performance closer to that of the Duplicate model. In scenarios that require primary key constraints, it has a clear query performance advantage over the Aggregate model, especially for aggregation queries and queries that use indexes to filter large amounts of data.

        In a Unique table with merge-on-write enabled, data that is overwritten or updated is marked for deletion during import, and the new data is written to a new file. During a query, all data marked for deletion is filtered out at the file level, so only the latest data is read. This eliminates the merge step required by merge-on-read and, in many cases, allows multiple predicates to be pushed down. As a result, performance improves significantly in many scenarios, especially for aggregation queries.
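The merge-on-write behavior described above can be sketched with a toy model: each import writes a new file, rows it overwrites are marked for deletion at write time, and a scan simply skips the marked rows. The class `UniqueTableMoW` and its fields are hypothetical names for illustration; Doris's real storage format is not shown here.

```python
class UniqueTableMoW:
    """Toy model of a Unique table with merge-on-write (not Doris's storage code)."""

    def __init__(self):
        self.files = []       # each import writes one new file: a list of (key, value)
        self.deleted = set()  # (file_index, row_index) pairs marked for deletion
        self.latest = {}      # primary key -> (file_index, row_index) of the live row

    def import_rows(self, rows):
        file_index = len(self.files)
        self.files.append(list(rows))
        for row_index, (key, _value) in enumerate(rows):
            if key in self.latest:
                # Mark the overwritten row at write time.
                self.deleted.add(self.latest[key])
            self.latest[key] = (file_index, row_index)

    def scan(self):
        """Read path: skip marked rows; no per-key merge is needed."""
        return [
            row
            for f, rows in enumerate(self.files)
            for r, row in enumerate(rows)
            if (f, r) not in self.deleted
        ]


table = UniqueTableMoW()
table.import_rows([(1, "a"), (2, "b")])
table.import_rows([(2, "b2"), (3, "c")])
print(table.scan())  # [(1, 'a'), (2, 'b2'), (3, 'c')]
```

The cost of resolving duplicates is paid once during import, which is why the read path stays close to a plain scan.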

    • Duplicate model

      In some multi-dimensional analysis scenarios, neither primary keys nor data aggregation is required. The Duplicate data model meets such requirements.

      Unlike the AGGREGATE KEY and UNIQUE KEY models, the DUPLICATE KEY model stores data exactly as imported and performs no aggregation. Even two identical rows are both retained. The DUPLICATE KEY clause in the CREATE TABLE statement only specifies the columns by which the data is sorted.

    • Suggestions on data model selection

      The data model is determined when the table is created and cannot be modified. Therefore, it is important to select a proper data model.

      • The AGGREGATE KEY model aggregates data in advance, which greatly reduces the amount of data scanned and computed at query time. It is therefore well suited to reporting queries with fixed patterns. However, this model is inefficient for count(*) queries. In addition, because the aggregation function of each Value column is fixed, semantic correctness must be considered when queries aggregate those columns with other functions.
      • The Unique model guarantees primary key uniqueness in scenarios that require a unique primary key constraint. However, it cannot benefit from the query advantages of pre-aggregation such as ROLLUP.
        • If you have high performance requirements for aggregation queries, you are advised to use the merge-on-write implementation added in version 1.2.
        • The Unique model supports only whole-row updates. If you need both a unique primary key constraint and the ability to update only some columns (for example, when importing multiple source tables into one Doris table), you can use the Aggregate model and set the aggregation type of the non-primary-key columns to REPLACE_IF_NOT_NULL.
      • The Duplicate model is suitable for ad hoc queries on any dimension. Although it cannot use pre-aggregation, it is not constrained by the aggregation model and can take full advantage of columnar storage (only the relevant columns are read, and not all Key columns need to be read).
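The REPLACE_IF_NOT_NULL suggestion above can be illustrated with a short simulation in which two sources each supply a different column of the same row. The table layout (order_id, status, address) and the helper names are hypothetical; the sketch models NULL as Python's `None` and shows the intended semantics only.

```python
def replace_if_not_null(old, new):
    """Keep the existing value when the newly imported value is NULL (None here)."""
    return old if new is None else new


def import_rows(table, rows):
    """`table` maps a primary key to its list of non-key column values."""
    for key, *values in rows:
        if key in table:
            values = [replace_if_not_null(o, n) for o, n in zip(table[key], values)]
        table[key] = values
    return table


# Hypothetical target table: order_id -> [status, address].
orders = {}
import_rows(orders, [(1, "paid", None)])        # source A supplies status only
import_rows(orders, [(1, None, "12 Main St")])  # source B supplies address only
print(orders[1])  # ['paid', '12 Main St']
```

With plain REPLACE, the second import would have overwritten `status` with NULL; REPLACE_IF_NOT_NULL preserves it, which is what makes partial-column updates possible.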
