Updated on 2024-08-10 GMT+08:00

ClickHouse Application Development Overview

Introduction to ClickHouse

ClickHouse is a column-oriented database for online analytical processing (OLAP). It supports SQL queries and delivers strong query performance. Aggregation analysis and queries on large, wide tables perform especially well, about one order of magnitude faster than other analytical databases.

Advantages:

  • High data compression ratio
  • Multi-core parallel computing
  • Vectorized computing engine
  • Support for nested data structures
  • Support for sparse indexes
  • Support for INSERT and UPDATE operations

Application scenarios:

  • Real-time data warehouse

    A stream computing engine (such as Flink) writes real-time data to ClickHouse. Thanks to the query performance of ClickHouse, multi-dimensional, multi-mode real-time query and analysis requests can be answered in under a second.

  • Offline query

    Large-scale service data is imported into ClickHouse to build a large, wide table with hundreds of millions to tens of billions of records and hundreds of dimensions. Such a table supports personalized statistics collection and continuous exploratory query and analysis at any time, which assists business decision-making and provides a good query experience.

Introduction to the ClickHouse Development Interface

ClickHouse is written in C++ and positioned as a DBMS. It supports the HTTP and native TCP network interface protocols and can be accessed through drivers such as JDBC and ODBC. The community edition of clickhouse-jdbc is recommended for application development.
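
As an illustration of JDBC access, the following is a minimal sketch rather than a definitive implementation. It assumes that the clickhouse-jdbc driver is on the classpath and that a server is reachable over HTTP. The class name, host, port 8123, database name, and credentials are placeholders chosen for this example; replace them with the values of your own cluster.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper class, named for this example only.
final class ClickHouseJdbcSketch {

    // Builds a JDBC URL of the form jdbc:clickhouse://host:port/database.
    static String buildJdbcUrl(String host, int port, String database) {
        return "jdbc:clickhouse://" + host + ":" + port + "/" + database;
    }

    // Runs "SELECT 1" and returns the value read back.
    // Requires a reachable server and the clickhouse-jdbc driver on the classpath.
    static int ping(String url, String user, String password) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            if (!rs.next()) {
                throw new SQLException("SELECT 1 returned no rows");
            }
            return rs.getInt(1);
        }
    }

    public static void main(String[] args) throws SQLException {
        // Host, port, and database are assumptions for illustration.
        String url = buildJdbcUrl("localhost", 8123, "default");
        System.out.println(url);

        // Connect only when explicitly requested, so the sketch also runs without a server.
        if (args.length > 0 && args[0].equals("--connect")) {
            System.out.println(ping(url, "default", ""));
        }
    }
}
```

Running the class without arguments only prints the URL. Passing --connect additionally opens a connection and executes the test query.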

Concepts

  • Cluster

    In ClickHouse, a cluster is a logical concept that users define as required, which differs from the usual understanding of a cluster. ClickHouse nodes are loosely coupled, and each node is deployed independently.

  • Shard

    A shard is a horizontal division of a cluster. A cluster can consist of multiple shards.

  • Replica

    Multiple replicas can be created for one shard.

  • Partition

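    A partition is a division of the data within a local replica, determined by the partition key of the table. It can be regarded as a vertical division, in contrast to the horizontal division into shards.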

  • MergeTree

    ClickHouse has a large family of table engines. MergeTree, the base engine of that family, provides features such as data partitioning, primary indexes, and secondary indexes. You must specify a table engine when creating a table. The engine determines the characteristics of the table, for example, which features it offers, in what format its data is stored, and how data is loaded.
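
The partition and MergeTree concepts above can be sketched as a table definition. The following Java snippet only assembles the DDL string. The class name, database name, table name, and columns are hypothetical and chosen for illustration. The statement specifies the MergeTree engine, partitions data by month, and uses ORDER BY to define the sorting key, which also serves as the primary key when no separate PRIMARY KEY clause is given.

```java
// Hypothetical helper class, named for this example only.
final class MergeTreeDdlSketch {

    // Builds a CREATE TABLE statement for a MergeTree table.
    // The columns are illustrative assumptions, not part of any real schema.
    static String createTableSql(String database, String table) {
        return "CREATE TABLE IF NOT EXISTS " + database + "." + table + " (\n"
             + "    event_date Date,\n"
             + "    user_id UInt64,\n"
             + "    page String,\n"
             + "    duration_ms UInt32\n"
             + ")\n"
             + "ENGINE = MergeTree\n"                     // the table engine must be specified
             + "PARTITION BY toYYYYMM(event_date)\n"      // one partition per month
             + "ORDER BY (event_date, user_id)";          // sorting key, used for the sparse primary index
    }

    public static void main(String[] args) {
        System.out.println(createTableSql("demo_db", "page_visits"));
    }
}
```

The resulting string can be passed to Statement.execute on a JDBC connection to create the table.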