Updated on 2024-06-21 GMT+08:00

Overview

In big data scenarios, data storage and resource consumption grow rapidly with the data volume. Because demand for data varies across time periods, data is managed as hot and cold data, which improves data analysis performance and reduces service costs.

Background

In real-world data analysis, hot and cold data differ in query frequency and required response speed. As historical data grows, storing all of it locally wastes resources. The hot and cold data separation feature stores the two classes of data on different media, which accelerates queries on hot data while cutting storage costs for cold data. The feature also offers flexible configurations to match the demands of different service scenarios. Data is classified as hot or cold based on its access frequency and update frequency.

  • Hot Data: Data that is frequently accessed and updated, is likely to be needed in future operations, and therefore requires fast response times.
  • Cold Data: Data that is largely static; it is rarely updated or accessed and has minimal requirements on response speed.

You can define hot and cold management tables so that cold data meeting the specified rules is moved to OBS for storage. Hot and cold data can be automatically identified and migrated by partition.
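A table definition following this approach might look like the sketch below. It assumes the cluster already provides the hot_to_cold storage policy named later in this document, and that the policy defines a volume named 'cold'; the database, table, columns, and 7-day TTL interval are illustrative only:

```sql
-- Hypothetical table; 'hot_to_cold' is the storage policy named in this document.
-- Partitioning by day lets whole partitions be identified and migrated as cold data.
CREATE TABLE demo.user_events
(
    event_time DateTime,
    user_id    UInt64,
    payload    String
)
ENGINE = MergeTree
PARTITION BY toDate(event_time)
ORDER BY (user_id, event_time)
-- Illustrative rule: data older than 7 days qualifies as cold data.
TTL event_time + INTERVAL 7 DAY TO VOLUME 'cold'
SETTINGS storage_policy = 'hot_to_cold';
```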

Principles

  • Table Creation: Create a data table with a cold-hot separation policy. The storage_policy must be set to hot_to_cold.
  • Data Writing: ClickHouse generates a new part for each write operation. Newly imported data is written to the hot part first, so hot and cold data are stored concurrently in the same table.
  • Data Separation: Data initially resides in hot storage and transitions to cold storage based on capacity or elapsed time. ClickHouse's separation is part-based; when the threshold is reached, qualifying parts are transferred to OBS, and corresponding local data is deleted. Newly imported data, forming new parts, will also migrate to OBS upon reaching the set threshold.
  • Data Querying: Data queries initiated by users are directed to the root directories of corresponding buckets according to the table's storage policy. ClickHouse retrieves necessary data from various table parts to the local system for processing.
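To observe the separation described above, you can check which disk each part of a table currently resides on. system.parts is a standard ClickHouse system table; the database and table names below are the illustrative ones used earlier:

```sql
-- Shows, per partition, where each active part of the table is stored
-- and how large it is, so migrated parts can be spotted by disk_name.
SELECT partition, name, disk_name, bytes_on_disk
FROM system.parts
WHERE database = 'demo' AND table = 'user_events' AND active
ORDER BY partition, name;
```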