Updated on 2025-07-29 GMT+08:00

Multi-Level Downsampling

Application Scenarios

In DevOps or IoT scenarios, users focus on statistical metrics (such as maximum, minimum, and average values) instead of historical data details.

A conventional approach involves storing all granular data, querying it, and performing calculations on demand. If historical data is retained only for feature calculation purposes, the conventional approach presents several drawbacks: the longer granular data is retained, the higher the storage costs incurred.

Introduction to Multi-Level Downsampling

Multi-level downsampling policies are available to data across time ranges. For instance, data from the last week to month is aggregated every 15 minutes (regardless of aggregation methods), while historical data in the last month is aggregated every hour.

Users may be sensitive to recent data but have little requirements for long-term data, so different multi-level downsampling policies need to be used. Multi-level downsampling not only meets query requirements for high-value data, but also ensures storage efficiency.

For example, raw data in the last 7 days is directly imported to a database. Data from the last 7 to 30 days is downsampled every 15 minutes and then stored in a database. Data from the last 30 days to 12 months is downsampled every hour and then stored in a database. As shown in Figure 1, assume that today is December 31, 2022. The dark blue area on the left displays data within the last 7 days. The gray blue area in the middle displays data from the last 7 to 30 days. The light blue area on the right displays data from the last 30 days to 12 months. After a period of time, data in the dark blue area is aggregated into the gray blue area every 15 minutes, and data in the gray blue area is aggregated into the light blue area every hour. There are four 15-minute segments in an hour, so aggregation is available as a convenience to avoid extra implementation overhead.

Figure 1 Multi-level downsampling grid

Raw data from the last 7 to 30 days as well as 30 days to 12 months will not be retained after downsampling.

Creating a Downsampling Task

Syntax:

Create DownSample [on <rp_name>| on <dbname>.<rp_name>|  ]((dataType(aggregators)...)) With Duration <timeDuration> SampleInterval(time Durations) TimeInterval(time Durations)
Table 1 Parameters

Duration

SampleInterval

TimeInterval

Data retention period after downsampling

Next-level downsampling time

Sampling interval

Aggregation method definition:

dataType(aggfunctions...)

Example of an aggregation method:

integer(first,sum,count,last,min,max)
integer(min,max),float(sum)

For example:

Create a retention rule named rp1. Data is retained for seven days, and one shard is created every day.

create retention policy rp1 on mydb duration 7d replication 1 shard duration 1d 

To create a downsampling task based on rp1, set the retention period of the sampled data to seven days. Details about data of the last day are stored. Data of the last one to two days is sampled every minute. Data generated two days ago is downsampled every 3 minutes.

Create DownSample on rp1 (float(sum,last),integer(max,min)) With Duration 7d sampleinterval(1d,2d) timeinterval(1m,3m)
  • The second values of sampleinterval and timeinterval must be integer multiples of their first values.

    Valid example:

    • sampleinterval(1d,2d)
    • timeinterval(1m,3m)

    Invalid example:

    • sampleinterval(2d,3d)
    • timeinterval(5m,6m)
  • The number of values of sampleinterval and timeinterval must be consistent.

    Valid example:

    • sampleinterval(1d,2d) and timeinterval(1m,3m)

    Invalid example:

    • sampleinterval(1d,2d) and timeinterval(3m)
    • sampleinterval(1d) and timeinterval(1m,3m)
  • Duration controls the retention period of downsampled data. The value specified for rp1 is updated synchronously and can be the same as that specified for rp. Only first, last, sum, max, min, mean, and count are allowed.
  • float(sum,last) indicates that FLOAT data is downsampled using the sum() and last() functions.
  • The mechanism of integer(max,min) is similar to that of float(sum,last).

Showing Downsampling Tasks

Show all downsampling tasks of the default database.

SHOW DOWNSAMPLES 

Show all downsampling tasks of a specified database.

SHOW DOWNSAMPLES ON <database name>

For example:

> show downsamples on ExampleDatabase
+--------+-------------------------+----------+----------------+--------------+
| rpName |     field_operator      | duration | sampleInterval | timeInterval |
+--------+-------------------------+----------+----------------+--------------+
| rp1    | float{max},integer{min} | 24h0m0s  | 1h0m0s,2h0m0s  | 1m0s,5m0s    |
+--------+-------------------------+----------+----------------+--------------+

Deleting Downsampling Tasks

Delete all downsampling tasks of a specified database.

Drop DownSamples 
Drop DownSamples on db0 

Delete all downsampling tasks of a specified RP.

Drop DownSample on rp1 
Drop DownSample on db0.rp1