Help Center/ GeminiDB/ GeminiDB Influx API/ Best Practices/ GeminiDB's Functions for Efficient Data Analysis

Updated on 2025-07-29 GMT+08:00

View PDF

GeminiDB's Functions for Efficient Data Analysis

Application Scenarios

In IoV and IoT scenarios, a large amount of data needs to be collected frequently. Service personnel have been seeking a better way to analyze data faster. Traditionally, queried data is analyzed at the service layer. The design of a service's functionality and behavior must meet stringent criteria and a large number of service resources will be consumed, so the traditional solution does not fit the need of analyzing massive amount of data. GeminiDB Influx API is a time series database that can efficiently process massive data write and analysis requests. It provides various advanced data analysis functions, making data analysis efficient and convenient. This section uses histogram functions to demonstrate GeminiDB's advantages in advanced data analysis.

Use Cases

In statistics, a histogram is a valuable tool for gaining insights into data characteristics. It is widely used in many scenarios:

Network monitoring: Histograms clearly show abnormal data distributions, facilitating network self-diagnosis and recovery.
IoT data analysis: Histograms reveal data distribution characteristics, facilitating identification of time series data.

GeminiDB Influx API supports two types of histograms: equi-height and bounded histograms. Each bucket in an equi-height histogram contains roughly the same number of values. Values in each bucket have both upper and lower limits in a bounded histogram. Custom bounded histograms can be created by users to identify key distribution characteristics of data. Integers, floating-point data, strings, and Boolean values can meet data analysis requirements in all sectors.

Using Histograms for GeminiDB Influx API

Example

In the example, mst is the table name. There are four fields of different data types and two tags. The raw data is as follows:

> select * from mst
name: mst
time                address   age alive country  height name
----                -------   --- ----- -------  ------ ----
1629129600000000000 shenzhen  12  true  china    70     azhu
1629129601000000000 shanghai  20  false american 80     alan
1629129602000000000 beijin    3   true  germany  90     alang
1629129603000000000 guangzhou 30  false japan    121    ahui
1629129604000000000 chengdu   35  true  canada   138    aqiu
1629129605000000000 wuhan     48        china    149    agang
1629129606000000000           52  true  american 153    agan
1629129607000000000 anhui     28  false germany         alin
1629129608000000000 xian          true  japan    179    ali
1629129609000000000 hangzhou  60  false canada   180    
1629129610000000000 nanjin    102 true           191    ahuang
1629129611000000000 zhengzhou 123 false china    203    ayin

Equi-height histogram

Query syntax:

SELECT HISTOGRAM([ * | <field_key> | /<regular_expression>/ ], <N>) [WINTO_clause] FROM_clause [WHERE_clause] [GROUP_BY_clause] [ORDER_BY_clause] [LIMIT_clause] [OFFSET_clause]  [SLIMIT_clause] [SOFFSET_clause]
HISTOGRAM(field_key, N) generates a histogram visualization of the data distribution for field_key, with N representing the desired number of bins in the histogram.
HISTOGRAM(/regular_expression/, N) calculates values of the field that matches the regular expression in each bin.
HISTOGRAM(*, N) calculates values of integer and floating-point fields in each bin.

Example

Query an equi-height histogram in which field_key is specified as age and N as 5.

> select histogram(age, 5) from mst where time >= 1629129600000000000 and time <= 1629129611000000000
name: mst
time histogram           value
---- ---------           -----
0    20                  3
0    30                  2
0    48                  2
0    60                  2
0    9223372036854775807 2

Query an equi-height histogram in which field_key matches regular expression /hei/ (heights only) and N is specified as 5.

> select histogram(/hei/, 5) from mst where time >= 1629129600000000000 and time <= 1629129611000000000
name: mst
time histogram_height         value
---- ----------------         -----
0    90                       3
0    138                      2
0    153                      2
0    180                      2
0    1.7976931348623157e+308  2

Query an equi-height histogram with addresses of the string type and an equi-height histogram with the alive field of the Boolean type.

> select histogram(address, 5) from mst where time >= 1629129600000000000 and time <= 1629129611000000000
name: mst
time                histogram value
----                --------- -----
1629129600000000000 chengdu   3
1629129600000000000 hangzhou  2
1629129600000000000 shanghai  2
1629129600000000000 wuhan     2
1629129600000000000 zhengzhou 2
 
> select histogram(alive, 3) from mst where time >= 1629129600000000000 and time <= 1629129611000000000
name: mst
time                histogram  value
----                ---------  -----
1629129600000000000 false      5
1629129600000000000 true       6

Bounded histogram

Query syntax:

SELECT HISTOGRAM([ * | <field_key> | /<regular_expression>/ ], 'specifyBins', boundary1, boundary2, ..., boundaryN) [WINTO_clause] FROM_clause [WHERE_clause] [GROUP_BY_clause] [ORDER_BY_clause] [LIMIT_clause] [OFFSET_clause] [SLIMIT_clause] [SOFFSET_clause]
 
HISTOGRAM (field_key, 'specifyBins', boundary1, boundary2,..., boundaryN) calculates values of a specified metric field in a specified bin. specifyBins indicates a flag and boundaryN the specified boundary value.
Integers, floating-point data, strings, and Boolean values are supported.
HISTOGRAM(/regular_expression/, 'specifyBins',  boundary1, boundary2, ..., boundaryN) calculates values of the field that matches the regular expression in each bin.
 
HISTOGRAM (*,'specifyBins', boundary1, boundary2,..., boundaryN) calculates values of integer and floating-point fields in the specified bin.

Example

Query a bounded histogram in which field_key is specified as age with the following bins: [0, 10), [10, 20), [20, 30), [30, 40), and [40, 50)

> select histogram(age, 'specifyBins', 10, 20, 30, 40, 50) from mst
name: mst
time histogram value
---- --------- -----
0    10        1
0    20        2
0    30        2
0    40        1
0    50        1

Query a bounded histogram in which field_key matches regular expression /eight/ (heights only) in the following bins [0, 160), [160, 170), [170, 180), [180, 190), [190, 200):

> select histogram(/eight/, 'specifyBins', 160.0, 170.0, 180.0, 190.0, 200.0) from mst
name: mst
time histogram value
---- --------- -----
0    160       7
0    170       0
0    180       2
0    190       0
0    200       1

Query a bounded histogram with addresses of the string type and a bounded histogram with the alive field of the Boolean type.

> select histogram(address, 'specifyBins', 'anhui', 'beijin', 'chengdu') from mst
name: mst
time histogram value
---- --------- -----
0    anhui     1
0    beijin    1
0    chengdu   1
# Boolean type
> select histogram(alive, 'specifyBins', false, true) from mst
name: mst
time histogram value
---- --------- -----
0    false     5
0    true      6

In addition to histogram functions, GeminiDB supports multiple other types of advanced analysis functions. For details, see Supported Commands.

Advantages

GeminiDB's advanced functions can be used to quickly analyze data and directly generate results. Databases handle the massive amount of data, eliminating the need for processing logic at the service layer. Using such functions will simplify your service design and reduce unnecessary resource consumption.

Parent Topic: Best Practices

Previous topic: GeminiDB Time Series IoV Solution

Next topic: Multi-Level Downsampling