GeminiDB's Functions for Efficient Data Analysis
Application Scenarios
In IoV and IoT scenarios, a large amount of data needs to be collected frequently. Service personnel have been seeking a better way to analyze data faster. Traditionally, queried data is analyzed at the service layer. The design of a service's functionality and behavior must meet stringent criteria and a large number of service resources will be consumed, so the traditional solution does not fit the need of analyzing massive amount of data. GeminiDB Influx API is a time series database that can efficiently process massive data write and analysis requests. It provides various advanced data analysis functions, making data analysis efficient and convenient. This section uses histogram functions to demonstrate GeminiDB's advantages in advanced data analysis.
Use Cases
In statistics, a histogram is a valuable tool for gaining insights into data characteristics. It is widely used in many scenarios:
- Network monitoring: Histograms clearly show abnormal data distributions, facilitating network self-diagnosis and recovery.
- IoT data analysis: Histograms reveal data distribution characteristics, facilitating identification of time series data.
GeminiDB Influx API supports two types of histograms: equi-height and bounded histograms. Each bucket in an equi-height histogram contains roughly the same number of values. Values in each bucket have both upper and lower limits in a bounded histogram. Custom bounded histograms can be created by users to identify key distribution characteristics of data. Integers, floating-point data, strings, and Boolean values can meet data analysis requirements in all sectors.
Using Histograms for GeminiDB Influx API
Example
In the example, mst is the table name. There are four fields of different data types and two tags. The raw data is as follows:
> select * from mst name: mst time address age alive country height name ---- ------- --- ----- ------- ------ ---- 1629129600000000000 shenzhen 12 true china 70 azhu 1629129601000000000 shanghai 20 false american 80 alan 1629129602000000000 beijin 3 true germany 90 alang 1629129603000000000 guangzhou 30 false japan 121 ahui 1629129604000000000 chengdu 35 true canada 138 aqiu 1629129605000000000 wuhan 48 china 149 agang 1629129606000000000 52 true american 153 agan 1629129607000000000 anhui 28 false germany alin 1629129608000000000 xian true japan 179 ali 1629129609000000000 hangzhou 60 false canada 180 1629129610000000000 nanjin 102 true 191 ahuang 1629129611000000000 zhengzhou 123 false china 203 ayin
Equi-height histogram
Query syntax:
SELECT HISTOGRAM([ * | <field_key> | /<regular_expression>/ ], <N>) [WINTO_clause] FROM_clause [WHERE_clause] [GROUP_BY_clause] [ORDER_BY_clause] [LIMIT_clause] [OFFSET_clause] [SLIMIT_clause] [SOFFSET_clause] HISTOGRAM(field_key, N) generates a histogram visualization of the data distribution for field_key, with N representing the desired number of bins in the histogram. HISTOGRAM(/regular_expression/, N) calculates values of the field that matches the regular expression in each bin. HISTOGRAM(*, N) calculates values of integer and floating-point fields in each bin.
Example
- Query an equi-height histogram in which field_key is specified as age and N as 5.
> select histogram(age, 5) from mst where time >= 1629129600000000000 and time <= 1629129611000000000 name: mst time histogram value ---- --------- ----- 0 20 3 0 30 2 0 48 2 0 60 2 0 9223372036854775807 2
- Query an equi-height histogram in which field_key matches regular expression /hei/ (heights only) and N is specified as 5.
> select histogram(/hei/, 5) from mst where time >= 1629129600000000000 and time <= 1629129611000000000 name: mst time histogram_height value ---- ---------------- ----- 0 90 3 0 138 2 0 153 2 0 180 2 0 1.7976931348623157e+308 2
- Query an equi-height histogram with addresses of the string type and an equi-height histogram with the alive field of the Boolean type.
> select histogram(address, 5) from mst where time >= 1629129600000000000 and time <= 1629129611000000000 name: mst time histogram value ---- --------- ----- 1629129600000000000 chengdu 3 1629129600000000000 hangzhou 2 1629129600000000000 shanghai 2 1629129600000000000 wuhan 2 1629129600000000000 zhengzhou 2 > select histogram(alive, 3) from mst where time >= 1629129600000000000 and time <= 1629129611000000000 name: mst time histogram value ---- --------- ----- 1629129600000000000 false 5 1629129600000000000 true 6
Bounded histogram
Query syntax:
SELECT HISTOGRAM([ * | <field_key> | /<regular_expression>/ ], 'specifyBins', boundary1, boundary2, ..., boundaryN) [WINTO_clause] FROM_clause [WHERE_clause] [GROUP_BY_clause] [ORDER_BY_clause] [LIMIT_clause] [OFFSET_clause] [SLIMIT_clause] [SOFFSET_clause] HISTOGRAM (field_key, 'specifyBins', boundary1, boundary2,..., boundaryN) calculates values of a specified metric field in a specified bin. specifyBins indicates a flag and boundaryN the specified boundary value. Integers, floating-point data, strings, and Boolean values are supported. HISTOGRAM(/regular_expression/, 'specifyBins', boundary1, boundary2, ..., boundaryN) calculates values of the field that matches the regular expression in each bin. HISTOGRAM (*,'specifyBins', boundary1, boundary2,..., boundaryN) calculates values of integer and floating-point fields in the specified bin.
Example
- Query a bounded histogram in which field_key is specified as age with the following bins: [0, 10), [10, 20), [20, 30), [30, 40), and [40, 50)
> select histogram(age, 'specifyBins', 10, 20, 30, 40, 50) from mst name: mst time histogram value ---- --------- ----- 0 10 1 0 20 2 0 30 2 0 40 1 0 50 1
- Query a bounded histogram in which field_key matches regular expression /eight/ (heights only) in the following bins [0, 160), [160, 170), [170, 180), [180, 190), [190, 200):
> select histogram(/eight/, 'specifyBins', 160.0, 170.0, 180.0, 190.0, 200.0) from mst name: mst time histogram value ---- --------- ----- 0 160 7 0 170 0 0 180 2 0 190 0 0 200 1
- Query a bounded histogram with addresses of the string type and a bounded histogram with the alive field of the Boolean type.
> select histogram(address, 'specifyBins', 'anhui', 'beijin', 'chengdu') from mst name: mst time histogram value ---- --------- ----- 0 anhui 1 0 beijin 1 0 chengdu 1 # Boolean type > select histogram(alive, 'specifyBins', false, true) from mst name: mst time histogram value ---- --------- ----- 0 false 5 0 true 6
In addition to histogram functions, GeminiDB supports multiple other types of advanced analysis functions. For details, see Supported Commands.
Advantages
GeminiDB's advanced functions can be used to quickly analyze data and directly generate results. Databases handle the massive amount of data, eliminating the need for processing logic at the service layer. Using such functions will simplify your service design and reduce unnecessary resource consumption.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot