
Functions and Expressions

Time Series Calculation Functions

Table 1 Functions supported by time series calculation

Functionality | Function
Calculates the difference between two rows sorted by time. | delta
Calculates the difference between the maximum value and the minimum value in a specified period. | spread
Returns the value with the highest occurrence frequency in a given column. If multiple values have the same frequency, the smallest of them is returned. | mode()
Calculates the value at a given percentile. The result is an approximation of the result of percentile_cont. | value_of_percentile
Calculates the percentile of a given value. This is the inverse function of value_of_percentile. | percentile_of_value
Compares the values of column2, finds the minimum value, and outputs the value of column1 in the same row. | first
Compares the values of column2, finds the maximum value, and outputs the value of column1 in the same row. | last
Obtains the number of rows in the tag table of a time series table on the current DN. Can be used only on DNs. | get_timeline_count_internal
Obtains the number of rows in the tag table of a time series table on each DN. Can be used only on CNs. | get_timeline_count
Deletes the useless data in the tag table rows identified by tagid. | gs_clean_tag_relation
Migrates the partition management tasks of a time series table. Used only when the time series table is upgraded along with the cluster upgrade from 8.1.1 to 8.1.3. | ts_table_part_policy_pgjob_to_pgtask
Migrates the partition management tasks of all time series tables in the database. Used only when time series tables are upgraded along with the cluster upgrade from 8.1.1 to 8.1.3. | proc_part_policy_pgjob_to_pgtask
Prints SQL statements, each of which migrates the partition management tasks of one time series table. Used only when a time series table is upgraded from 8.1.1 to 8.1.3. | print_sql_part_policy_pgjob_to_pgtask

Table 2 Expressions for supplementing time information

Feature | Expression
Aggregates data, sorts it by the time column, and supplements missing time data by forward filling. | time_fill(interval, time_column, start_time, end_time), fill_last(agg_function(agg_column))
Aggregates data, sorts it by the time column, and supplements missing time data by backward filling. | time_fill(interval, time_column, start_time, end_time), fill_first(agg_function(agg_column))
Aggregates data, sorts it by the time column, and supplements missing time data by forward and backward filling. | time_fill(interval, time_column, start_time, end_time), fill_avg(agg_function(agg_column))

Table 3 Parameter description

Parameter | Type | Description | Required/Optional
interval | INTERVAL; the smallest unit is second | Time interval used for grouping | Required
time_column | TIMESTAMP or TIMESTAMPTZ | Time column used for grouping | Required
start_time | TIMESTAMP or TIMESTAMPTZ | Start time of the grouping | Required
end_time | TIMESTAMP or TIMESTAMPTZ | End time of the grouping | Required
agg_function(agg_column) | Aggregate function over a specified column, for example, max(col) | Fills the missing part of the aggregation result | Required

  • The time_fill function is used for aggregation: the GROUP BY clause must reference its calculation result. time_fill cannot be nested within itself, called more than once in a single query, used as a lower-layer computing node, or used with the WITHIN GROUP clause.
  • The start time must be earlier than the end time, and the difference between them must be greater than the specified interval (window width).
  • None of the parameters can be null, and the start and end times must be specified.
  • time_fill must be used together with fill_avg, fill_first, fill_last, or an aggregate function.
  • time_fill can only be used in GROUP BY. A GROUP BY clause that contains time_fill cannot contain other columns.
  • time_fill cannot be used in other clauses, for example, in the WHERE clause or in other conditions.

Example:

Create a table and insert data.

create table dcs_cpu(
idle real TSField,
vcpu_num int TSTag,
node text TSTag,
scope_name text TSTag,
server_ip text TSTag,
iowait numeric TSField,
time_string timestamp TSTime
)with (TTL='7 days', PERIOD = '1 day', orientation=timeseries) distribute by hash(node);
insert  into dcs_cpu VALUES(1.0,1,'node_a','scope_a','1.1.1.1',1.0,'2019-07-12 00:10:10');
insert  into dcs_cpu VALUES(2.0,2,'node_b','scope_a','1.1.1.2',2.0,'2019-07-12 00:12:10');
insert  into dcs_cpu VALUES(3.0,3,'node_c','scope_b','1.1.1.3',3.0,'2019-07-12 00:13:10');

Calculate the average value of each group in 1-minute units. Fill each missing time segment with the value of the previous time segment.

select time_fill(interval '1 min',time_string,'2019-07-12 00:09:00','2019-07-12 00:14:00'), fill_last(avg(idle)) from dcs_cpu group by time_fill order by time_fill;
time_fill                  | fill_last
--------------------------+-----------
Fri Jul 12 00:09:00 2019 |
Fri Jul 12 00:10:00 2019 |         1
Fri Jul 12 00:11:00 2019 |         1
Fri Jul 12 00:12:00 2019 |         2
Fri Jul 12 00:13:00 2019 |         3
Fri Jul 12 00:14:00 2019 |         3
(6 rows)

Calculate the average value of each group in 1-minute units. Fill each missing time segment with the value of the next time segment.

select time_fill(interval '1 min',time_string,'2019-07-12 00:09:00','2019-07-12 00:14:00'), fill_first(avg(idle)) from dcs_cpu group by time_fill order by time_fill;
time_fill                  | fill_first
--------------------------+------------
Fri Jul 12 00:09:00 2019 |          1
Fri Jul 12 00:10:00 2019 |          1
Fri Jul 12 00:11:00 2019 |          2
Fri Jul 12 00:12:00 2019 |          2
Fri Jul 12 00:13:00 2019 |          3
Fri Jul 12 00:14:00 2019 |
(6 rows)

Calculate the average value of each group in 1-minute units. Fill each missing time segment with the weighted average of the two adjacent time segments.

select time_fill(interval '1 min',time_string,'2019-07-12 00:09:00','2019-07-12 00:14:00'), fill_avg(avg(idle)) from dcs_cpu group by time_fill order by time_fill;
time_fill                  | fill_avg
--------------------------+----------
Fri Jul 12 00:09:00 2019 |        1
Fri Jul 12 00:10:00 2019 |        1
Fri Jul 12 00:11:00 2019 |      1.5
Fri Jul 12 00:12:00 2019 |        2
Fri Jul 12 00:13:00 2019 |        3
Fri Jul 12 00:14:00 2019 |        3
(6 rows)

delta(field numeric)

Calculates the difference between two rows sorted by time.

Table 4 Parameter description

Parameter | Type | Description | Required/Optional
field | Numeric | Column to be calculated | Required

  • This function calculates the difference between two adjacent rows sorted by time and is used to monitor metrics such as traffic and speed.
  • The delta window function must be used together with the OVER clause. The ROWS clause in the OVER clause does not change the result of the delta function; for example, delta(value) over(order by time rows 1 preceding) and delta(value) over(order by time rows 3 preceding) return the same result (see the second example below).
Example:
SELECT
     delta(value) over (rows 1 preceding)
FROM 
     (VALUES ('2019-07-12 00:00:00'::timestamptz, 1),('2019-07-12 00:01:00'::timestamptz, 2),('2019-07-12 00:02:00'::timestamptz, 3)) v(time,value);
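
The note about the ROWS clause can be illustrated with the same VALUES list. This is a sketch for clarity rather than official sample output; both window definitions are expected to yield identical deltas because only the ORDER BY matters.

-- Sketch: the two frames differ only in the ROWS clause, so delta_1 and delta_3
-- are expected to match row for row.
SELECT
     delta(value) over (order by time rows 1 preceding) AS delta_1,
     delta(value) over (order by time rows 3 preceding) AS delta_3
FROM
     (VALUES ('2019-07-12 00:00:00'::timestamptz, 1),('2019-07-12 00:01:00'::timestamptz, 2),('2019-07-12 00:02:00'::timestamptz, 3)) v(time,value);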

spread(field numeric)

Calculates the difference between the maximum value and the minimum value in a specified period.

Table 5 Parameter description

Parameter | Type | Description | Required/Optional
field | Numeric | Column to be calculated | Required

  • This function is used to calculate the increment of each counter sorted by time.
  • If a group contains fewer than two tuples, the function returns 0 (see the grouped example below). Do not use this function with the OVER window function.
Example:
SELECT
     SPREAD(value)
FROM 
     (VALUES ('2019-07-12 00:00:00'::timestamptz, 1),('2019-07-12 00:01:00'::timestamptz, 2),('2019-07-12 00:02:00'::timestamptz, 3)) v(time,value);
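
The per-group behavior, including the zero result for groups with fewer than two tuples, can be sketched with the dcs_cpu table defined earlier. The expected values in the comments are inferred from that sample data, not official output.

-- scope_a contains idle values 1.0 and 2.0, so its spread is expected to be 1;
-- scope_b contains a single row, so its spread is expected to be 0.
SELECT scope_name, spread(idle) FROM dcs_cpu GROUP BY scope_name;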

mode() within group (order by value anyelement)

Returns the value with the highest occurrence frequency for a given column. If multiple values have the same frequency, this function returns the smallest value among these values.

Table 6 Parameter description

Parameter | Type | Description | Required/Optional
value | anyelement | Column to be queried | Required

  • This function must be used together with the WITHIN GROUP clause; if the WITHIN GROUP clause is absent, an error is reported. The column to be evaluated is placed after ORDER BY in the WITHIN GROUP clause.
  • This function cannot be used together with the OVER clause.
Example:
SELECT
    mode() within group (order by value)
FROM
    (VALUES ('2019-07-12 00:00:00'::timestamptz, 1),('2019-07-12 00:01:00'::timestamptz, 2),('2019-07-12 00:02:00'::timestamptz, 3)) v(time,value);
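
The tie-breaking rule (the smallest value wins when several values share the highest frequency) can be sketched as follows; the expected result in the comment is inferred from that rule, not official sample output.

-- Both 1 and 2 occur twice; the smaller value, 1, is expected to be returned.
SELECT
    mode() within group (order by value)
FROM
    (VALUES (2),(2),(1),(1),(3)) v(value);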

value_of_percentile(column float, percentile float, compression float)

Returns percentile values for a specified column in ascending order. Its result is an approximation of the result of percentile_cont, but this function achieves better performance than percentile_cont.

Table 7 Parameter description

Parameter | Type | Description | Required/Optional
column | float | Column whose percentile value is to be calculated | Required
percentile | float | Percentile. Value range: 0 to 1 | Required
compression | float | Compression coefficient. Value range: [0, 500]; default value: 300. A larger value indicates higher memory usage and higher result precision. If the specified value is not within the value range, 300 is used. | Required

Example:

SELECT value_of_percentile(values, 0.8, 0) from TABLE;
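
A more concrete sketch reuses the dcs_cpu table defined earlier (table and column names are taken from that example); the result is an approximation and is not shown here.

-- Approximate value at the 0.8 percentile of the idle column.
SELECT value_of_percentile(idle, 0.8, 0) FROM dcs_cpu;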

percentile_of_value(column float, percentilevalue float, compression float)

Returns the percentile of a given value in a specified column sorted in ascending order. This function is the inverse function of value_of_percentile.

Table 8 Parameter description

Parameter | Type | Description | Required/Optional
column | float | Column whose percentile is to be calculated | Required
percentilevalue | float | Value whose percentile is to be calculated | Required
compression | float | Compression coefficient. Value range: [0, 500]; default value: 300. A larger value indicates higher memory usage and higher result precision. If the specified value is not within the value range, 300 is used. | Required

Example:

SELECT percentile_of_value(values, 80, 0) from TABLE;
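
As a concrete sketch using the dcs_cpu table defined earlier (names reused from that example); the result is an approximation and is not shown here.

-- Approximate percentile at which the value 2.0 falls within the idle column.
SELECT percentile_of_value(idle, 2.0, 0) FROM dcs_cpu;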

first(column1, column2)

Aggregate function. Compares the values of column2 in a group, finds the minimum value, and outputs the corresponding value of column1.

Table 9 Parameter description

Parameter | Type | Description | Required/Optional
column1 | bigint/text/double/numeric | Output column | Required
column2 | timestamp/timestamptz/numeric | Comparison column | Required

Example (using the table definition and data from the time_fill expression above):

Obtain the idle value with the earliest time_string in each group, grouped by scope_name.

select first(idle, time_string) from dcs_cpu group by scope_name;
first
-------
1
3
(2 rows)

last(column1, column2)

Aggregate function. Compares the values of column2 in a group, finds the maximum value, and outputs the corresponding value of column1.

Table 10 Parameter description

Parameter | Type | Description | Required/Optional
column1 | bigint/text/double/numeric | Output column | Required
column2 | timestamp/timestamptz/numeric | Comparison column | Required

Example (using the table definition and data from the time_fill expression above):

Obtain the idle value with the latest time_string in each group, grouped by scope_name.

select last(idle, time_string) from dcs_cpu group by scope_name;
last
------
2
3
(2 rows)

get_timeline_count_internal(schema_name text, rel_name text)

Obtains the number of rows in the tag table of the time series table on the current DN. This function can be used only on DNs.

Parameter | Type | Description | Required/Optional
schema_name | text | Name of the schema that the time series table belongs to | Required
rel_name | text | Name of the time series table | Required

Example:

Create a table and insert data.

CREATE TABLE IF NOT EXISTS CPU(
scope_name text TSTag,
server_ip text TSTag,
group_path text TSTag,
time timestamptz TSTime,
idle numeric TSField
) with (orientation=TIMESERIES) distribute by hash(scope_name);
insert into CPU values('dcxtataetaeta','10.145.255.33','saetataetaeta','2020-04-07 17:12:09+08',  60639);
insert into CPU values('wrhtataetaeta','10.145.255.33','saetataetaeta','2020-04-07 17:12:09+08',  53311);
insert into CPU values('saetataetaeta','10.145.255.33','saetataetaeta','2020-04-07 17:12:09+08',  27101);
insert into CPU values('saetataetaeta','10.145.255.33','saetataetaeta','2020-04-07 17:12:09+08',  48005);

After data is transferred from the delta table to CU, connect to the DN and execute the following function:

select get_timeline_count_internal('public', 'cpu');
get_timeline_count_internal
-----------------------------
2
(1 row)

get_timeline_count(relname regclass)

Obtains the number of rows in the tag table of the time series table on each DN. This function can be used only on CNs.

Parameter | Type | Description | Required/Optional
relname | regclass | Name of the time series table | Required

Example:

Table creation and data import are the same as those of the get_timeline_count_internal function. Connect to the CN and execute the function:

select get_timeline_count('cpu');
get_timeline_count
--------------------
(dn_1,2)
(dn_2,1)
(2 rows)

gs_clean_tag_relation(tagOid oid)

This function is used to delete useless data in the row corresponding to a tagid in a tag table. Data in the primary table is cleared during automatic partition elimination, but a tag table that has been used for a long time may still contain some obsolete data. You can invoke this function to clear such rows in the tag table and improve its utilization. The return value is the number of rows successfully deleted from the tag table.

Parameter description

Parameter | Type | Description | Required/Optional
tagOid | oid | Tag table from which useless data is to be deleted | Required

Example:

CREATE TABLE IF NOT EXISTS CPU(
scope_name text TSTag,
server_ip text TSTag,
group_path text TSTag,
time timestamptz TSTime,
idle numeric TSField,
system numeric TSField,
util numeric TSField,
vcpu_num numeric TSField,
guest numeric TSField,
iowait numeric TSField,
users numeric TSField) with (orientation=TIMESERIES) distribute by hash(scope_name);
SELECT oid FROM PG_CLASS WHERE relname='cpu';
  oid
-------
 19099
(1 row)
SELECT gs_clean_tag_relation(19099);
 gs_clean_tag_relation
-----------------------
                     0
(1 row)

ts_table_part_policy_pgjob_to_pgtask(schemaName text, tableName text)

This function is used to migrate the partition management tasks of a time series table. It is used only when the time series table is upgraded along with the cluster upgrade from 8.1.1 to 8.1.3. In version 8.1.1, the partition management tasks of time series tables are recorded in the pg_jobs table; in version 8.1.3, these tasks are recorded in the pg_task table. Therefore, during the cluster upgrade from version 8.1.1 to version 8.1.3, the partition management tasks need to be migrated from pg_jobs to pg_task. This function migrates only time series table partition management tasks. After the migration is complete, the status of the original tasks in the pg_jobs table is changed to broken.

Parameter | Type | Description | Required/Optional
schemaName | text | Name of the schema that the time series table belongs to | Required
tableName | text | Name of the time series table | Required

Example:

CALL ts_table_part_policy_pgjob_to_pgtask('public','cpu1');
WARNING:  The job on pg_jobs is migrated to pg_task, and the original job is broken, the job what is call proc_drop_partition('public.cpu1', interval '7 d'); , the job interval is interval '1 day'.
WARNING:  The job on pg_jobs is migrated to pg_task, and the original job is broken, the job what is call proc_add_partition('public.cpu1', interval '1 d'); , the job interval is interval '1 day'.
ts_table_part_policy_pgjob_to_pgtask

----------------------------------

(1 row)

proc_part_policy_pgjob_to_pgtask()

This function is used only when time series tables are upgraded along with the cluster upgrade from version 8.1.1 to version 8.1.3. It migrates the partition management tasks of all time series tables in the 8.1.1 database. The function traverses all time series tables in the database and checks whether the partition management tasks of each table have been migrated. If not, it invokes the ts_table_part_policy_pgjob_to_pgtask function to migrate them. If any migration fails, the entire operation is rolled back.

Example:

CALL proc_part_policy_pgjob_to_pgtask();
NOTICE:  find table, name is cpu1, namespace is public.
WARNING:  The job on pg_jobs is migrated to pg_task, and the original job is broken, the job what is call proc_drop_partition('public.cpu1', interval '7 d'); , the job interval is interval '1 day'.
CONTEXT:  SQL statement "call ts_table_part_policy_pgjob_to_pgtask('public', 'cpu1');"
PL/pgSQL function proc_part_policy_pgjob_to_pgtask() line 17 at EXECUTE statement
WARNING:  The job on pg_jobs is migrated to pg_task, and the original job is broken, the job what is call proc_add_partition('public.cpu1', interval '1 d'); , the job interval is interval '1 day'.
CONTEXT:  SQL statement "call ts_table_part_policy_pgjob_to_pgtask('public', 'cpu1');"
PL/pgSQL function proc_part_policy_pgjob_to_pgtask() line 17 at EXECUTE statement
NOTICE:  find table, name is cpu2, namespace is public.
WARNING:  The job on pg_jobs is migrated to pg_task, and the original job is broken, the job what is call proc_add_partition('public.cpu2', interval '1 d'); , the job interval is interval '1 day'.
CONTEXT:  SQL statement "call ts_table_part_policy_pgjob_to_pgtask('public', 'cpu2');"
PL/pgSQL function proc_part_policy_pgjob_to_pgtask() line 17 at EXECUTE statement
proc_part_policy_pgjob_to_pgtask
--------------------------------------

(1 row)

print_sql_part_policy_pgjob_to_pgtask()

This function is used only when time series tables are upgraded along with the cluster upgrade from version 8.1.1 to version 8.1.3. It prints SQL statements, each of which migrates the partition management tasks of one time series table. The proc_part_policy_pgjob_to_pgtask function migrates at the database level; to migrate at the table level, manually execute the statements printed by print_sql_part_policy_pgjob_to_pgtask.

Example:

CALL print_sql_part_policy_pgjob_to_pgtask();
call ts_table_part_policy_pgjob_to_pgtask('public', 'cpu1');
call ts_table_part_policy_pgjob_to_pgtask('public', 'cpu2');
print_sql_part_policy_pgjob_to_pgtask
---------------------------------------

(1 row)