Updated on 2024-05-07 GMT+08:00

Functions and Expressions

Time Series Calculation Functions

Table 1 Functions supported by time series calculation

Functionality | Function
--------------|---------
Calculates the difference between two rows sorted by time. | delta
Calculates the difference between the maximum value and the minimum value in a specified period. | spread
Returns the value with the highest occurrence frequency for a given column. If multiple values have the same frequency, this function returns the smallest of them. | mode()
Calculates the value corresponding to a given percentile. Its result is an approximation of the result of percentile_cont. | value_of_percentile
Calculates the percentile corresponding to a given value. This is the inverse function of value_of_percentile. | percentile_of_value
Compares the values in column2, finds the row with the minimum value, and outputs the value of column1 in that row. | first
Compares the values in column2, finds the row with the maximum value, and outputs the value of column1 in that row. | last
Obtains the number of rows in the tag table of a time series table on the current DN. This function can be used only on DNs. | get_timeline_count_internal
Obtains the number of rows in the tag table of a time series table on each DN. This function can be used only on CNs. | get_timeline_count
Deletes useless data from the tagid rows of a tag table. | gs_clean_tag_relation
Migrates the partition management tasks of a time series table. It is used only when the time series table is upgraded along with the cluster upgrade from 8.1.1 to 8.1.3. | ts_table_part_policy_pgjob_to_pgtask
Migrates the partition management tasks of all time series tables in the database. It is used only when time series tables are upgraded along with the cluster upgrade from 8.1.1 to 8.1.3. | proc_part_policy_pgjob_to_pgtask
Prints SQL statements, each of which migrates the partition management tasks of one time series table. It is used only when a time series table is upgraded from 8.1.1 to 8.1.3. | print_sql_part_policy_pgjob_to_pgtask

Table 2 Expressions for supplementing time information

Features | Expression
---------|-----------
Aggregates data, sorts it by the time column, and fills in missing time data by forward filling. | time_fill(interval, time_column, start_time, end_time), fill_last(agg_function(agg_column))
Aggregates data, sorts it by the time column, and fills in missing time data by backward filling. | time_fill(interval, time_column, start_time, end_time), fill_first(agg_function(agg_column))
Aggregates data, sorts it by the time column, and fills in missing time data by forward and backward filling. | time_fill(interval, time_column, start_time, end_time), fill_avg(agg_function(agg_column))

Table 3 Parameter description

Parameter | Type | Description | Required/Optional
----------|------|-------------|-------------------
interval | INTERVAL (smallest unit: second) | Time interval used for grouping | Required
time_column | TIMESTAMP or TIMESTAMPTZ | Time column by which data is grouped | Required
start_time | TIMESTAMP or TIMESTAMPTZ | Start time of a group | Required
end_time | TIMESTAMP or TIMESTAMPTZ | End time of a group | Required
agg_function(agg_column) | Aggregate expression on a specified column, for example, max(col) | Fills the missing parts of the aggregation result | Required

  • The time_fill function must be used as an aggregation function. The GROUP BY clause must reference its calculation result, and time_fill cannot be nested with itself. It cannot be called multiple times in a single query, used as a lower-layer computing node, or used with the WITHIN GROUP clause.
  • The start timestamp must be earlier than the end timestamp, and the difference between them must be greater than the interval (window width).
  • No parameter can be null; the start and end times must both be specified.
  • time_fill must be used together with fill_avg, fill_first, fill_last, or an aggregate function.
  • time_fill can be used only in GROUP BY. A GROUP BY clause that contains time_fill cannot contain other columns.
  • time_fill cannot be used in clauses other than SELECT and GROUP BY, for example, in WHERE or other conditions.

Example:

Create a table and insert data.

create table dcs_cpu(
idle real TSField,
vcpu_num int TSTag,
node text TSTag,
scope_name text TSTag,
server_ip text TSTag,
iowait numeric TSField,
time_string timestamp TSTime
)with (TTL='7 days', PERIOD = '1 day', orientation=timeseries) distribute by hash(node);
insert  into dcs_cpu VALUES(1.0,1,'node_a','scope_a','1.1.1.1',1.0,'2019-07-12 00:10:10');
insert  into dcs_cpu VALUES(2.0,2,'node_b','scope_a','1.1.1.2',2.0,'2019-07-12 00:12:10');
insert  into dcs_cpu VALUES(3.0,3,'node_c','scope_b','1.1.1.3',3.0,'2019-07-12 00:13:10');

Calculate the average value of each group, using 1 minute as the interval. Missing values are filled with the value of the previous time segment (forward filling).

select time_fill(interval '1 min',time_string,'2019-07-12 00:09:00','2019-07-12 00:14:00'), fill_last(avg(idle)) from dcs_cpu group by time_fill order by time_fill;
time_fill                  | fill_last
--------------------------+-----------
Fri Jul 12 00:09:00 2019 |
Fri Jul 12 00:10:00 2019 |         1
Fri Jul 12 00:11:00 2019 |         1
Fri Jul 12 00:12:00 2019 |         2
Fri Jul 12 00:13:00 2019 |         3
Fri Jul 12 00:14:00 2019 |         3
(6 rows)

Calculate the average value of each group, using 1 minute as the interval. Missing values are filled with the value of the next time segment (backward filling).

select time_fill(interval '1 min',time_string,'2019-07-12 00:09:00','2019-07-12 00:14:00'), fill_first(avg(idle)) from dcs_cpu group by time_fill order by time_fill;
time_fill                  | fill_first
--------------------------+------------
Fri Jul 12 00:09:00 2019 |          1
Fri Jul 12 00:10:00 2019 |          1
Fri Jul 12 00:11:00 2019 |          2
Fri Jul 12 00:12:00 2019 |          2
Fri Jul 12 00:13:00 2019 |          3
Fri Jul 12 00:14:00 2019 |
(6 rows)

Calculate the average value of each group, using 1 minute as the interval. Missing values are filled with the average of the values of the two adjacent time segments.

select time_fill(interval '1 min',time_string,'2019-07-12 00:09:00','2019-07-12 00:14:00'), fill_avg(avg(idle)) from dcs_cpu group by time_fill order by time_fill;
time_fill                  | fill_avg
--------------------------+----------
Fri Jul 12 00:09:00 2019 |        1
Fri Jul 12 00:10:00 2019 |        1
Fri Jul 12 00:11:00 2019 |      1.5
Fri Jul 12 00:12:00 2019 |        2
Fri Jul 12 00:13:00 2019 |        3
Fri Jul 12 00:14:00 2019 |        3
(6 rows)

delta(field numeric)

Calculates the difference between two rows sorted by time.

Table 4 Parameter description

Parameter | Type | Description | Required/Optional
----------|------|-------------|-------------------
field | Numeric | Column to be calculated | Required

  • This function calculates the difference between two adjacent rows sorted by time and is used to monitor indicators such as traffic and speed.
  • delta is a window function and must be used together with an OVER clause. The ROWS clause in the OVER clause does not change the result of delta. For example, delta(value) over(order by time rows 1 preceding) and delta(value) over(order by time rows 3 preceding) return the same results.
Example:
SELECT
     delta(value) over (rows 1 preceding)
FROM 
     (VALUES ('2019-07-12 00:00:00'::timestamptz, 1),('2019-07-12 00:01:00'::timestamptz, 2),('2019-07-12 00:02:00'::timestamptz, 3)) v(time,value);
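
As a sketch, the same query can make the time ordering explicit in the OVER clause (the form referenced in the note above); according to that note, the ROWS option does not change the delta result:

SELECT
     time, value, delta(value) over (order by time rows 1 preceding)
FROM 
     (VALUES ('2019-07-12 00:00:00'::timestamptz, 1),('2019-07-12 00:01:00'::timestamptz, 2),('2019-07-12 00:02:00'::timestamptz, 3)) v(time,value);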

spread(field numeric)

Calculates the difference between the maximum value and the minimum value in a specified period.

Table 5 Parameter description

Parameter | Type | Description | Required/Optional
----------|------|-------------|-------------------
field | Numeric | Column to be calculated | Required

  • This function is used to calculate the increment of each counter sorted by time.
  • If a group contains fewer than two tuples, the result is 0. Do not use this function with the OVER clause.
Example:
SELECT
     SPREAD(value)
FROM 
     (VALUES ('2019-07-12 00:00:00'::timestamptz, 1),('2019-07-12 00:01:00'::timestamptz, 2),('2019-07-12 00:02:00'::timestamptz, 3)) v(time,value);
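
Because spread is an aggregate, it can also be combined with GROUP BY. A minimal sketch against the dcs_cpu table created in the time_fill example (assuming the same three rows); with that data, scope_a should yield 1 (2.0 - 1.0) and scope_b, which contains a single row, should yield 0 per the note above:

select scope_name, spread(iowait) from dcs_cpu group by scope_name;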

mode() within group (order by value anyelement)

Returns the value with the highest occurrence frequency for a given column. If multiple values have the same frequency, this function returns the smallest value among these values.

Table 6 Parameter description

Parameter | Type | Description | Required/Optional
----------|------|-------------|-------------------
value | anyelement | Column to be queried | Required

  • This function must be used together with the WITHIN GROUP clause; if WITHIN GROUP is missing, an error is reported. The function's parameter is placed after ORDER BY in the WITHIN GROUP clause.
  • This function cannot be used together with the OVER clause.
Example:
SELECT
    mode() within group (order by value)
FROM
    (VALUES ('2019-07-12 00:00:00'::timestamptz, 1),('2019-07-12 00:01:00'::timestamptz, 2),('2019-07-12 00:02:00'::timestamptz, 3)) v(time,value);
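
A sketch of the same function combined with ordinary grouping, using the dcs_cpu table created in the time_fill example (assuming the same data):

select scope_name, mode() within group (order by iowait) from dcs_cpu group by scope_name;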

value_of_percentile(column float, percentile float, compression float)

Returns the value at a given percentile of a specified column sorted in ascending order. Its result is an approximation of percentile_cont, but this function achieves better performance than percentile_cont.

Table 7 Parameter description

Parameter | Type | Description | Required/Optional
----------|------|-------------|-------------------
column | float | Column whose percentile value is to be calculated | Required
percentile | float | Percentile. Value range: 0 to 1 | Required
compression | float | Compression coefficient. Value range: [0, 500]; default: 300. A larger value indicates higher memory usage and higher result precision. If the specified value is not within the value range, 300 is used. | Required

Example:

SELECT value_of_percentile(values, 0.8, 0) from TABLE;
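
As a more concrete sketch (assuming the dcs_cpu table and data from the time_fill example), the following requests the approximate value at the 80th percentile of the idle column, using the default compression coefficient of 300:

select value_of_percentile(idle, 0.8, 300) from dcs_cpu;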

percentile_of_value(column float, percentilevalue float, compression float)

Returns the percentile of a given value in a specified column sorted in ascending order. This function is the inverse of value_of_percentile.

Table 8 Parameter description

Parameter | Type | Description | Required/Optional
----------|------|-------------|-------------------
column | float | Column whose percentile is to be calculated | Required
percentilevalue | float | Value whose percentile is to be calculated | Required
compression | float | Compression coefficient. Value range: [0, 500]; default: 300. A larger value indicates higher memory usage and higher result precision. If the specified value is not within the value range, 300 is used. | Required

Example:

SELECT percentile_of_value(values, 80, 0) from TABLE;
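
As a more concrete sketch (assuming the dcs_cpu table and data from the time_fill example), the following requests the approximate percentile at which the value 2.0 falls within the idle column:

select percentile_of_value(idle, 2.0, 300) from dcs_cpu;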

first(column1, column2)

Aggregate function. Compares the values of column2 in a group, finds the minimum value, and outputs the corresponding value of column1.

Table 9 Parameter description

Parameter | Type | Description | Required/Optional
----------|------|-------------|-------------------
column1 | bigint/text/double/numeric | Output column | Required
column2 | timestamp/timestamptz/numeric | Comparison column | Required

Example (using the table definition and data from the time_fill expression above):

Obtain the earliest idle value (by time_string) in each group when grouping by scope_name.

select first(idle, time_string) from dcs_cpu group by scope_name;
first
-------
1
3
(2 rows)
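
Because column2 can also be numeric (see Table 9), first can compare by a measurement column instead of the time column. A minimal sketch against the same dcs_cpu data, returning the node whose iowait is smallest in each scope_name group:

select first(node, iowait) from dcs_cpu group by scope_name;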

last(column1, column2)

Aggregate function. Compares the values of column2 in a group, finds the maximum value, and outputs the corresponding value of column1.

Table 10 Parameter description

Parameter | Type | Description | Required/Optional
----------|------|-------------|-------------------
column1 | bigint/text/double/numeric | Output column | Required
column2 | timestamp/timestamptz/numeric | Comparison column | Required

Example (using the table definition and data from the time_fill expression above):

Obtain the latest idle value (by time_string) in each group when grouping by scope_name.

select last(idle, time_string) from dcs_cpu group by scope_name;
last
------
2
3
(2 rows)

get_timeline_count_internal(schema_name text, rel_name text)

Obtains the number of rows in the tag table of the time series table on the current DN. This function can be used only on DNs.

Parameter | Type | Description | Required/Optional
----------|------|-------------|-------------------
schema_name | text | Name of the schema that the time series table belongs to | Required
rel_name | text | Name of the time series table | Required

Example:

Create a table and insert data.

CREATE TABLE IF NOT EXISTS CPU(
scope_name text TSTag,
server_ip text TSTag,
group_path text TSTag,
time timestamptz TSTime,
idle numeric TSField
) with (orientation=TIMESERIES) distribute by hash(scope_name);
insert into CPU values('dcxtataetaeta','10.145.255.33','saetataetaeta','2020-04-07 17:12:09+08',  60639);
insert into CPU values('wrhtataetaeta','10.145.255.33','saetataetaeta','2020-04-07 17:12:09+08',  53311);
insert into CPU values('saetataetaeta','10.145.255.33','saetataetaeta','2020-04-07 17:12:09+08',  27101);
insert into CPU values('saetataetaeta','10.145.255.33','saetataetaeta','2020-04-07 17:12:09+08',  48005);

After data is transferred from the delta table to CU, connect to the DN and execute the following function:

select get_timeline_count_internal('public', 'cpu');
get_timeline_count_internal
-----------------------------
2
(1 row)

get_timeline_count(relname regclass)

Obtains the number of rows in the tag table of the time series table on each DN. This function can be used only on CNs.

Parameter | Type | Description | Required/Optional
----------|------|-------------|-------------------
relname | regclass | Name of the time series table | Required

Example:

Table creation and data import are the same as those of the get_timeline_count_internal function. Connect to the CN and execute the function:

select get_timeline_count('cpu');
get_timeline_count
--------------------
(dn_1,2)
(dn_2,1)
(2 rows)

gs_clean_tag_relation(tagOid oid)

This function deletes useless data from the rows corresponding to tagids in a tag table. Data in the primary table is cleared during automatic partition elimination; however, if a tag table is used for a long time, it may accumulate obsolete data. You can invoke this function to clear such row data from the tag table and improve its utilization. The return value is the number of rows successfully deleted from the tag table.

Parameter description

Parameter | Type | Description | Required/Optional
----------|------|-------------|-------------------
tagOid | oid | OID of the time series table whose tag table is to be cleaned of useless data | Required

Example:

CREATE TABLE IF NOT EXISTS CPU(
scope_name text TSTag,
server_ip text TSTag,
group_path text TSTag,
time timestamptz TSTime,
idle numeric TSField,
system numeric TSField,
util numeric TSField,
vcpu_num numeric TSField,
guest numeric TSField,
iowait numeric TSField,
users numeric TSField) with (orientation=TIMESERIES) distribute by hash(scope_name);
SELECT oid FROM PG_CLASS WHERE relname='cpu';
  oid
-------
 19099
(1 row)
SELECT gs_clean_tag_relation(19099);
 gs_clean_tag_relation
-----------------------
                     0
(1 row)

ts_table_part_policy_pgjob_to_pgtask(schemaName text, tableName text)

This function is used to migrate the partition management tasks of a time series table. It is used only when the time series table is upgraded along with the cluster upgrade from 8.1.1 to 8.1.3. In version 8.1.1, the partition management tasks of time series tables are recorded in the pg_jobs table, while in version 8.1.3 they are recorded in the pg_task table. Therefore, during the cluster upgrade from version 8.1.1 to version 8.1.3, the partition management tasks need to be migrated from pg_jobs to pg_task. This function migrates only time series table partition management tasks. After the migration is complete, the status of the original tasks in the pg_jobs table is changed to broken.

Parameter

Type

Description

Required/Option

schemaName

text

Name of the schema that the time series table belongs to

Required

tableName

text

Name of the time series table

Required

Example:

CALL ts_table_part_policy_pgjob_to_pgtask('public','cpu1');
WARNING:  The job on pg_jobs is migrated to pg_task, and the original job is broken, the job what is call proc_drop_partition('public.cpu1', interval '7 d'); , the job interval is interval '1 day'.
WARNING:  The job on pg_jobs is migrated to pg_task, and the original job is broken, the job what is call proc_add_partition('public.cpu1', interval '1 d'); , the job interval is interval '1 day'.
ts_table_part_policy_pgjob_to_pgtask

----------------------------------

(1 row)

proc_part_policy_pgjob_to_pgtask()

This function is used only when time series tables are upgraded along with the cluster upgrade from version 8.1.1 to version 8.1.3. It migrates the partition management tasks of all time series tables in a version 8.1.1 database. The function traverses all time series tables in the database and checks whether the partition management tasks of each table have been migrated. If they have not, it invokes the ts_table_part_policy_pgjob_to_pgtask function to migrate them. If the migration fails, the entire operation is rolled back.

Example:

CALL proc_part_policy_pgjob_to_pgtask();
NOTICE:  find table, name is cpu1, namespace is public.
WARNING:  The job on pg_jobs is migrated to pg_task, and the original job is broken, the job what is call proc_drop_partition('public.cpu1', interval '7 d'); , the job interval is interval '1 day'.
CONTEXT:  SQL statement "call ts_table_part_policy_pgjob_to_pgtask('public', 'cpu1');"
PL/pgSQL function proc_part_policy_pgjob_to_pgtask() line 17 at EXECUTE statement
WARNING:  The job on pg_jobs is migrated to pg_task, and the original job is broken, the job what is call proc_add_partition('public.cpu1', interval '1 d'); , the job interval is interval '1 day'.
CONTEXT:  SQL statement "call ts_table_part_policy_pgjob_to_pgtask('public', 'cpu1');"
PL/pgSQL function proc_part_policy_pgjob_to_pgtask() line 17 at EXECUTE statement
NOTICE:  find table, name is cpu2, namespace is public.
WARNING:  The job on pg_jobs is migrated to pg_task, and the original job is broken, the job what is call proc_add_partition('public.cpu2', interval '1 d'); , the job interval is interval '1 day'.
CONTEXT:  SQL statement "call ts_table_part_policy_pgjob_to_pgtask('public', 'cpu2');"
PL/pgSQL function proc_part_policy_pgjob_to_pgtask() line 17 at EXECUTE statement
proc_part_policy_pgjob_to_pgtask
--------------------------------------

(1 row)

print_sql_part_policy_pgjob_to_pgtask()

This function is used only when time series tables are upgraded along with the cluster upgrade from version 8.1.1 to version 8.1.3. It prints SQL statements that can be used to migrate the partition management tasks of time series tables. The migration granularity of the proc_part_policy_pgjob_to_pgtask function is the database; by manually executing the statements printed by print_sql_part_policy_pgjob_to_pgtask, you can migrate at the granularity of a single table.

Example:

CALL print_sql_part_policy_pgjob_to_pgtask();
call ts_table_part_policy_pgjob_to_pgtask('public', 'cpu1');
call ts_table_part_policy_pgjob_to_pgtask('public', 'cpu2');
print_sql_part_policy_pgjob_to_pgtask
---------------------------------------

(1 row)