El contenido no se encuentra disponible en el idioma seleccionado. Estamos trabajando continuamente para agregar más idiomas. Gracias por su apoyo.

Compute
Elastic Cloud Server
Huawei Cloud Flexus
Bare Metal Server
Auto Scaling
Image Management Service
Dedicated Host
FunctionGraph
Cloud Phone Host
Huawei Cloud EulerOS
Networking
Virtual Private Cloud
Elastic IP
Elastic Load Balance
NAT Gateway
Direct Connect
Virtual Private Network
VPC Endpoint
Cloud Connect
Enterprise Router
Enterprise Switch
Global Accelerator
Management & Governance
Cloud Eye
Identity and Access Management
Cloud Trace Service
Resource Formation Service
Tag Management Service
Log Tank Service
Config
OneAccess
Resource Access Manager
Simple Message Notification
Application Performance Management
Application Operations Management
Organizations
Optimization Advisor
IAM Identity Center
Cloud Operations Center
Resource Governance Center
Migration
Server Migration Service
Object Storage Migration Service
Cloud Data Migration
Migration Center
Cloud Ecosystem
KooGallery
Partner Center
User Support
My Account
Billing Center
Cost Center
Resource Center
Enterprise Management
Service Tickets
HUAWEI CLOUD (International) FAQs
ICP Filing
Support Plans
My Credentials
Customer Operation Capabilities
Partner Support Plans
Professional Services
Analytics
MapReduce Service
Data Lake Insight
CloudTable Service
Cloud Search Service
Data Lake Visualization
Data Ingestion Service
GaussDB(DWS)
DataArts Studio
Data Lake Factory
DataArts Lake Formation
IoT
IoT Device Access
Others
Product Pricing Details
System Permissions
Console Quick Start
Common FAQs
Instructions for Associating with a HUAWEI CLOUD Partner
Message Center
Security & Compliance
Security Technologies and Applications
Web Application Firewall
Host Security Service
Cloud Firewall
SecMaster
Anti-DDoS Service
Data Encryption Workshop
Database Security Service
Cloud Bastion Host
Data Security Center
Cloud Certificate Manager
Edge Security
Managed Threat Detection
Blockchain
Blockchain Service
Web3 Node Engine Service
Media Services
Media Processing Center
Video On Demand
Live
SparkRTC
MetaStudio
Storage
Object Storage Service
Elastic Volume Service
Cloud Backup and Recovery
Storage Disaster Recovery Service
Scalable File Service Turbo
Scalable File Service
Volume Backup Service
Cloud Server Backup Service
Data Express Service
Dedicated Distributed Storage Service
Containers
Cloud Container Engine
SoftWare Repository for Container
Application Service Mesh
Ubiquitous Cloud Native Service
Cloud Container Instance
Databases
Relational Database Service
Document Database Service
Data Admin Service
Data Replication Service
GeminiDB
GaussDB
Distributed Database Middleware
Database and Application Migration UGO
TaurusDB
Middleware
Distributed Cache Service
API Gateway
Distributed Message Service for Kafka
Distributed Message Service for RabbitMQ
Distributed Message Service for RocketMQ
Cloud Service Engine
Multi-Site High Availability Service
EventGrid
Dedicated Cloud
Dedicated Computing Cluster
Business Applications
Workspace
ROMA Connect
Message & SMS
Domain Name Service
Edge Data Center Management
Meeting
AI
Face Recognition Service
Graph Engine Service
Content Moderation
Image Recognition
Optical Character Recognition
ModelArts
ImageSearch
Conversational Bot Service
Speech Interaction Service
Huawei HiLens
Video Intelligent Analysis Service
Developer Tools
SDK Developer Guide
API Request Signing Guide
Terraform
Koo Command Line Interface
Content Delivery & Edge Computing
Content Delivery Network
Intelligent EdgeFabric
CloudPond
Intelligent EdgeCloud
Solutions
SAP Cloud
High Performance Computing
Developer Services
ServiceStage
CodeArts
CodeArts PerfTest
CodeArts Req
CodeArts Pipeline
CodeArts Build
CodeArts Deploy
CodeArts Artifact
CodeArts TestPlan
CodeArts Check
CodeArts Repo
Cloud Application Engine
MacroVerse aPaaS
KooMessage
KooPhone
KooDrive

Aggregate Functions

Updated on 2024-12-13 GMT+08:00

The aggregate function operates on a set of values to obtain a single value.

Except count(), count_if(), max_by(), min_by(), and approx_distinct(), other aggregate functions ignore null values and return null values when there is no input row or all values are null. For example, sum() returns null instead of 0, and avg() does not take null values during statistics collection. The coalesce function converts null to 0.

Clause of an Aggregate Function

  • Order by

    Some aggregate functions may generate different results due to different sequences of input values. You can use the order by clause in the aggregate function to specify the sequence.

    array_agg(x ORDER BY y DESC);
    array_agg(x ORDER BYx,y,z);
  • Filter

    You can use the filter keyword to filter out unnecessary rows by using the where condition expression during aggregation. All aggregate functions support this function.

    aggregate_function(...) FILTER (WHERE <condition>)

Example:

--Create a table.
create table fruit (name varchar, price int);
--Insert data.
insert into fruit values ('peach',5),('apple',2);
--Sorting:
select array_agg (name order by price) from fruit;-- [apple, peach]
--Filtering:
select array_agg(name) filter (where price<10) from fruit;-- [peach, apple] 

Common Aggregate Functions

An aggregate function usually applies to a specific field in a data set (table or view). The following parameter x is used to refer to the field.
  • arbitrary(x)

    Description: Returns a non-null value of X. The return type is the same as X.

    select arbitrary(price) from fruit;-- 5
  • array_agg(x)

    Description: Returns an array of input x fields of the same type as the input field.

    select array_agg(price) from fruit;-- [5,2]
  • avg(x)

    Description: Returns the average of all input values in double type.

    select avg(price) from fruit;-- 3.5
  • avg(time interval type)

    Description: Returns the average length of all input intervals. The return type is interval.

    select avg(last_login) from (values ('admin',interval '0 06:15:30' day to second),('user1',interval '0 07:15:30' day to second),('user2',interval '0 08:15:30' day to second)) as login_log(user,last_login);
    -- 0 07:15:30.000  Assume that a log table records the time since the last login. The result indicates that the average login interval is 0 days, 7 hours, 15 minutes, and 30 seconds.
  • bool_and(boolean value)

    Description: Returns true if each input value is true, otherwise returns false.

    select bool_and(isfruit) from (values ('item1',true), ('item2',false),('item3',true)) as items(item,isfruit);--false
    select bool_and(isfruit) from (values ('item1',true), ('item2',true),('item3',true)) as items(item,isfruit);-- true
  • bool_or(boolean value)

    Description: Returns true if any of the input values is true, otherwise returns false.

    select bool_or(isfruit) from (values ('item1',false), ('item2',false),('item3',false)) as items(item,isfruit);-- false
    select bool_or(isfruit) from (values ('item1',true), ('item2',false),('item3',false)) as items(item,isfruit); --true
  • checksum(x)

    Description: Returns the checksum of the input value, which is not affected by the input sequence. The result type is varbinary.

    select checksum(price) from fruit; -- fb 28 f3 9a 9a fb bf 86
  • count(*)

    Description: Returns the number of input records. The result type is bigint.

    select count(*) from fruit; -- 2
  • count(x)

    Description: Returns the number of records whose input field is not null. The result type is bigint.

    select count(name) from fruit;-- 2
  • count_if(x)

    Description: Returns the number of records whose input value is true. This function is similar to count(CASE WHEN x THEN 1 END) and is of the bigint type.

    select count_if(price>7) from fruit;-- 0
  • every(boolean)

    Description: Is an alias for bool_and().

  • geometric_mean(x)

    Description: Returns the geometric mean of the input field value. The value is of the double type.

    select geometric_mean(price) from fruit; -- 3.162277660168379
  • listagg(x, separator) → varchar

    Description: Returns a string concatenated by input values separated by specified separators.

    Syntax:

    LISTAGG( expression [, separator] [ON OVERFLOW overflow_behaviour])    WITHIN GROUP (ORDER BY sort_item, [...])

    If separator is not specified, the null character is used as the separator by default.

    SELECT listagg(value, ',') WITHIN GROUP (ORDER BY value) csv_value FROM (VALUES 'a', 'c', 'b') t(value);
    csv_value
    -----------
    'a,b,c'

    When the return value of this function exceeds 1048576 bytes, you can use overflow_behaviour to specify the action to take in this case. By default, an error will be thrown.

    SELECT listagg(value, ',' ON OVERFLOW ERROR) WITHIN GROUP (ORDER BY value) csv_value FROM (VALUES 'a', 'b', 'c') t(value);

    If the return value exceeds 1048576 bytes, truncate the extra non-null string and replace it with the string specified by TRUNCATE. WITH COUNT and WITHOUT COUNT indicate whether the return value contains the count.

    SELECT LISTAGG(value, ',' ON OVERFLOW TRUNCATE '.....' WITH COUNT) WITHIN GROUP (ORDER BY value)FROM (VALUES 'a', 'b', 'c') t(value);

    The listagg function can also be used for grouping strings. The following is an example:

    SELECT id, LISTAGG(value, ',') WITHIN GROUP (ORDER BY o) csv_value FROM (VALUES     
         (100, 1, 'a'),
         (200, 3, 'c'), 
         (200, 2, 'b') ) t(id, o, value)
     GROUP BY id
     ORDER BY id;
     id   | csv_value
     -----+------------- 
     100  | a 
     200  | b,c
  • max_by(x, y)

    Description: Returns the value of x associated with the maximum value of the y field in all input values.

    select max_by(name,price) from fruit; -- peach    
  • max_by(x, y, n)

    Description: Returns n x values sorted by y in descending order.

    select max_by(name,price,2) from fruit;-- [peach, apple]
  • min_by(x,y)

    Description: Returns the value of x associated with the minimum value of the y field in all input values.

    select min_by(name,price) from fruit;-- apple
  • min_by(x, y, n)

    Description: Returns n x values sorted by y in ascending order.

    select min_by(name,price,2) from fruit;-- [apple, peach] 
  • max(x)

    Description: Returns the maximum value of the input field x.

    select max(price) from fruit;-- 5
  • max(x, n)

    Description: Returns the first n values of the input field x in descending order.

    select max(price,2) from fruit; -- [5, 2]
  • min(x)

    Description: Returns the minimum value of the input field x.

    select min(price) from fruit;-- 2
  • min(x, n)

    Description: Returns the first n values of the input field x in ascending order.

    select min(price,2) from fruit;-- [2, 5]
  • sum(T, x)

    Description: Sums up the input field x. T is of the numeric type, for example, int, double, or interval day to second.

    select sum(price) from fruit;-- 7
  • regr_avgx(T independent, T dependent) → double

    Description: Calculates the average value of the independent variable (expr2) of the regression line. After the empty pair (expr1, expr2) is removed, the value is AVG(expr2).

    create table sample_collection(id int,time_cost int,weight decimal(5,2));
     
    insert into sample_collection values 
    (1,5,86.38),
    (2,10,281.17),
    (3,15,89.91),
    (4,20,17.5),
    (5,25,88.76),
    (6,30,83.94),
    (7,35,44.26),
    (8,40,17.4),
    (9,45,5.6),
    (10,50,145.68);
    
    select regr_avgx(time_cost,weight) from sample_collection;
           _col0       
    -------------------
     86.06000000000002 
    (1 row)
  • regr_avgy(T independent, T dependent) → double

    Description: Calculates the average value of the dependent variable (expr1) of the regression line. After the empty pair (expr1, expr2) is removed, the value is AVG(expr1).

    select regr_avgy(time_cost,weight) from sample_collection;
     _col0 
    -------
      27.5 
    (1 row)
  • regr_count(T independent, T dependent) → double

    Description: Returns the non-null logarithm used to fit a linear regression line.

    select regr_count(time_cost,weight) from sample_collection;
     _col0 
    -------
        10 
    (1 row)
  • regr_r2(T independent, T dependent) → double

    Description: Returns the coefficient of determination for regression.

    select regr_r2(time_cost,weight) from sample_collection;
           _col0        
    --------------------
     0.1446739237728169 
    (1 row)
  • regr_sxx(T independent, T dependent) → double

    Description: Returns the value of REGR_COUNT(expr1, expr2) * VAR_POP(expr2).

    select regr_sxx(time_cost,weight) from sample_collection;
           _col0        
    --------------------
     59284.886600000005 
    (1 row)
  • regr_sxy(T independent, T dependent) → double

    Description: Returns the value of REGR_COUNT(expr1, expr2) * COVAR_POP(expr1, expr2).

    select regr_sxy(time_cost,weight) from sample_collection;
      _col0   
    ----------
     -4205.95 
    (1 row)
  • regr_syy(T independent, T dependent) → double

    Description: Returns the value of REGR_COUNT(expr1, expr2) * VAR_POP(expr1).

    select regr_syy(time_cost,weight) from sample_collection;
     _col0  
    --------
     2062.5 
    (1 row)

Bitwise Aggregate Function

  • bitwise_and_agg(x)

    Description: Uses two's complement to represent the bitwise AND operation on the input field x. The return type is bigint.

    select bitwise_and_agg(x) from (values (31),(32)) as t(x);-- 0
  • bitwise_or_agg(x)

    Description: Uses two's complement to represent the bitwise OR operation on the input field x. The return type is bigint.

    select bitwise_or_agg(x) from (values (31),(32)) as t(x);-- 63

Map Aggregate Function

  • histogram(x) -> map(K, bigint)

    Description: Returns a map containing the number of occurrences of all input field x.

    select histogram(x),histogram(y) from (values (15,17),(15,18),(15,19),(15,20)) as t(x,y);-- {15=4},{17=1, 18=1, 19=1, 20=1}
  • map_agg(key, value) -> map(K, V)

    Description: Returns a map whose input field key and input field value are key-value pairs.

    select map_agg(name,price) from fruit;-- {apple=2, peach=5}
  • map_union(x(K, V)) -> map(K, V)

    Description: Returns the union of all input maps. If a key appears multiple times in the input set, the corresponding value is any value corresponding to the key in the input set.

    select map_union(x) from (values (map(array['banana'],array[10.0])),(map(array['apple'],array[7.0]))) as t(x);-- {banana=10.0, apple=7.0}
    select map_union(x) from (values (map(array['banana'],array[10.0])),(map(array['banana'],array[7.0]))) as t(x);-- {banana=10.0}
  • multimap_agg(key, value) -> map(K, array(V))

    Description: Returns a map consisting of input key-value pairs. Each key can correspond to multiple values.

    select multimap_agg(key, value) from (values ('apple',7),('apple',8),('apple',8),('lemon',5) ) as t(key,value); - {apple=[7, 8, 8], lemon=[5]} 

Approximation Aggregate Function

In actual situations, when we collect statistics on a large amount of data, sometimes we only care about an approximate value instead of a specific value. For example, when we collect statistics on the sales volume of a product, the approximation aggregate function is useful because it uses less memory and CPU resources, in this way, you can obtain data results without any problems, such as overflowing to disks or CPU peaks. This is useful for billions of rows of data calculations.
  • approx_median(x) → bigint

    Description: Calculates the median value of a distribution of values.

    select approx_median(price) from fruit; -- 10.0
  • approx_distinct(x) → bigint

    Description: The return type of this function is bigint, which provides an approximate count of count(distinct x). If all inputs are null, 0 is returned.

    The errors of all possible values of this function relative to the correct values follow an approximate normal distribution with a standard deviation of 2.3%. It does not guarantee the upper limit of any specific input set error.

    select approx_distinct(price) from fruit; -- 2
  • approx_distinct(x, e)→ bigint

    Description: The return type of this function is bigint, which provides an approximate count of count(distinct x). If all inputs are null, 0 is returned.

    The errors of all possible values of this function relative to the correct values follow an approximate normal distribution with a standard deviation of less than e. It does not guarantee the upper limit of any specific input set error.

    In the current implementation of the function, the value range of e is [0.0040625, 0.26000].

    select approx_distinct(weight,0.0040625) from sample_collection; -- 10 
    select approx_distinct(weight,0.26) from sample_collection; -- 8
  • approx_most_frequent(buckets, value, capacity) → map<[same as value], bigint>

    Description: Approximately collects statistics on the top buckets elements that appear most frequently. Less memory will be used for approximate estimation of high-frequency values. A larger capacity value indicates a more accurate result, which consumes more memory. The return value of this function is a map that consists of key-value pairs of high-frequency values and their frequencies.

    SELECT approx_most_frequent(3, x, 15) FROM (values 'A', 'B', 'A', 'C', 'A', 'B', 'C', 'D', 'E') t(x); --  {A=3, B=2, C=2}
    SELECT approx_most_frequent(3, x, 100) FROM (values 1, 2, 1, 3, 1, 2, 3, 4, 5) t(x); -- {1=3, 2=2, 3=2}

    NOTE:

    The commonly used quantiles are binary, quaternary, decimal, and percentile. This means that the input set is evenly divided into equal parts, and then the value at the corresponding position is found. For example, approx_percentile(x, 0.5) is used to find a value that is about 50% of the x value, that is, a 2-quantile value.

  • approx_percentile(x, percentage)→[same as x]

    Description: Returns the approximate percentile based on the given percentage. The percentage value must be a constant between 0 and 1 for all input rows.

    select approx_percentile(x, 0.5) from ( values (2),(3),(7),(8),(9)) as t(x); --7
  • approx_percentile(x, percentages)→ array<[same as x]>

    Description: Returns an approximate percentile of the x value of all input fields in a given percentage array. Each value in this percentage array must be a constant between 0 and 1 for all input rows.

    select approx_percentile(x, array[0.1,0.2,0.3,0.5]) from ( values (2),(3),(7),(8),(9)) as t(x); --[2, 3, 3, 7]
  • approx_percentile(x, w, percentage)→array<[same as x]>

    Description: Returns the approximate percentile of all x input values by percentage. The weight of each item is w and must be a positive number. Set a valid percentile for x. The value of percentage must range from 0 to 1, and all input rows must be constants.

    select approx_percentile(x, 5,array[0.1,0.2,0.3,0.5]) from ( values (2),(3),(7),(8),(9)) as t(x); --[2, 3, 3, 7]
  • approx_percentile(x, w, percentage, accuracy) →[same as x]

    Description: Returns the approximate percentile of all x input values by percentage. The weight of each item is w and must be a positive number. Set a valid percentile for x. The value of percentage must range from 0 to 1, and all input rows must be constants. The maximum progress error of the approximate value is specified by accuracy.

    select approx_percentile(x, 5,0.5,0.97) from ( values (2),(3),(7),(8),(9)) as t(x); --7
  • approx_percentile(x, w, percentages)→[same as x]

    Description: Returns an approximate percentile of all x input values based on each percentage value in the percentage array. The weight of each item is w and must be a positive number. Set a valid percentile for x. Each element value in the percentage array must range from 0 to 1, and all input rows must be constants.

    select approx_percentile(x,5, array[0.1,0.2,0.3,0.5]) from ( values (2),(3),(7),(8),(9)) as t(x);  -- [2, 3, 3, 7]
    NOTE:

    The preceding approx_percentile function also supports the percentile_approx function of the same parameter set.

  • numeric_histogram(buckets, value, weight)

    Description: Calculates the approximate histogram for all values based on the number of buckets. The width of each item uses weight. This algorithm is based on:

    Yael Ben-Haim and Elad Tom-Tov, "A streaming parallel decision tree algorithm", J. Machine Learning Research 11 (2010), pp. 849--872.

    The value of buckets must be bigint. The values of value and weight must be numbers.

    select numeric_histogram(20,x,4) from ( values (2),(3),(7),(8),(9)) as t(x);
    _col0                     
    -----------------------------------------------
     {2.0=4.0, 3.0=4.0, 7.0=4.0, 8.0=4.0, 9.0=4.0} 
    (1 row)
  • numeric_histogram(buckets, value)

    Description: Compared with numeric_histogram(buckets, value,weight), this function sets weight to 1.

    select numeric_histogram(20,x) from ( values (2),(3),(7),(8),(9)) as t(x);
    _col0                     
    -----------------------------------------------
     {2.0=1.0, 3.0=1.0, 7.0=1.0, 8.0=1.0, 9.0=1.0} 
    (1 row)

Statistical Aggregate Function

  • corr(y,x)

    Description: Returns the correlation coefficient of the input value.

    select corr(y,x) from (values (1,5),(2,6),(3,7),(4,8)) as t(x,y);-- 1.0
  • covar_pop(y, x)

    Description: Returns the population covariance of the input value.

    select covar_pop(y,x) from (values (1,5),(2,6),(3,7),(4,8)) as t(x,y); --1.25
  • covar_samp(y, x)

    Description: Returns the sample covariance of the input value.

    select covar_samp(y,x) from (values (1,5),(2,6),(3,7),(4,8)) as t(x,y);-- 1.6666666 
  • kurtosis(x)

    Description: Kurtosis, also called peak-state coefficient, indicates the number of peak values at the average value of the probability density distribution curve, that is, the statistical value that describes the steepness of all value distribution forms in the entire system. Intuitively, kurtosis reflects the sharpness of the peak. This statistic needs to be compared with the normal distribution.

    The kurtosis is defined as the 4th standardized central moment of the sample.

    The kurtosis of a random variable is the ratio of the fourth-order central moment of the random variable to the square of the variance.

    The calculation formula is as follows:

    select kurtosis(x) from (values (1),(2),(3),(4)) as t(x); -- -1.1999999999999993
  • regr_intercept(y, x)

    Description: Returns the linear regression intercept of the input value. y is a subordinate value. x is an independent value.

    select regr_intercept(y,x) from (values (1,5),(2,6),(3,7),(4,8)) as t(x,y);-- 4.0
  • regr_slope(y, x)

    Description: Returns the linear regression slope of the input value. y is a subordinate value. x is an independent value.

    select regr_slope(y,x) from (values (1,5),(2,6),(3,7),(4,8)) as t(x,y);--  1.0
  • skewness(x)

    Description: Returns the skew degree of all input values.

    select skewness(x) from (values (1),(2),(3),(4)) as t(x); -- 0.0
  • stddev(x)

    Description: Alias of stedev_samp()

  • stddev_pop(x)

    Description: Returns the population standard deviation of all input values.

    select stddev_pop(x) from (values (1),(2),(3),(4)) as t(x);--  1.118033988749895
  • stddev_samp(x)

    Description: Returns the sample standard deviation of all input values.

    select stddev_samp(x) from (values (1),(2),(3),(4)) as t(x);--  1.2909944487358056
  • variance(x)

    Description: Alias of var_samp()

  • var_pop(x)

    Description: Returns the population variance of all input values.

    select var_pop(x) from (values (1),(2),(3),(4)) as t(x);-- 1.25
  • var_samp(x)

    Description: Returns the sample variance of all input values.

    select var_samp(x) from (values (1),(2),(3),(4)) as t(x);--  1.6666666666666667

Lambda Aggregation Function

reduce_agg(inputValue T, initialState S, inputFunction(S, T, S), combineFunction(S, S, S))

inputFunction is called for each non-null input value. In addition to obtaining the input value, inputFunction also obtains the current status, which is initialState initially, and then returns the new status. CombineFunction is invoked to combine the two states into a new state. Return the final status:

SELECT id, reduce_agg(value, 0, (a, b) -> a + b, (a, b) -> a + b)
FROM (
    VALUES
        (1, 3),
        (1, 4),
        (1, 5),
        (2, 6),
        (2, 7)
) AS t(id, value)
GROUP BY id;
-- (1, 12)
-- (2, 13)

SELECT id, reduce_agg(value, 1, (a, b) -> a * b, (a, b) -> a * b)
FROM (
    VALUES
        (1, 3),
        (1, 4),
        (1, 5),
        (2, 6),
        (2, 7)
) AS t(id, value)
GROUP BY id;
-- (1, 60)
-- (2, 42)
NOTE:

The status value must be Boolean, integer, floating-point, date, time, or interval.

Utilizamos cookies para mejorar nuestro sitio y tu experiencia. Al continuar navegando en nuestro sitio, tú aceptas nuestra política de cookies. Descubre más

Feedback

Feedback

Feedback

0/500

Selected Content

Submit selected content with the feedback