HLL Functions and Operators
Hash Functions
- hll_hash_boolean(bool, int32)
Description: Configures a hash seed (that is, change the hash policy) and hashes data of the bool type.
Return type: hll_hashval
Example:
1 2 3 4 5
openGauss=# SELECT hll_hash_boolean(FALSE, 10); hll_hash_boolean -------------------- -1169037589280886076 (1 row)
- hll_hash_smallint(smallint)
Description: Hashes data of the smallint type.
Return type: hll_hashval
Example:
1 2 3 4 5
openGauss=# SELECT hll_hash_smallint(100::smallint); hll_hash_smallint --------------------- 962727970174027904 (1 row)
If parameters with the same numeric value are hashed using different data types, the data will differ, because hash functions select different calculation policies for each type.
- hll_hash_smallint(smallint, int32)
Description: Configures a hash seed (that is, change the hash policy) and hashes data of the smallint type.
Return type: hll_hashval
Example:
1 2 3 4 5
openGauss=# SELECT hll_hash_smallint(100::smallint, 10); hll_hash_smallint --------------------- -9056177146160443041 (1 row)
- hll_hash_integer(integer)
Description: Hashes data of the integer type.
Return type: hll_hashval
Example:
1 2 3 4 5
openGauss=# SELECT hll_hash_integer(0); hll_hash_integer ---------------------- 5156626420896634997 (1 row)
- hll_hash_integer(integer, int32)
Description: Hashes data of the integer type and configures a hash seed (that is, change the hash policy).
Return type: hll_hashval
Example:
1 2 3 4 5
openGauss=# SELECT hll_hash_integer(0, 10); hll_hash_integer -------------------- -5035020264353794276 (1 row)
- hll_hash_bigint(bigint)
Description: Hashes data of the bigint type.
Return type: hll_hashval
Example:
1 2 3 4 5
openGauss=# SELECT hll_hash_bigint(100::bigint); hll_hash_bigint --------------------- -2401963681423227794 (1 row)
- hll_hash_bigint(bigint, int32)
Description: Hashes data of the bigint type and configures a hash seed (that is, change the hash policy).
Return type: hll_hashval
Example:
1 2 3 4 5
openGauss=# SELECT hll_hash_bigint(100::bigint, 10); hll_hash_bigint --------------------- -2305749404374433531 (1 row)
- hll_hash_bytea(bytea)
Description: Hashes data of the bytea type.
Return type: hll_hashval
Example:
1 2 3 4 5
openGauss=# SELECT hll_hash_bytea(E'\\x'); hll_hash_bytea ---------------- 0 (1 row)
- hll_hash_bytea(bytea, int32)
Description: Hashes data of the bytea type and configures a hash seed (that is, change the hash policy).
Return type: hll_hashval
Example:
1 2 3 4 5
openGauss=# SELECT hll_hash_bytea(E'\\x', 10); hll_hash_bytea --------------------- 7233188113542599437 (1 row)
- hll_hash_text(text)
Description: Hashes data of the text type.
Return type: hll_hashval
Example:
1 2 3 4 5
openGauss=# SELECT hll_hash_text('AB'); hll_hash_text --------------------- -5666002586880275174 (1 row)
- hll_hash_text(text, int32)
Description: Hashes data of the text type and configures a hash seed (that is, change the hash policy).
Return type: hll_hashval
Example:
1 2 3 4 5
openGauss=# SELECT hll_hash_text('AB', 10); hll_hash_text --------------------- -2215507121143724132 (1 row)
- hll_hash_any(anytype)
Description: Hashes data of any type.
Return type: hll_hashval
Example:
1 2 3 4 5 6 7 8 9 10 11
openGauss=# select hll_hash_any(1); hll_hash_any ---------------------- -1316670585935156930 (1 row) openGauss=# select hll_hash_any('08:00:2b:01:02:03'::macaddr); hll_hash_any ---------------------- -3719950434455589360 (1 row)
- hll_hash_any(anytype, int32)
Description: Hashes data of any type and configures a hash seed (that is, change the hash policy).
Return type: hll_hashval
Example:
1 2 3 4 5
openGauss=# select hll_hash_any(1, 10); hll_hash_any ---------------------- 7048553517657992351 (1 row)
- hll_hashval_eq(hll_hashval, hll_hashval)
Description: Compares two pieces of data of the hll_hashval type to check whether they are the same.
Return type: Boolean
Example:
1 2 3 4 5
openGauss=# select hll_hashval_eq(hll_hash_integer(1), hll_hash_integer(1)); hll_hashval_eq ---------------- t (1 row)
- hll_hashval_ne(hll_hashval, hll_hashval)
Description: Compares two pieces of data of the hll_hashval type to check whether they are different.
Return type: Boolean
Example:
1 2 3 4 5
openGauss=# select hll_hashval_ne(hll_hash_integer(1), hll_hash_integer(1)); hll_hashval_ne ---------------- f (1 row)
HLL Functions
There are three HLL modes: explicit, sparse, and full. When the data size is small, the explicit mode is used. In this mode, distinct values are calculated without errors. As the number of distinct values increases, the HLL mode is switched to the sparse and full modes in sequence. The two modes have no difference in the calculation result, but vary in the calculation efficiency of HLL functions and the storage space of HLL objects. The following functions can be used to view some HLL parameters:
- hll_print(hll)
Description: Prints some debugging parameters of an HLL.
Example:
1 2 3 4 5
openGauss=# select hll_print(hll_empty()); hll_print ------------------------------------------------------------------------------- type=1(HLL_EMPTY), log2m=14, log2explicit=10, log2sparse=12, duplicatecheck=0 (1 row)
- hll_type(hll)
Description: Checks the type of the current HLL. The return values are described as follows: 0 indicates HLL_UNINIT, an HLL object that is not initialized. 1 indicates HLL_EMPTY, an empty HLL object. 2 indicates HLL_EXPLICIT, an HLL object in explicit mode. 3 indicates HLL_SPARSE, an HLL object in sparse mode. 4 indicates HLL_FULL, an HLL object in full mode. 5 indicates HLL_UNDEFINED, an invalid HLL object.
Example:
1 2 3 4 5
openGauss=# select hll_type(hll_empty()); hll_type ---------- 1 (1 row)
- hll_log2m(hll)
Description: Checks the value of log2m in the current HLL data structure. log2m is the logarithm of the number of buckets. This value affects the error rate of calculating distinct values by HLL. The error rate = ±1.04/√(2^log2m). If the value of log2m ranges from 10 to 16, HLL sets the number of buckets to 2log2m. When the value of log2explicit is explicitly set to –1, the built-in default value is used.
Example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
openGauss=# select hll_log2m(hll_empty()); hll_log2m ----------- 14 (1 row) openGauss=# select hll_log2m(hll_empty(10)); hll_log2m ----------- 10 (1 row) openGauss=# select hll_log2m(hll_empty(-1)); hll_log2m ----------- 14 (1 row)
- hll_log2explicit(hll)
Description: Queries the value of log2explicit in the current HLL data structure. Generally, the HLL changes from the explicit mode to the sparse mode and then to the full mode. This process is called the promotion hierarchy policy. You can change the value of log2explicit to change the policy. For example, if the value of log2explicit is 0, the HLL will skip the explicit mode and directly enter the sparse mode. When the value of log2explicit is explicitly set to a value ranging from 1 to 12, the HLL will switch to the sparse mode when the length of the data segment exceeds 2log2explicit. When the value of log2explicit is explicitly set to –1, the built-in default value is used.
Example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
openGauss=# select hll_log2explicit(hll_empty()); hll_log2explicit ------------------ 10 (1 row) openGauss=# select hll_log2explicit(hll_empty(12, 8)); hll_log2explicit ------------------ 8 (1 row) openGauss=# select hll_log2explicit(hll_empty(12, -1)); hll_log2explicit ------------------ 10 (1 row)
- hll_log2sparse(hll)
Description: Queries the value of log2sparse in the current HLL data structure. Generally, the HLL changes from the explicit mode to the sparse mode and then to the full mode. This process is called the promotion hierarchy policy. You can adjust the value of log2sparse to change the policy. For example, if the value of log2sparse is 0, the system skips the sparse mode and directly enters the full mode. If the value of log2sparse is explicitly set to a value ranging from 1 to 14, the HLL will switch to the full mode when the length of the data segment exceeds 2log2sparse. When the value of log2sparse is explicitly set to –1, the built-in default value is used.
Example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
openGauss=# select hll_log2sparse(hll_empty()); hll_log2sparse ---------------- 12 (1 row) openGauss=# select hll_log2sparse(hll_empty(12, 8, 10)); hll_log2sparse ---------------- 10 (1 row) openGauss=# select hll_log2sparse(hll_empty(12, 8, -1)); hll_log2sparse ---------------- 12 (1 row)
- hll_duplicatecheck(hll)
Description: Specifies whether duplicate check is enabled. 0: disable; 1: enable. This function is disabled by default. If there are many duplicate values, you can enable this function to improve efficiency. When the value of duplicatecheck is explicitly set to –1, the built-in default value is used.
Example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
openGauss=# select hll_duplicatecheck(hll_empty()); hll_duplicatecheck -------------------- 0 (1 row) openGauss=# select hll_duplicatecheck(hll_empty(12, 8, 10, 1)); hll_duplicatecheck -------------------- 1 (1 row) openGauss=# select hll_duplicatecheck(hll_empty(12, 8, 10, -1)); hll_duplicatecheck -------------------- 0 (1 row)
Functional Functions
- hll_empty()
Description: Creates an empty HLL.
Return type: hll
Example:
1 2 3 4 5
openGauss=# select hll_empty(); hll_empty ------------------------------------------------------------ \x484c4c00000000002b05000000000000000000000000000000000000 (1 row)
- hll_empty(int32 log2m)
Description: Creates an empty HLL and sets the log2m parameter. The parameter value ranges from 10 to 16. If the input is –1, the built-in default value is used.
Return type: HLL
Example:
1 2 3 4 5 6 7 8 9 10 11
openGauss=# select hll_empty(10); hll_empty ------------------------------------------------------------ \x484c4c00000000002b04000000000000000000000000000000000000 (1 row) openGauss=# select hll_empty(-1); hll_empty ------------------------------------------------------------ \x484c4c00000000002b05000000000000000000000000000000000000 (1 row)
- hll_empty(int32 log2m, int32 log2explicit)
Description: Creates an empty HLL and sets the log2m and log2explicit parameters in sequence. The value of log2explicit ranges from 0 to 12. The value 0 indicates that the explicit mode is skipped. This parameter is used to set the threshold of the explicit mode. When the length of the data segment reaches 2log2explicit, the mode is switched to the sparse or full mode. If the input is –1, the built-in default value of log2explicit is used.
Return type: HLL
Example:
1 2 3 4 5 6 7 8 9 10 11
openGauss=# select hll_empty(10, 4); hll_empty ------------------------------------------------------------ \x484c4c00000000001304000000000000000000000000000000000000 (1 row) openGauss=# select hll_empty(10, -1); hll_empty ------------------------------------------------------------ \x484c4c00000000002b04000000000000000000000000000000000000 (1 row)
- hll_empty(int32 log2m, int32 log2explicit, int64 log2sparse)
Description: Creates an empty HLL and sets the log2m, log2explicit and log2sparse parameters in sequence. The value of log2sparse ranges from 0 to 14. The value 0 indicates that the sparse mode is skipped. This parameter is used to set the threshold of the sparse mode. When the length of the data segment reaches 2log2sparse, the mode is switched to the full mode. If the input is –1, the built-in default value of log2sparse is used.
Return type: HLL
Example:
1 2 3 4 5 6 7 8 9 10 11
openGauss=# select hll_empty(10, 4, 8); hll_empty ------------------------------------------------------------ \x484c4c00000000001204000000000000000000000000000000000000 (1 row) openGauss=# select hll_empty(10, 4, -1); hll_empty ------------------------------------------------------------ \x484c4c00000000001304000000000000000000000000000000000000 (1 row)
- hll_empty(int32 log2m, int32 log2explicit, int64 log2sparse, int32 duplicatecheck)
Description: Creates an empty HLL and sets the log2m, log2explicit, log2sparse, and duplicatecheck parameters in sequence. The value of duplicatecheck is 0 or 1, indicating whether the duplicate check mode is enabled. By default, this mode is disabled. If the input is –1, the built-in default value of duplicatecheck is used.
Return type: HLL
Example:
1 2 3 4 5 6 7 8 9 10 11
openGauss=# select hll_empty(10, 4, 8, 0); hll_empty ------------------------------------------------------------ \x484c4c00000000001204000000000000000000000000000000000000 (1 row) openGauss=# select hll_empty(10, 4, 8, -1); hll_empty ------------------------------------------------------------ \x484c4c00000000001204000000000000000000000000000000000000 (1 row)
- hll_add(hll, hll_hashval)
Description: Adds hll_hashval to an HLL.
Return type: HLL
Example:
1 2 3 4 5
openGauss=# select hll_add(hll_empty(), hll_hash_integer(1)); hll_add ---------------------------------------------------------------------------- \x484c4c08000002002b0900000000000000f03f3e2921ff133fbaed3e2921ff133fbaed00 (1 row)
- hll_add_rev(hll_hashval, hll)
Description: Adds hll_hashval to an HLL. This function works the same as hll_add, except that the positions of parameters are switched.
Return type: HLL
Example:
1 2 3 4 5
openGauss=# select hll_add_rev(hll_hash_integer(1), hll_empty()); hll_add_rev ---------------------------------------------------------------------------- \x484c4c08000002002b0900000000000000f03f3e2921ff133fbaed3e2921ff133fbaed00 (1 row)
- hll_eq(hll, hll)
Description: Compares two HLLs to check whether they are the same.
Return type: Boolean
Example:
1 2 3 4 5
openGauss=# select hll_eq(hll_add(hll_empty(), hll_hash_integer(1)), hll_add(hll_empty(), hll_hash_integer(2))); hll_eq -------- f (1 row)
- hll_ne(hll, hll)
Description: Compares two HLLs to check whether they are different.
Return type: Boolean
Example:
1 2 3 4 5
openGauss=# select hll_ne(hll_add(hll_empty(), hll_hash_integer(1)), hll_add(hll_empty(), hll_hash_integer(2))); hll_ne -------- t (1 row)
- hll_cardinality(hll)
Description: Calculates the number of distinct values of an HLL.
Return type: int
Example:
1 2 3 4 5
openGauss=# select hll_cardinality(hll_empty() || hll_hash_integer(1)); hll_cardinality ----------------- 1 (1 row)
- hll_union(hll, hll)
Description: Performs the UNION operation on two HLL data structures to obtain one HLL.
Return type: HLL
Example:
1 2 3 4 5
openGauss=# select hll_union(hll_add(hll_empty(), hll_hash_integer(1)), hll_add(hll_empty(), hll_hash_integer(2))); hll_union -------------------------------------------------------------------------------------------- \x484c4c10002000002b090000000000000000400000000000000000b3ccc49320cca1ae3e2921ff133fbaed00 (1 row)
Aggregate Functions
- hll_add_agg(hll_hashval)
Description: Groups hashed data into HLL.
Return type: HLL
Example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
-- Prepare data: openGauss=# create table t_id(id int); openGauss=# insert into t_id values(generate_series(1,500)); openGauss=# create table t_data(a int, c text); openGauss=# insert into t_data select mod(id,2), id from t_id; -- Create a table and specify an HLL column: openGauss=# create table t_a_c_hll(a int, c hll); -- Use GROUP BY on column a to group data, and insert the data to the HLL: openGauss=# insert into t_a_c_hll select a, hll_add_agg(hll_hash_text(c)) from t_data group by a; -- Calculate the number of distinct values for each group in the HLL: openGauss=# select a, #c as cardinality from t_a_c_hll order by a; a | cardinality ---+------------------ 0 | 247.862354346299 1 | 250.908710610377 (2 rows)
- hll_add_agg(hll_hashval, int32 log2m)
Description: Groups hashed data into HLL and specifies the log2m parameter. The value ranges from 10 to 16. If the input is –1 or NULL, the built-in default value is used.
Return type: HLL
Example:
1 2 3 4 5
openGauss=# select hll_cardinality(hll_add_agg(hll_hash_text(c), 12)) from t_data; hll_cardinality ------------------ 497.965240179228 (1 row)
- hll_add_agg(hll_hashval, int32 log2m, int32 log2explicit)
Description: Groups hashed data into HLL and specifies the log2m and log2explicit parameters in sequence. The value of log2explicit ranges from 0 to 12. The value 0 indicates that the explicit mode is skipped. This parameter is used to set the threshold of the explicit mode. When the length of the data segment reaches 2log2explicit, the mode is switched to the sparse or full mode. If the input is –1 or NULL, the built-in default value of log2explicit is used.
Return type: HLL
Example:
1 2 3 4 5
openGauss=# select hll_cardinality(hll_add_agg(hll_hash_text(c), NULL, 1)) from t_data; hll_cardinality ------------------ 498.496062953313 (1 row)
- hll_add_agg(hll_hashval, int32 log2m, int32 log2explicit, int64 log2sparse)
Description: Groups hashed data into HLL and sets the log2m, log2explicit, and log2sparse parameters in sequence. The value of log2sparse ranges from 0 to 14. The value 0 indicates that the sparse mode is skipped. This parameter is used to set the threshold of the sparse mode. When the length of the data segment reaches 2log2sparse, the mode is switched to the full mode. If the input is –1 or NULL, the built-in default value of log2sparse is used.
Return type: HLL
Example:
1 2 3 4 5
openGauss=# select hll_cardinality(hll_add_agg(hll_hash_text(c), NULL, 6, 10)) from t_data; hll_cardinality ------------------ 498.496062953313 (1 row)
- hll_add_agg(hll_hashval, int32 log2m, int32 log2explicit, int64 log2sparse, int32 duplicatecheck)
Description: Groups hashed data into HLL and sets the log2m, log2explicit, log2sparse, and duplicatecheck parameters. The value of duplicatecheck can be 0 or 1, indicating whether to enable this mode. By default, this mode is disabled. If the input is –1 or NULL, the built-in default value of duplicatecheck is used.
Return type: HLL
Example:
1 2 3 4 5
openGauss=# select hll_cardinality(hll_add_agg(hll_hash_text(c), NULL, 6, 10, -1)) from t_data; hll_cardinality ------------------ 498.496062953313 (1 row)
- hll_union_agg(hll)
Description: Performs the UNION operation on multiple pieces of data of the HLL type to obtain one HLL.
Return type: HLL
Example:
1 2 3 4 5 6
-- Perform the UNION operation on data of the HLL type in each group to obtain one HLL, and calculate the number of distinct values: openGauss=# select #hll_union_agg(c) as cardinality from t_a_c_hll; cardinality ------------------ 498.496062953313 (1 row)
To perform the UNION operation on data in multiple HLLs, ensure that the HLLs have the same precision. Otherwise, UNION cannot be performed. This constraint also applies to the hll_union(hll, hll) function.
Obsolete Functions
Some old HLL functions are discarded due to version upgrade. You can replace them with similar functions.
- hll_schema_version(hll)
Description: Checks the schema version in the current HLL. In earlier versions, the schema version is fixed at 1, which is used to verify the header of the HLL field. After refactoring, the HLL field is added to the header for verification. The schema version is no longer used.
- hll_regwidth(hll)
Description: Queries the bucket size in the HLL data structure. In earlier versions, the value of regwidth ranges from 1 to 5, which has a large error and limits the upper limit of the cardinality estimation. After refactoring, the value of regwidth is fixed at 6 and the variable is not used.
- hll_expthresh(hll)
Description: Obtains the value of expthresh in the current HLL. The hll_log2explicit(hll) function is used to replace similar functions.
- hll_sparseon(hll)
Description: Specifies whether the sparse mode is enabled. Use hll_log2sparse(hll) to replace similar functions. The value 0 indicates that the sparse mode is disabled.
Built-In Functions
HLL has a series of built-in functions for internal data processing. Generally, users do not need to know how to use these functions. For details, see Table 1.
Function |
Description |
---|---|
hll_in |
Receives hll data in string format. |
hll_out |
Sends hll data in string format. |
hll_recv |
Receives hll data in bytea format. |
hll_send |
Sends hll data in bytea format. |
hll_trans_in |
Receives hll_trans_type data in string format. |
hll_trans_out |
Sends hll_trans_type data in string format. |
hll_trans_recv |
Receives hll_trans_type data in bytea format. |
hll_trans_send |
Sends hll_trans_type data in bytea format. |
hll_typmod_in |
Receives typmod data. |
hll_typmod_out |
Sends typmod data. |
hll_hashval_in |
Receives hll_hashval data. |
hll_hashval_out |
Sends hll_hashval data. |
hll_add_trans0 |
Works similar to hll_add. No input parameter is specified during initialization. It is usually used in the first phase of DNs in distributed aggregation operations. |
hll_add_trans1 |
Works similar to hll_add. An input parameter is specified during initialization. It is usually used in the first phase of DNs in distributed aggregation operations. |
hll_add_trans2 |
Works similar to hll_add. Two input parameters are specified during initialization. It is usually used in the first phase of DNs in distributed aggregation operations. |
hll_add_trans3 |
Works similar to hll_add. Three input parameters are specified during initialization. It is usually used in the first phase of DNs in distributed aggregation operations. |
hll_add_trans4 |
Works similar to hll_add. Four input parameters are specified during initialization. It is usually used in the first phase of DNs in distributed aggregation operations. |
hll_union_trans |
Works similar to hll_union and is used on the first phase of DNs in distributed aggregation operations. |
hll_union_collect |
Works similar to hll_union and is used on the second phase of CNs in distributed aggregation operations to summarize the results of each DN. |
hll_pack |
Is used on the third phase of CNs in distributed aggregation operations to convert a user-defined type hll_trans_type to the hll type. |
hll |
Converts a hll type to another hll type. Input parameters can be specified. |
hll_hashval |
Converts the bigint type to the hll_hashval type. |
hll_hashval_int4 |
Converts the int4 type to the hll_hashval type. |
Operators
- =
Description: Compares the values of the hll or hll_hashval type to check whether they are the same.
Return type: Boolean
Example:
1 2 3 4 5 6 7 8 9 10 11 12 13
--hll openGauss=# select (hll_empty() || hll_hash_integer(1)) = (hll_empty() || hll_hash_integer(1)); column ---------- t (1 row) --hll_hashval openGauss=# select hll_hash_integer(1) = hll_hash_integer(1); ?column? ---------- t (1 row)
- <> or !=
Description: Compares the values of the hll or hll_hashval type to check whether they are different.
Return type: Boolean
Example:
1 2 3 4 5 6 7 8 9 10 11 12 13
--hll openGauss=# select (hll_empty() || hll_hash_integer(1)) <> (hll_empty() || hll_hash_integer(2)); ?column? ---------- t (1 row) --hll_hashval openGauss=# select hll_hash_integer(1) <> hll_hash_integer(2); ?column? ---------- t (1 row)
- ||
Description: Represents the functions of hll_add, hll_union, and hll_add_rev.
Return type: HLL
Example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
--hll_add openGauss=# select hll_empty() || hll_hash_integer(1); ?column? ---------------------------------------------------------------------------- \x484c4c08000002002b0900000000000000f03f3e2921ff133fbaed3e2921ff133fbaed00 (1 row) --hll_add_rev openGauss=# select hll_hash_integer(1) || hll_empty(); ?column? ---------------------------------------------------------------------------- \x484c4c08000002002b0900000000000000f03f3e2921ff133fbaed3e2921ff133fbaed00 (1 row) --hll_union openGauss=# select (hll_empty() || hll_hash_integer(1)) || (hll_empty() || hll_hash_integer(2)); ?column? -------------------------------------------------------------------------------------------- \x484c4c10002000002b090000000000000000400000000000000000b3ccc49320cca1ae3e2921ff133fbaed00 (1 row)
- #
Description: Calculates the number of distinct values of an HLL. It works the same as the hll_cardinality function.
Return type: int
Example:
1 2 3 4 5
openGauss=# select #(hll_empty() || hll_hash_integer(1)); ?column? ---------- 1 (1 row)
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot