HetuEngine Data Type
Currently, the following data types are supported during table creation: TINYINT, SMALLINT, BIGINT, INT, BOOLEAN, REAL, DECIMAL, DOUBLE, VARCHAR, STRING, BINARY, VARBINARY, TIMESTAMP, DATE, CHAR, ARRAY, ROW, MAP, and STRUCT. Other types are supported during data query and calculation.
Generally, most non-composite data types can be entered by literals and character strings. In the example, a string in the JSON format is added.
select json '{"name": "aa", "sex": "man"}'; _col0 --------------------------------- {"name":"aa","sex":"man"} (1 row)
Boolean
The valid text values for "true" are TRUE, t, true, and 1.
The valid text values for the "false" value are FALSE, f, false, and 0.
TRUE and FALSE are standard usages (SQL-compatible).
Example:
select BOOLEAN '0'; _col0 ------- false (1 row) select BOOLEAN 'TRUE'; _col0 ------- true (1 row) select BOOLEAN 't'; _col0 ------- true (1 row)
Integer
Name |
Description |
Storage Space |
Value Range |
Literal |
---|---|---|---|---|
TINYINT |
Tiny integer |
8 bits |
-128~127 |
TINYINT |
SMALLINT |
Small integer |
16 bits |
-32,768 to +32,767 |
SMALLINT |
INTEGER |
Integer |
32 bits |
-2,147,483,648 to +2,147,483,647 |
INT |
BIGINT |
Big integer |
64 bits |
-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
BIGINT |
Example:
--Create a table containing TINYINT data: CREATE TABLE int_type_t1 (IT_COL1 TINYINT) ; --Insert data of the TINYINT type: insert into int_type_t1 values (TINYINT'10'); --View data: SELECT * FROM int_type_t1; it_col1 --------- 10 (1 row) --Drop a table: DROP TABLE int_type_t1;
Fixed Precision
Name |
Description |
Storage Space |
Value Range |
Literal |
---|---|---|---|---|
DECIMAL |
Decimal number with fixed precision. The maximum precision is 38 bits, but the precision with less than 18 bits can ensure the best performance. Decimal has two input parameters:
|
64 bits |
|
DECIMAL |
NUMERIC |
Same as DECIMAL |
128 characters |
|
NUMERIC |
Literal |
Data Type |
---|---|
DECIMAL '0' |
DECIMAL(1) |
DECIMAL '12345' |
DECIMAL(5) |
DECIMAL '0000012345.1234500000' |
DECIMAL(20, 10) |
--Create a table containing DECIMAL data. CREATE TABLE decimal_t1 (dec_col1 DECIMAL(10,3)) ; – Insert data of the DECIMAL type. insert into decimal_t1 values (DECIMAL '5.325'); --View data. SELECT * FROM decimal_t1; dec_col1 --------- 5.325 (1 row) -- Negative example: The number of decimal places exceeds the defined length. As a result, the SQL statement fails to be executed. insert into decimal_t1 values (DECIMAL '5.3253'); Query 20201126_034601_00053_tq98i@default@HetuEngine failed: Insert query has mismatched column types: Table: [decimal(10,3)], Query: [decimal(5,4)] --Drop a table. DROP TABLE decimal_t1; --Create a NUMERIC type table. CREATE TABLE tb_numberic_hetu(col1 NUMERIC(9,7)); CREATE TABLE --Insert data. INSERT INTO tb_numberic_hetu values(9.12); INSERT: 1 row --View data. SELECT * FROM tb_numberic_hetu; col1 ------------ 9.1200000 (1 row)
Float
Name |
Description |
Storage Space |
Value Range |
Literal |
---|---|---|---|---|
REAL |
Real number |
32 bits |
1.40129846432481707e-45 to 3.40282346638528860e+38, positive or negative |
REAL |
DOUBLE |
Double-precision floating point number with 15 to 17 valid digits, depending on the application scenario. The number of valid digits does not depend on the decimal point. |
64 bits |
4.94065645841246544e-324 to 1.79769313486231570e+308, positive or negative |
DOUBLE |
FLOAT |
Single-precision floating point number with 6 to 9 valid digits, depending on the application scenario. The number of valid digits does not depend on the decimal point. |
32 bits |
1.40129846432481707e-45 to 3.40282346638528860e+38, positive or negative |
FLOAT |
- When a distributed query uses high-performance hardware instructions to perform single-precision or double-precision computing, the computing result may be slightly different because the execution sequence is different each time when an aggregate function, such as SUM() or AVG(), is invoked. Especially when the data volume is large (tens of millions or even billions of records), the computing result may be slightly different. In this case, you are advised to use the DECIMAL data type.
- An alias can be used to specify the data type.
--Create a table that contains float data. CREATE TABLE float_t1 (float_col1 FLOAT) ; --Insert data of the float type. insert into float_t1 values (float '3.50282346638528862e+38'); --View data. SELECT * FROM float_t1; float_col1 ------------ Infinity (1 row) --Drop the table. DROP TABLE float_t1;
- When the decimal part is 0, you can use cast() to convert the decimal part to an integer of the corresponding range. The decimal part is rounded off.
Example:
select CAST(1000.0001 as INT); _col0 ------- 1000 (1 row) select CAST(122.5001 as TINYINT); _col0 ------- 123 (1 row)
- When an exponential expression is used, the string can be converted to the corresponding type.
Character
Name |
Description |
---|---|
VARCHAR(n) |
Variable-length character string. n indicates the byte length. |
CHAR(n) |
Fixed-length character string. If the length is insufficient, spaces are added. n indicates the byte length. If the precision n is not specified, the default value 1 is used. |
VARBINARY |
Variable-length binary data. The value must be prefixed with X, for example, X'65683F'. Currently, a binary character string of a specified length is not supported. |
JSON |
The value can be a JSON object, a JSON array, a JSON number, a JSON string, true, false or null. |
STRING |
String compatible with impala. The bottom layer is varchar. |
BINARY |
Compatible with Hive BINARY. The underlying implementation is VARBINARY. |
- In SQL expressions, simple character expressions and unicodes are supported. A unicode character string uses U& as the fixed prefix. An escape character must be added before Unicode that is represented by a 4-digit value.
-- Character expression select 'hello,winter!'; _col0 ------------------ hello,winter! (1 row) -- Unicode expression select U&'Hello winter \2603 !'; _col0 ------------------ Hello winter! (1 row) -- User-defined escape character select U&'Hello winter #2603 !' UESCAPE '#'; _col0 ------------------ Hello winter! (1 row)
- VARBINARY and BINARY
-- Creating a VARBINARY or BINARY Table create table binary_tb(col1 BINARY); --Insert data. INSERT INTO binary_tb values (X'63683F'); --Query data. select * from binary_tb ; -- 63 68 3f
- When two CHARs with different numbers of spaces at the end are compared, the two CHARs are considered equal.
SELECT CAST('FO' AS CHAR(4)) = CAST('FO ' AS CHAR(5)); _col0 ------- true (1 row)
Time and Date Type
The time and date are accurate to milliseconds.
Name |
Description |
Storage Space |
---|---|---|
DATE |
Date and time. Only the ISO 8601 format is supported, for example 2020-01-01. |
32 bits |
TIME |
Time (hour, minute, second, millisecond) without a time zone Example: TIME '01:02:03.456' |
64 bits |
TIME WITH TIMEZONE |
Time with a time zone (hour, minute, second, millisecond). Time zones are expressed as the numeric UTC offset value. Example: TIME '01:02:03.456 -08:00' |
96 bits |
TIMESTAMP |
Timestamp |
64 bits |
TIMESTAMP WITH TIMEZONE |
Timestamp with time zone |
64 bits |
INTERVAL YEAR TO MONTH |
Time interval literal year and month, in the format of SY-M. S: optional symbols (+/-) Y: years M: months |
128 characters |
INTERVAL DAY TO SECOND |
The time interval literally indicates the day, hour, minute, and second, and is accurate to millisecond. The format is SD H:M:S.nnn. S: optional symbols (+/-) D: days M: minutes S: seconds nnn: milliseconds |
128 characters |
Example:
-- Query the date: SELECT DATE '2020-07-08'; _col0 ------------ 2020-07-08 (1 row) -- Query time: SELECT TIME '23:10:15'; _col0 -------------- 23:10:15 (1 row) SELECT TIME '01:02:03.456 -08:00'; _col0 -------------- 01:02:03.456-08:00 (1 row) -- Time interval usage SELECT TIMESTAMP '2015-10-18 23:00:15' + INTERVAL '3 12:15:4.111' DAY TO SECOND; _col0 ------------------------- 2015-10-22 11:15:19.111 (1 row) SELECT TIMESTAMP '2015-10-18 23:00:15' + INTERVAL '3-1' YEAR TO MONTH; _col0 ------------------------- 2018-11-18 23:00:15 (1 row) select INTERVAL '3' YEAR + INTERVAL '2' MONTH ; _col0 ------- 3-2 (1 row) select INTERVAL '1' DAY+INTERVAL '2' HOUR +INTERVAL '3' MINUTE +INTERVAL '4' SECOND ; _col0 ---------------- 1 02:03:04.000 (1 row)
ARRAY
Array
Example: ARRAY[1, 2, 3].
-- Create an ARRAY type table. create table array_tb(col1 ARRAY<STRING>); -- Insert a record of the ARRAY type. insert into array_tb values(ARRAY['HetuEngine','Hive','Mppdb']); -- Query data. select * from array_tb; -- [HetuEngine, Hive, Mppdb]
MAP
Data type of a key-value pair.
Example: MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]).
-- Create a Map table. create table map_tb(col1 MAP<STRING,INT>); -- Insert a piece of data of the Map type. insert into map_tb values(MAP(ARRAY['foo','bar'],ARRAY[1,2])); -- Query data. select * from map_tb; -- {bar=2, foo=1}
ROW
ROW fields can be any supported data type or a combination of different field data types.
-- Create a ROW table. create table row_tb (id int,col1 row(a int,b varchar)); -- Insert data of the ROW type. insert into row_tb values (1,ROW(1,'HetuEngine')); -- Query data. select * from row_tb; id | col1 ----|-------------- 1 | {a=1, b=HetuEngine} --Fields can be named. By default, a Row field is not named. select row(1,2e0),CAST(ROW(1, 2e0) AS ROW(x BIGINT, y DOUBLE)); _col0 | _col1 ------------------------|-------------- {1, 2.0} | {x=1, y=2.0} (1 row) --Named fields can be accessed using the domain operator ".". select col1.b from row_tb; -- HetuEngine --Both named and unnamed fields can be accessed through location indexes, which start from 1 and must be a constant. select col1[1] from row_tb; -- 1
IPADDRESS
IP address, which can be an IPv4 or IPv6 address. In the system, however, this type of address is a unified IPv6 address.
IPv4 is supported by mapping IPv4 addresses to the IPv6 address range (RFC 4291#section-2.5.5.2). When an IPv4 address is created, it is mapped to an IPv6 address. During formatting, if the data is IPv4, the data will be mapped to IPv4 again. Other addresses are formatted according to the format defined in RFC 5952.
Example:
select IPADDRESS '10.0.0.1', IPADDRESS '2001:db8::1'; _col0 | _col1 ----------|------------- 10.0.0.1 | 2001:db8::1 (1 row)
UUID
A standard universally unique identifier (UUID) is also called a globally unique identifier (GUID).
It complies with the format defined in RFC 4122.
Example:
select UUID '12151fd2-7586-11e9-8f9e-2a86e4085a59'; _col0 -------------------------------------- 12151fd2-7586-11e9-8f9e-2a86e4085a59 (1 row)
HYPERLOGLOG
Base statistics.
The cost of using HyperLogLog to approximate the count value of a unique number is far less than that of using count.
For details, see HyperLogLog Functions.
QDIGEST
A quantile, also referred to as a quantile point, refers to that a probability distribution range of a random variable is divided into several equal numerical points. Commonly used quantile points include a median (that is, a 2-quantile), a 4-quartile, and a 100-percentile. Quantile digest is a set of quantiles. When the data to be queried is close to a quantile, the quantile can be used as the approximate value of the data to be queried. Its precision can be adjusted, but higher precision results in expensive space overhead.
STRUCT
The bottom layer is implemented using ROW. For details, see ROW.
Example:
-- Create a STRUCT table. create table struct_tab (id int,col1 struct<col2: integer, col3: string>); -- Insert data of the STRUCT type. insert into struct_tab VALUES(1, struct<2, 'HetuEngine'>); -- Query data. select * from struct_tab; id | col1 ----|--------------------- 1 | {col2=2, col3=HetuEngine}
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot