Help Center/ MapReduce Service/ Component Operation Guide (LTS)/ Using HetuEngine/ Common HetuEngine SQL Syntax/ HetuEngine Data Type

Updated on 2024-12-13 GMT+08:00

View PDF

HetuEngine Data Type

Currently, the following data types are supported during table creation: TINYINT, SMALLINT, BIGINT, INT, BOOLEAN, REAL, DECIMAL, DOUBLE, VARCHAR, STRING, BINARY, VARBINARY, TIMESTAMP, DATE, CHAR, ARRAY, ROW, MAP, and STRUCT. Other types are supported during data query and calculation.

Generally, most non-composite data types can be entered by literals and character strings. In the example, a string in the JSON format is added.

select json '{"name": "aa", "gender": "man"}';
              _col0              
---------------------------------
 {"name":"aa","gender":"man"} 
(1 row)

Boolean

The valid text values for "true" are TRUE, t, true, and 1.

The valid text values for the "false" value are FALSE, f, false, and 0.

TRUE and FALSE are standard usages (SQL-compatible).

The following is an example:

select BOOLEAN '0';
 _col0 
-------
 false 
(1 row)
 
select BOOLEAN 'TRUE';
 _col0 
-------
 true  
(1 row)
 
select BOOLEAN 't';
 _col0 
-------
 true  
(1 row)

Integer

**Table 1** Integer
Parameter	Description	Storage Space	Value Range	Literal
TINYINT	Tiny integer	8 bits	-128~127	TINYINT
SMALLINT	Small integer	16 bits	-32,768 ~ +32,767	SMALLINT
INTEGER	Integer	32 bits	-2,147,483,648 ~ +2,147,483,647	INT
BIGINT	Big integer	64 bits	-9,223,372,036,854,775,808 ~ 9,223,372,036,854,775,807	BIGINT

The following is an example:

--Create a table containing TINYINT data:
CREATE TABLE int_type_t1  (IT_COL1 TINYINT) ;    
--Insert data of the TINYINT type:
insert into int_type_t1  values (TINYINT'10');    
--View data:
SELECT * FROM int_type_t1;  
 it_col1    
---------   
 10  
(1 row)    
--Drop a table:
DROP TABLE int_type_t1;

Fixed Precision

Parameter	Description	Storage Space	Value Range	Literal
DECIMAL	Decimal number with fixed precision. The maximum precision is 38 bits, but the precision with less than 18 bits can ensure the best performance. Decimal has two input parameters: precision: total number of digits. The default value is 38. scale: number of decimal places. The default value is 0. NOTE: If scale is 0 and precision is 38, the maximum precision is 19 bits.	64 bits		DECIMAL
NUMERIC	Same as DECIMAL	128 characters		NUMERIC

Parameter

Description

Storage Space

Value Range

Literal

DECIMAL

Decimal number with fixed precision. The maximum precision is 38 bits, but the precision with less than 18 bits can ensure the best performance.

Decimal has two input parameters:

precision: total number of digits. The default value is 38.
scale: number of decimal places. The default value is 0.
NOTE:

If scale is 0 and precision is 38, the maximum precision is 19 bits.

64 bits

Click to enlarge

DECIMAL

NUMERIC

Same as DECIMAL

128 characters

Click to enlarge

NUMERIC

**Table 2** Examples of literals
Example	Data Type
DECIMAL '0'	DECIMAL(1)
DECIMAL '12345'	DECIMAL(5)
DECIMAL '0000012345.1234500000'	DECIMAL(20, 10)

--Create a table containing DECIMAL data.
CREATE TABLE decimal_t1 (dec_col1 DECIMAL(10,3)) ;    

– Insert data of the DECIMAL type.
insert into decimal_t1 values (DECIMAL '5.325');     

--View data.
SELECT * FROM decimal_t1;  
 dec_col1  
---------   
 5.325 
(1 row)    

-- Negative example: The number of decimal places exceeds the defined length. As a result, the SQL statement fails to be executed.
insert into decimal_t1 values (DECIMAL '5.3253');
Query 20201126_034601_00053_tq98i@default@HetuEngine failed: Insert query has mismatched column types: Table: [decimal(10,3)], Query: [decimal(5,4)]

--Drop the table.
DROP TABLE decimal_t1;  

--Create a NUMERIC type table.
CREATE TABLE tb_numberic_hetu(col1 NUMERIC(9,7));
CREATE TABLE

--Insert data.
INSERT INTO tb_numberic_hetu values(9.12);
INSERT: 1 row

--View data.
SELECT * FROM tb_numberic_hetu;
col1    
------------
  9.1200000 
(1 row)

Float

Parameter	Description	Storage Space	Value Range	Literal
REAL	Real number	32 bits	1.40129846432481707e-45 to 3.40282346638528860e+38, positive or negative	REAL
DOUBLE	Double-precision floating point number with 15 to 17 valid digits, depending on the application scenario. The number of valid digits does not depend on the decimal point.	64 bits	4.94065645841246544e-324 to 1.79769313486231570e+308, positive or negative	DOUBLE
FLOAT	Single-precision floating point number with 6 to 9 valid digits, depending on the application scenario. The number of valid digits does not depend on the decimal point.	32 bits	1.40129846432481707e-45 to 3.40282346638528860e+38, positive or negative	FLOAT

Usage:

When a distributed query uses high-performance hardware instructions to perform single-precision or double-precision computing, the computing result may be slightly different because the execution sequence is different each time when an aggregate function, such as SUM() or AVG(), is invoked. Especially when the data volume is large (tens of millions or even billions of records), the computing result may be slightly different. In this case, you are advised to use the DECIMAL data type.

An alias can be used to specify the data type.

The following is an example:

--Create a table that contains float data.
CREATE TABLE float_t1 (float_col1 FLOAT) ;
--Insert data of the float type.
insert into float_t1 values (float '3.50282346638528862e+38');
--View data.
SELECT * FROM float_t1;
float_col1
------------
Infinity
(1 row)
--Drop the table.
DROP TABLE float_t1;

When the decimal part is 0, you can use cast() to convert the decimal part to an integer of the corresponding range. The decimal part is rounded off.
The following is an example:
```
select CAST(1000.0001 as INT);
_col0
-------
1000
(1 row)
select CAST(122.5001 as TINYINT);
_col0
-------
123
(1 row)
```

When an exponential expression is used, the string can be converted to the corresponding type.
The following is an example:
```
select CAST(152e-3 as double);
_col0
-------
0.152
(1 row)
```

Character

Parameter	Description
VARCHAR(n)	Variable-length character string. n indicates the byte length.
CHAR(n)	Fixed-length character string. If the length is insufficient, spaces are added. n indicates the byte length. If the precision n is not specified, the default value 1 is used.
VARBINARY	Variable-length binary data. The value must be prefixed with X, for example, X'65683F'. Currently, a binary character string of a specified length is not supported.
JSON	The value can be a JSON object, a JSON array, a JSON number, a JSON string, true, false or null.
STRING	String compatible with impala. The bottom layer is varchar.
BINARY	Compatible with Hive BINARY. The underlying implementation is VARBINARY.

In SQL expressions, simple character expressions and unicodes are supported. A unicode character string uses U& as the fixed prefix. An escape character must be added before Unicode that is represented by a 4-digit value.

-- Character expression
select 'hello,winter!';  
      _col0         
------------------  
 hello,winter!   
(1 row)  
-- Unicode expression
select U&'Hello winter \2603 !';  
      _col0         
------------------  
 Hello winter ☃ !   
(1 row)    
-- User-defined escape character
select U&'Hello winter #2603 !' UESCAPE '#';  
      _col0         
------------------  
 Hello winter ☃ !   
(1 row)

VARBINARY and BINARY

-- Create a VARBINARY or BINARY table.
 create table binary_tb(col1 BINARY);

--Insert data.
 INSERT INTO binary_tb values (X'63683F');

--Query data.
 select * from binary_tb ;  -- 63 68 3f

When two CHARs with different numbers of spaces at the end are compared, the two CHARs are considered equal.
```
SELECT CAST('FO' AS CHAR(4)) = CAST('FO     ' AS CHAR(5));
 _col0 
-------
 true  
(1 row)
```

Time and Date Type

The time and date are accurate to milliseconds.

**Table 3** Time and date type
Parameter	Description	Storage Space
DATE	Date and time. Only the ISO 8601 format is supported, for example 2020-01-01.	32 bits
TIME	Time (hour, minute, second, millisecond) without a time zone Example: TIME '01:02:03.456'	64 bits
TIME WITH TIMEZONE	Time with a time zone (hour, minute, second, millisecond). Time zones are expressed as the numeric UTC offset value. Example: TIME '01:02:03.456 -08:00'	96 bits
TIMESTAMP	Timestamp	64 bits
TIMESTAMP WITH TIMEZONE	Timestamp with time zone	64 bits
INTERVAL YEAR TO MONTH	Time interval literal year and month, in the format of SY-M. S: optional symbols (+/-) Y: years M: months	128 bits
INTERVAL DAY TO SECOND	The time interval literally indicates the day, hour, minute, and second, and is accurate to millisecond. The format is SD H:M:S.nnn. S: optional symbols (+/-) D: days M: minutes S: seconds nnn: milliseconds	128 bits

The following is an example:

-- Query the date:
SELECT DATE '2020-07-08';
   _col0    
------------
 2020-07-08 
(1 row)

-- Query time:
SELECT TIME '23:10:15';
 _col0     
--------------
 23:10:15 
(1 row)

SELECT TIME '01:02:03.456 -08:00';
 _col0     
--------------
 01:02:03.456-08:00 
(1 row)

-- Time interval usage
SELECT TIMESTAMP '2015-10-18 23:00:15' + INTERVAL '3 12:15:4.111' DAY TO SECOND;      
 _col0          
------------------------- 
 2015-10-22 11:15:19.111 
(1 row)

SELECT TIMESTAMP '2015-10-18 23:00:15' + INTERVAL '3-1' YEAR TO MONTH;
          _col0          
-------------------------
 2018-11-18 23:00:15 
(1 row)

select INTERVAL '3' YEAR + INTERVAL '2' MONTH ;
 _col0 
-------
 3-2   
(1 row)

select INTERVAL '1' DAY+INTERVAL '2' HOUR +INTERVAL '3' MINUTE +INTERVAL '4' SECOND ;
     _col0      
----------------
 1 02:03:04.000 
(1 row)

ARRAY

Array

Example: ARRAY[1, 2, 3].

-- Create an ARRAY type table.
create table array_tb(col1 ARRAY<STRING>);

-- Insert a record of the ARRAY type.
insert into array_tb values(ARRAY['HetuEngine','Hive','Mppdb']);

--Query data.
select * from array_tb; --    [HetuEngine, Hive, Mppdb]

MAP

Data type of a key-value pair.

Example: MAP(ARRAY['foo', 'bar'], ARRAY[1, 2]).

-- Create a Map table.
create table map_tb(col1 MAP<STRING,INT>);

-- Insert a piece of data of the Map type.
insert into map_tb values(MAP(ARRAY['foo','bar'],ARRAY[1,2]));

--Query data.
select * from map_tb; --   {bar=2, foo=1}

ROW

ROW fields can be any supported data type or a combination of different field data types.

-- Create a ROW table.
create table row_tb (id int,col1 row(a int,b varchar));

-- Insert data of the ROW type.
insert into row_tb values (1,ROW(1,'HetuEngine'));

--Query data.
select * from row_tb; 
 id |     col1     
----|--------------
  1 | {a=1, b=HetuEngine} 

--Fields can be named. By default, a Row field is not named.
select  row(1,2e0),CAST(ROW(1, 2e0) AS ROW(x BIGINT, y DOUBLE));
           _col0          |    _col1     
------------------------|--------------
 {1, 2.0} | {x=1, y=2.0} 
(1 row) 

--Named fields can be accessed using the domain operator ".".
select col1.b from row_tb; -- HetuEngine

--Both named and unnamed fields can be accessed through location indexes, which start from 1 and must be a constant.
select col1[1] from row_tb;   -- 1

IPADDRESS

IP address, which can be an IPv4 or IPv6 address. In the system, however, this type of address is a unified IPv6 address.

IPv4 is supported by mapping IPv4 addresses to the IPv6 address range (RFC 4291#section-2.5.5.2). When an IPv4 address is created, it is mapped to an IPv6 address. During formatting, if the data is IPv4, the data will be mapped to IPv4 again. Other addresses are formatted according to the format defined in RFC 5952.

The following is an example:

select IPADDRESS '10.0.0.1', IPADDRESS '2001:db8::1';
  _col0   |    _col1
----------|-------------
 10.0.0.1 | 2001:db8::1
(1 row)

UUID

A standard universally unique identifier (UUID) is also called a globally unique identifier (GUID).

It complies with the format defined in RFC 4122.

The following is an example:

select UUID '12151fd2-7586-11e9-8f9e-2a86e4085a59';
                _col0
--------------------------------------
 12151fd2-7586-11e9-8f9e-2a86e4085a59
(1 row)

HYPERLOGLOG

Base statistics.

The cost of using HyperLogLog to approximate the count value of a unique number is far less than that of using count.

For details, see HyperLogLog Functions.

HyperLogLog
A HyperlogLog Sketch can be used to efficiently calculate the approximate value of distinct().

It begins with a sparse representation, then becomes a dense representation. And at this time efficiency becomes higher.

P4HyperLogLog
Similar to a HyperlogLog Sketch, but it starts with a dense representation.

QDIGEST

A quantile, also referred to as a quantile point, refers to that a probability distribution range of a random variable is divided into several equal numerical points. Commonly used quantile points include a median (that is, a 2-quantile), a 4-quartile, and a 100-percentile. Quantile digest is a set of quantiles. When the data to be queried is close to a quantile, the quantile can be used as the approximate value of the data to be queried. Its precision can be adjusted, but higher precision results in expensive space overhead.

STRUCT

The bottom layer is implemented using ROW. For details, see ROW.

The following is an example:

-- Creating a STRUCT table.
create table struct_tab (id int,col1 struct<col2: integer, col3: string>);

-- Insert data of the STRUCT type.
 insert into struct_tab VALUES(1, struct<2, 'HetuEngine'>);

--Query data.
select * from struct_tab; 
 id |        col1         
----|---------------------
  1 | {col2=2, col3=HetuEngine}

Parent topic: Common HetuEngine SQL Syntax

Previous topic: Common HetuEngine SQL Syntax

Next topic: HetuEngine DDL SQL Syntax