CarbonData Data Types
Description
In CarbonData, data is stored in entities called tables. CarbonData tables are similar to RDBMS tables, which organize data in rows and columns. CarbonData tables store structured data and have fixed columns and data types.
Supported Data Types
CarbonData tables support the following data types:
- Int
- String
- BigInt
- SmallInt
- Char
- Varchar
- Boolean
- Decimal
- Double
- TimeStamp
- Date
- Array
- Struct
- Map
The following table describes the supported data types and their respective value ranges.
Data Type | Value Range |
---|---|
Int | 4-byte signed integer ranging from -2,147,483,648 to 2,147,483,647. NOTE: Int data in a non-dictionary column is internally stored as the BigInt type. |
String | 100,000 characters. NOTE: If the CHAR or VARCHAR data type is used in CREATE TABLE, it is automatically converted to the String data type. If a column contains more than 32,000 characters, add the column to the LONG_STRING_COLUMNS property in TBLPROPERTIES during table creation. |
BigInt | 64-bit value ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
SmallInt | -32,768 to 32,767 |
Char | A to Z and a to z |
Varchar | A to Z, a to z, and 0 to 9 |
Boolean | true or false |
Decimal | The default (precision, scale) is (10,0) and the maximum is (38,38). NOTE: When querying with filters, append BD to the number to achieve accurate results. For example, select * from carbon_table where num = 1234567890123456.22BD. |
Double | 64-bit value ranging from 4.9E-324 to 1.7976931348623157E308 |
TimeStamp | The default format is yyyy-MM-dd HH:mm:ss. |
Date | The DATE data type is used to store calendar dates. The default format is yyyy-MM-dd. |
Array<data_type> | N/A. NOTE: Currently, only two layers of complex types can be nested. |
Struct<col_name: data_type COMMENT col_comment, ...> | |
Map<primitive_type, data_type> | |
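The ranges and thresholds in the table above can be sketched as a small Python helper. This is only an illustration, not part of CarbonData: the function names `smallest_integer_type` and `string_column_hint` are hypothetical, and treating the 32,000 and 100,000 character thresholds as inclusive is an assumption. The last part shows why an exact decimal literal matters for the filter in the Decimal NOTE. Reading the BD suffix as "exact decimal literal" is an assumption based on Spark SQL's BigDecimal literal syntax.

```python
from decimal import Decimal

# Integer ranges copied from the table above (inclusive bounds).
INTEGER_RANGES = {
    "SmallInt": (-32_768, 32_767),
    "Int": (-2_147_483_648, 2_147_483_647),
    "BigInt": (-9_223_372_036_854_775_808, 9_223_372_036_854_775_807),
}

# String thresholds from the table; treating both as inclusive is an assumption.
LONG_STRING_THRESHOLD = 32_000
MAX_STRING_LENGTH = 100_000


def smallest_integer_type(value: int) -> str:
    """Return the narrowest integer type from the table that can hold value."""
    for type_name, (low, high) in INTEGER_RANGES.items():
        if low <= value <= high:
            return type_name
    raise ValueError(f"{value} does not fit in BigInt")


def string_column_hint(max_chars: int) -> str:
    """Classify a String column by its longest expected value (hypothetical helper)."""
    if max_chars > MAX_STRING_LENGTH:
        return "exceeds documented range"
    if max_chars > LONG_STRING_THRESHOLD:
        return "add to LONG_STRING_COLUMNS"
    return "plain String"


# The value from the Decimal NOTE, once as an exact decimal and once after
# passing through a 64-bit binary double, which cannot represent it exactly.
exact = Decimal("1234567890123456.22")
as_double = Decimal(float("1234567890123456.22"))

if __name__ == "__main__":
    print(smallest_integer_type(40_000))   # Int
    print(string_column_hint(50_000))      # add to LONG_STRING_COLUMNS
    print(exact == as_double)              # False
```

The comparison at the end prints False because the double nearest to the literal is a different number, which is one plausible reason an equality filter on a Decimal column needs an exact literal.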
Main Specifications of CarbonData
Entity | Tested Value | Test Environment |
---|---|---|
Number of tables | 10000 | 3 nodes. 4 vCPUs and 20 GB memory for each executor. Driver memory: 5 GB, 3 executors. Total columns: 107 (String: 75, Int: 13, BigInt: 7, Timestamp: 6, Double: 6) |
Number of table columns | 2000 | 3 nodes. 4 vCPUs and 20 GB memory for each executor. Driver memory: 5 GB, 3 executors. |
Maximum size of a raw CSV file | 200 GB | 17 cluster nodes. 150 GB memory and 25 vCPUs for each executor. Driver memory: 10 GB, 17 executors. |
Number of CSV files in each folder | 100 folders. Each folder has 10 files. The size of each file is 50 MB. | 3 nodes. 4 vCPUs and 20 GB memory for each executor. Driver memory: 5 GB, 3 executors. |
Number of load folders | 10000 | 3 nodes. 4 vCPUs and 20 GB memory for each executor. Driver memory: 5 GB, 3 executors. |
Table Specifications
Entity | Tested Value |
---|---|
Number of secondary index tables | 10 |
Number of composite columns in a secondary index table | 5 |
Length of column name in a secondary index table (unit: character) | 120 |
Length of a secondary index table name (unit: character) | 120 |
Cumulative length of all secondary index table names + column names in an index table* (unit: character) | 3800** |
- * The length of column names in an index table is subject to the upper limit allowed by Hive or the upper limit of available resources.
- ** Secondary index tables are registered in Hive and stored in the Hive SERDEPROPERTIES in JSON format. A SERDEPROPERTIES value supported by Hive can contain a maximum of 4,000 characters, and this limit cannot be changed.
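As a rough planning aid, the tested values above can be turned into a pre-flight check before secondary indexes are created. The sketch below is a hypothetical helper, not a CarbonData API. It assumes the cumulative length is the sum of every index table name plus the names of its columns, and it treats the tested values as thresholds to warn about rather than limits the engine is known to enforce.

```python
from typing import Dict, List

# Tested values from the table above.
MAX_INDEX_TABLES = 10
MAX_COMPOSITE_COLUMNS = 5
MAX_NAME_LENGTH = 120
MAX_CUMULATIVE_LENGTH = 3800


def check_secondary_indexes(indexes: Dict[str, List[str]]) -> List[str]:
    """Return warnings for a planned set of secondary indexes.

    `indexes` maps each index table name to the list of its column names.
    """
    problems: List[str] = []
    if len(indexes) > MAX_INDEX_TABLES:
        problems.append(f"{len(indexes)} index tables exceed the tested value of {MAX_INDEX_TABLES}")

    cumulative = 0
    for table_name, columns in indexes.items():
        if len(table_name) > MAX_NAME_LENGTH:
            problems.append(f"index table name '{table_name[:20]}...' is longer than {MAX_NAME_LENGTH} characters")
        if len(columns) > MAX_COMPOSITE_COLUMNS:
            problems.append(f"'{table_name[:20]}' has {len(columns)} columns, above the tested value of {MAX_COMPOSITE_COLUMNS}")
        for column in columns:
            if len(column) > MAX_NAME_LENGTH:
                problems.append(f"column name '{column[:20]}...' is longer than {MAX_NAME_LENGTH} characters")
        # Assumed definition: table name plus all of its column names.
        cumulative += len(table_name) + sum(len(column) for column in columns)

    if cumulative > MAX_CUMULATIVE_LENGTH:
        problems.append(f"cumulative name length {cumulative} is above {MAX_CUMULATIVE_LENGTH}")
    return problems


if __name__ == "__main__":
    planned = {"idx_city": ["city", "zip_code"], "idx_user": ["user_id"]}
    print(check_secondary_indexes(planned))  # []
```

An empty list means the plan stays within every tested value. The 3800 threshold leaves some headroom below the 4,000-character Hive limit described in the second footnote.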