Creating and Managing Tables
Creating a Table
The CREATE TABLE command is used to create a table. When creating a table, you can define the following content:
- Columns and their data types.
- Table distribution definition, that is, the distribution strategy of the table, which determines how the DataArts Fabric SQL database divides data among segments. Table storage format.
- Partitioned table definition.
Example: CREATE TABLE creates a table named web_returns_p1, storing data in the ORC file format.
1 2 3 4 5 6 7 |
CREATE TABLE web_returns_p1 ( wr_returned_date_sk integer, wr_returned_time_sk integer, wr_item_sk integer, wr_refunded_customer_sk integer ) store as ORC; |
Table Types
DataArts Fabric SQL supports two table types: managed table and external table.
Type |
Description |
Use Case |
---|---|---|
Managed table |
Both data and metadata are managed by DataArts Fabric SQL. Users can manage metadata and data files through DataArts Fabric SQL. Deleting a table removes both its data and metadata. The data path must be a parallel file system. |
|
External table |
DataArts Fabric SQL reads data files from a specified location through metadata or table schemas. Users cannot modify the actual data files through DataArts Fabric SQL. Deleting a table only removes its metadata, leaving the data files unaffected. |
|

For managed tables, data files are still stored in a conventional file system (for example, OBS parallel file system). Users can alter these files without informing DataArts Fabric SQL. However, doing so violates the expectations and agreements of DataArts Fabric SQL regarding managed tables, potentially leading to undefined errors.
-- Create a schema. CREATE SCHEMA sales_schema WITH LOCATION 'obs://bucket/catalog1/sales_schema/'; -- Create a managed table. CREATE TABLE region_sales_info(id int, item varchar(128), sale_date date) partition by (region varchar(128)) store as parquet; -- Create an external table to point to the managed table. CREATE EXTERNAL TABLE readonly_region_sales_info(id int, item varchar(128), sale_date date) partition by (region varchar(128)) store as parquet location 'obs://bucket/catalog1/sales_schema/region_sales_info/'; -- Import service data from the managed table. INSERT INTO region_sales_info VALUES (0, 'apple', '2001-01-01', 'Region A'), (1, 'banana', '2001-01-02', 'Region B'), (2, 'carrot', '2001-01-03', 'Region C'), (3, 'desk', '2001-01-04', 'Region D'); SELECT * FROM region_sales_info order by 1; id | item | sale_date | region ----+--------+------------+---------- 0 | apple | 2001-01-01 | Region A 1 | banana | 2001-01-02 | Region B 2 | carrot | 2001-01-03 | Region C 3 | desk | 2001-01-04 | Region D (4 rows) -- No data can be queried as the external table initially does not contain partition information. SELECT * FROM readonly_region_sales_info; id | item | sale_date | region ----+------+-----------+-------- (0 rows) -- Restore the partition information of the external table. MSCK REPAIR TABLE readonly_region_sales_info; SELECT * FROM readonly_region_sales_info order by 1; id | item | sale_date | region ----+--------+------------+---------- 0 | apple | 2001-01-01 | Region A 1 | banana | 2001-01-02 | Region B 2 | carrot | 2001-01-03 | Region C 3 | desk | 2001-01-04 | Region D (4 rows) -- Drop the external table does not affect data files. DROP TABLE readonly_region_sales_info; SELECT COUNT(1) from region_sales_info; count ------- 4 (1 row) -- Dropping the managed table also deletes the data. DROP TABLE region_sales_info;
Table Distribution Definition
Strategy |
Description |
Use Case |
Advantage and Disadvantage |
---|---|---|---|
Hash |
Data is organized into N buckets (files) using a hash algorithm. |
Fact tables with large amounts of data. |
|
Round-robin |
Data is randomly organized across multiple files. |
Large fact tables where no suitable distribution column exists for hash distribution. |
|
Selecting a distribution key
When using the hash distribution method, users must specify a distribution key for the table. Upon inserting a record, the system calculates a hash based on the distribution column's value and stores the data on the corresponding DataNode (DN).
Thus, selecting a hash distribution key is critical and must adhere to the following principles:
- The key values should be highly distinct. For example, consider choosing the table's primary key as the distribution key, such as selecting the ID number in a personnel information table.
- Do not select columns with constant filters whenever possible. For example, if certain queries involving the table dwcjk have a constant constraint on the key zqdh (for example, zqdh='000001'`), do not use zqdh as the distribution key.
- After satisfying the first two principles, prioritize selecting join conditions as the distribution key to push join tasks down to DNs for execution, minimizing inter-DN communication.
Improper selection of a distribution key in a hash table may lead to data skew, causing I/O bottlenecks in some DNs and degrading overall query performance. Hence, after applying the hash distribution strategy, perform a data skew check to ensure even data distribution across all DNs.
Viewing Table Data
- Query information about all tables in the current schema.
1
SHOW TABLES;
- Query the attributes of a table.
1
DESCRIBE web_returns_p1;
- Query the data volume of the web_returns_p1 table.
1
SELECT count(*) FROM web_returns_p1;
- Query all data from the web_returns_p1 table.
1
SELECT * FROM web_returns_p1;
- Query only the data of the c_customer_sk field.
1
SELECT c_customer_sk FROM web_returns_p1;
- Filter out duplicate data of the c_customer_sk field.
1
SELECT DISTINCT( c_customer_sk ) FROM web_returns_p1;
- Query all data whose c_customer_sk is 3869.
1
SELECT * FROM web_returns_p1 WHERE c_customer_sk = 3869;
- Sort data by the c_customer_sk field.
1
SELECT * FROM web_returns_p1 ORDER BY c_customer_sk;
Deleting Table Data
When using a table, you may need to delete the table data.
- Currently, data cannot be deleted using the DELETE command.
- Executing the following command will delete all rows in the table.
1
TRUNCATE TABLE customer_t1;
- Delete the created table.
1
DROP TABLE customer_t1;
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot