Using Hybrid Row-Column Storage
Scenarios
In real-world service scenarios, a table must efficiently handle both high-concurrency point queries and updates (for example, for orders, bills, and logs) and support fast batch queries and aggregation for analytical workloads (for example, statistical reports and trend analysis). Traditional row storage and column storage have their own advantages:
- Row storage is suitable for real-time scenarios with frequent point queries and writes, for example, querying order details by order number.
- Column storage is suitable for batch analysis queries, such as calculating the total transaction amount within a period of time.
Hybrid row-column storage provided by DWS stores both row- and column-format data in one table. The data of the two formats is independently maintained and synchronously updated. The query optimizer selects the optimal access mode based on the actual query path.
Its core advantages are as follows:
- No need to split tables or copy data: One table supports both detailed query and batch analysis.
- Automatic selection of the optimal path: Row storage is used for point queries, and column storage is used for querying aggregation fields No additional development is required.
- Optimal performance: OLAP query efficiency is ensured without sacrificing OLTP performance.
- Compatible with original row-store table design: You can quickly enjoy the performance benefits of column-store and row-store optimizations without adjusting the application structure.
- Unified management and reduced O&M costs: One physical table covers two scenarios, simplifying data synchronization, backup, and permission control.
This section describes how to design and use hybrid row-column storage structures based on typical scenarios and performance comparison, helping you process data efficiently.
Comparison of Row Storage, Column Storage, Hybrid Row-Column Storage
Dimension |
Row Storage (orientation='row') |
Column Storage (HStore Opt tables) |
Hybrid Row-Column Storage (storage_mode='mix') |
---|---|---|---|
Storage architecture |
Native row storage (not based on column storage) |
HStore Opt column storage |
Coexistence of the row storage and HStore Opt column storage |
Point query (primary key) |
|
|
|
Batch import performance |
|
|
|
Real-time import performance |
|
|
|
Aggregation/Analysis performance |
|
|
|
Occupied Space |
|
|
|
Space bloat |
|
|
|
DWS column-store special optimization |
|
|
|
Constraints
- Only clusters of version 9.1.1.100 or later support this function.
- HStore tables are used, that is, enable_hstore_opt is set to on.
- This function is unavailable for V3 tables with decoupled storage and compute.
- The enable_light_update option of HStore tables cannot be enabled.
- Binlog tables and materialized views are not supported.
- When a small number of data is copied to a database in real time, the performance of hybrid row-column storage deteriorates by 10% compared with that of the HStore table.
- A hybrid row-column store table can be exchanged only with another hybrid row-column store table. After capacity expansion and redistribution, the exchange is not allowed.
Usage Suggestions
- In only OLTP scenarios (such as high-frequency point queries and writes), if the space usage is sensitive, hybrid row-column store tables are recommended to balance space usage and point query performance. If the space usage is not sensitive and robust performance is required, row-store tables are recommended.
- In only OLAP scenarios (such as statistics analysis and report), HStore tables are recommended.
- In both detailed and statistical queries where hot and cold columns are difficult to distinguish, hybrid row-column store tables are recommended to achieve excellent performance in each scenario.
- For real-time data import, PBE addBatch is recommended.
- You are advised not to modify column definitions unless necessary. If data rewriting is triggered, the row-store part of a hybrid row-column store table will be rewritten, which is time-consuming.
Syntax Reference
Create a hybrid row-column store table. For more information, see CREATE TABLE. (Only 9.1.1.100 and later versions support this function.)
1 2 3 4 5 6 7 8 |
CREATE TABLE < table name> ( <Column definition> ) WITH ( orientation = column, -- Use the column storage architecture. enable_hstore_opt = on, -- Enable hstore_opt. storage_mode = 'mix' --Create a hybrid row-column store table. ); |
Example:
-- Hybrid row-column storage stores data in both row and column formats. It is suitable for mixed services of detailed analysis and analysis. CREATE TABLE tbl_mix ( a INT, b TEXT ) WITH ( orientation = column, enable_hstore_opt = on, storage_mode = 'mix' );
Example of Using Hybrid Row-Column Storage
- Create a simple data table.
1 2 3
DROP TABLE IF EXISTS data; CREATE TABLE data(a INT, b BIGINT, c VARCHAR(10), d VARCHAR(10)); INSERT INTO data values(generate_series(1,100),1,'asdfasdf','gergqer');
- Expand the table data to 200,000 records.
1 2 3 4 5 6 7 8 9 10 11 12
INSERT INTO data SELECT * FROM data; INSERT INTO data SELECT * FROM data; INSERT INTO data SELECT * FROM data; INSERT INTO data SELECT * FROM data; INSERT INTO data SELECT * FROM data; INSERT INTO data SELECT * FROM data; INSERT INTO data SELECT * FROM data; INSERT INTO data SELECT * FROM data; INSERT INTO data SELECT * FROM data; INSERT INTO data SELECT * FROM data; INSERT INTO data SELECT * FROM data; SELECT COUNT(*) FROM data;
- Create a simple hybrid row-column store table.
1 2 3 4 5 6 7 8 9 10 11 12
DROP TABLE IF EXISTS rowmode_test1; CREATE TABLE rowmode_test1 ( a INT, b BIGINT, c VARCHAR(10), d VARCHAR(10) ) WITH ( orientation = column, enable_hstore_opt = on, storage_mode = 'mix' );
- Import data to the hybrid row-column store table.
1
INSERT INTO rowmode_test1 SELECT * FROM data;
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot