Vectorized Executor and Hybrid Row-Column Storage Engine

Background

In a wide table containing a huge amount of data, a query usually only involves certain columns. In this case, the query performance of the row-store engine is poor. For example, a single table containing the data of a meteorological agency has 200 to 800 columns. Among these columns, only 10 are frequently accessed. This is where vectorized executor technology steps in, by improving performance and saving storage space.

Vectorized Execution

Figure 1 shows a standard vectorized executor. Control flow travels in the downlink direction (shown as solid lines in the following figure) and data flow in the uplink direction (shown as dotted lines in the following figure). The upper-layer node invokes the lower-layer node to ask for data and the lower-layer node only returns one tuple to the upper-layer node at a time.

Instead of returning only one tuple at a time in the traditional executor, the vectorized executor returns a batch of tuples at a time, which significantly improves executor performance with the aid of the column storage feature.

Figure 1 Vectorized executor
Click to enlarge

Hybrid Row-Column Storage Executor

GaussDB(DWS) supports both the row and column storage models. You can choose a row- or column-store table as needed.

Generally, if a table contains many columns (called a wide table) and its query involves only a few columns, column storage is recommended. If a table contains only a few columns and a query includes most of the fields, row storage is recommended.

The hybrid row-column storage engine achieves higher data compression ratio (column storage), index performance (column storage), and point update and point query (row storage) performance, as shown in Figure 2.

Figure 2 Hybrid row-column storage engine
Click to enlarge

Data compression is supported for column storage. You can compress old, inactive data to free up space, reducing procurement and O&M costs.

In GaussDB(DWS), data can be compressed using the Delta Value Encoding, Dictionary, RLE, LZ4, and ZLIB algorithms. The system automatically selects a compression algorithm based on data characteristics. The average compression ratio is 7:1. Compressed data can be directly accessed and is transparent to services, greatly reducing the preparation time before accessing historical data.

The restrictions of the column storage engine are as follows:

Only the CREATE TABLE, DROP TABLE, and TRUNCATE TABLE operations can be performed for DDL.
Partition management using DDL statements (such as ADD PARTITION, DROP PARTITION, MERGE PARTITION, and EXCHANGE) can be used.

The CREATE TABLE LIKE statement is supported.

The ALTER TABLE statement is partially supported.

Other DDL statements are not supported.
Among DML syntax, UPDATE, COPY, BULKLOAD, and DELETE are supported.
A trigger and primary foreign key are not supported.
Psort indexes, B-tree indexes, and GIN indexes are supported. For details about the constraints, see CREATE INDEX .