Updated on 2023-05-23 GMT+08:00

Planning a Storage Model

GaussDB(DWS) supports hybrid row and column storage. Each storage mode applies to specific scenarios. Select an appropriate mode when creating a table.

Row storage stores tables to disk partitions by row, and column storage stores tables to disk partitions by column. By default, a table is created in row storage mode. For details about differences between row storage and column storage, see Figure 1.

Figure 1 Differences between row storage and column storage

In the preceding figure, the upper left part is a row-store table, and the upper right part shows how the row-store table is stored on a disk; the lower left part is a column-store table, and the lower right part shows how the column-store table is stored on a disk.

Both storage modes have benefits and drawbacks.

Storage Mode

Benefit

Drawback

Row storage

All the columns of a record are stored in the same partition. Data can be easily inserted and updated.

All the columns of a record are read after the SELECT statement is executed even if only certain columns are required.

Column storage

  • Only necessary columns in a query are read.
  • Projections are efficient.
  • Any column can serve as an index.
  • The selected columns need to be reconstructed after the SELECT statement is executed.
  • Data cannot be easily inserted or updated.

Generally, if a table contains many columns (called a wide table) and its query involves only a few columns, column storage is recommended. If a table contains only a few columns and a query includes most of the fields, row storage is recommended.

Storage Mode

Application Scenario

Row storage

  • Point queries (simple index-based queries that only return a few records).
  • Scenarios requiring frequent addition, deletion, and modification.

Column storage

  • Statistical analysis queries (requiring a large number of association and grouping operations)
  • Ad hoc queries (using uncertain query conditions and unable to utilize indexes to scan row-store tables)

Row-Store Table

Row-store tables are created by default. In a row-store table, data is stored by row, that is, data in each row is stored continuously. Therefore, this storage model applies to scenarios where data needs to be updated frequently.

1
2
3
4
5
6
7
8
9
CREATE TABLE customer_t1
(
  state_ID   CHAR(2),
  state_NAME VARCHAR2(40),
  area_ID    NUMBER
);

--Delete the table.
DROP TABLE customer_t1;

Column-Store Table

In a column-store table, data is stored by column, that is, data in each column is stored continuously. The I/O of data query in a single column is small, and column-store tables occupy less storage space than row-store tables. This storage model applies to scenarios where data is inserted in batches, less updated, and queried for analysis. A column-store table cannot be used for point queries.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CREATE TABLE customer_t2
(
  state_ID   CHAR(2),
  state_NAME VARCHAR2(40),
  area_ID    NUMBER
)
WITH (ORIENTATION = COLUMN);

--Delete the table.
DROP TABLE customer_t2;