Updated on 2023-10-23 GMT+08:00

Planning a Storage Model

GaussDB supports hybrid row-column storage. Each storage model applies to specific scenarios. Select an appropriate model when creating a table. Generally, GaussDB is used for transactional processing databases. By default, row storage is used. Column storage is used only when complex queries in large data volume are performed.

Row-store stores tables to disk partitions by row, and column-store stores tables to disk partitions by column. By default, a row-store table is created. For details about differences between row storage and column storage, see Figure 1.

Figure 1 Differences between row storage and column storage

In the preceding figure, the upper left part is a row-store table, and the upper right part shows how the row-store table is stored on a disk; the lower left part is a column-store table, and the lower right part shows how the column-store table is stored on a disk.

Both storage models have benefits and drawbacks.

Storage Model

Benefit

Drawback

Row storage

Record data is stored together. Data can be easily inserted and updated.

All the columns of a record are read after the SELECT statement is executed even if only certain columns are required.

Column storage

  • Only the columns involved in a query are read.
  • Projections are efficient.
  • Any column can serve as an index.
  • The selected columns need to be reconstructed after the SELECT statement is executed.
  • Data cannot be easily inserted or updated.

Generally, if a table contains many columns (called a wide table) and its query involves only a few columns, column storage is recommended. Row storage is recommended if a table contains only a few columns and a query involves most of the fields.

Storage Model

Application Scenarios

Row storage

  • Point queries (simple index-based queries that only return a few records)
  • Scenarios requiring frequent addition, deletion, and modification

Column storage

  • Statistical analysis queries (requiring a large number of association and grouping operations)
  • Ad hoc queries (using uncertain query conditions and unable to utilize indexes to scan row-store tables)

Row-Store Tables

Row-store tables are created by default. In a row-store table, data is stored by row, that is, data in each row is stored continuously. Therefore, this storage model applies to scenarios where data needs to be updated frequently.

1
2
3
4
5
6
7
8
9
openGauss=# CREATE TABLE customer_t1
(
  state_ID   CHAR(2),
  state_NAME VARCHAR2(40),
  area_ID    NUMBER
);

--Delete the table.
openGauss=# DROP TABLE customer_t1;

Column-Store Tables

In a column-store table, data is stored by column, that is, data in each column is stored continuously. The I/O of data query in a single column is small, and column-store tables occupy less storage space than row-store tables. This storage model applies to scenarios where data is inserted in batches, less updated, and queried for statistical analysis. A column-store table cannot be used for point queries.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
openGauss=# CREATE TABLE customer_t2
(
  state_ID   CHAR(2),
  state_NAME VARCHAR2(40),
  area_ID    NUMBER
)
WITH (ORIENTATION = COLUMN);

--Delete the table.
openGauss=# DROP TABLE customer_t2;

Selecting a Storage Model

  • Update frequency

    If data is frequently updated, use a row-store table.

  • Data insertion frequency

    If a small amount of data is frequently inserted each time, use a row-store table. If a large amount of data is inserted at a time, use a column-store table.

  • Number of columns

    If a table is to contain many columns, use a column-store table.

  • Number of columns to be queried

    If only a small number of columns (less than 50% of the total) is queried each time, use a column-store table.

  • Compression ratio

    The compression ratio of a column-store table is higher than that of a row-store table. High compression ratio consumes more CPU resources.