Best Practices of Database SEQUENCE

A sequence, also called a sequence, is a database object used to generate a unique integer. The value of a sequence increases or decreases automatically based on certain rules. Generally, a sequence is used as a primary key. In GaussDB (DWS), when a sequence is created, a metadata table with the same name is created to record sequence information. For example:

   
        CREATE SEQUENCE seq_test; 
CREATE SEQUENCE

SELECT * FROM seq_test;
 sequence_name | last_value | start_value | increment_by |      max_value      | min_value | cache_value | log_cnt | is_cycled | is_called |  uuid
---------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------+---------
 seq_test      |         -1 |           1 |            1 | 9223372036854775807 |         1 |           1 |       0 | f         | f         | 1400050
(1 row)

In the preceding command:

sequence_name indicates the name of a sequence.
last_value is meaningless.
start_value indicates the initial value of the sequence.
increment_by indicates the step of the sequence.
max_value indicates the maximum value of a sequence.
min_value indicates the minimum sequence value.
cache_value indicates the number of sequence values that are pre-stored to quickly obtain the next sequence value. (After the cache is defined, the continuity of sequence values cannot be ensured, holes are generated, and sequence number segments are wasted.)
log_cnt indicates the number of sequence values recorded in WAL logs. In GaussDB (DWS), sequence values are obtained and managed from GTM. Therefore, log_cnt is meaningless.
is_cycled indicates whether to continue the loop after the sequence reaches the minimum or maximum value.
is_called indicates whether the sequence has been invoked. (It only indicates whether the sequence has been invoked on the current instance. For example, after the sequence is invoked on cn1, the value of the original data table on cn1 changes to t, and the value of the field on cn2 is still f.)
uuid indicates the unique ID of the sequence.

Process of Creating a Sequence

In GaussDB (DWS), the Global Transaction Manager (GTM) generates and maintains globally unique information, such as global transaction IDs, transaction snapshots, and sequences. The following figure shows the process of creating a sequence in GaussDB (DWS).

Figure 1 Process of Creating a Sequence

The specific process is as follows:

The CN that accepts the SQL command applies for a UUID from the GTM.
The GTM returns a UUID.
The CN binds the obtained UUID to the sequenceName created by the user.
The CN delivers the binding relationship to other nodes, and other nodes create the sequence metadata table synchronously.
The CN sends the UUID and startID of the sequence to the GTM for permanent storage.

Therefore, sequence maintenance and application are actually completed on the GTM. When applying for nextval, each instance that invokes nextval applies for a sequence value from the GTM based on the UUID of the sequence. The sequence value range applied for each time is related to the cache. The instance applies for a sequence value from the GTM only after the cache is used up. Therefore, increasing the cache of the sequence helps reduce the number of times that the CN/DN communicates with the GTM.

Two Methods of Creating a Sequence

Method 1: Run the CREATE SEQUENCE statement to create a sequence and use nextval to invoke the sequence in the new table.

    
         CREATE SEQUENCE seq_test increment by 1 minvalue 1 no maxvalue start with 1;
CREATE SEQUENCE

CREATE TABLE table_1(id int not null default nextval('seq_test'), name text);
CREATE TABLE

Method 2: If the serial type is used during table creation, a sequence is automatically created and the default value of the column is set to nextval.

    
         CREATE TABLE mytable(a int, b serial) distribute by hash(a);
NOTICE:  CREATE TABLE will create implicit sequence "mytable_b_seq" for serial column "mytable.b"
CREATE TABLE

 \d+ mytable
                                            Table "dbadmin.mytable"
 Column |  Type   |                      Modifiers                      | Storage | Stats target | Description
--------+---------+-----------------------------------------------------+---------+--------------+-------------
 a      | integer |                                                     | plain   |              |
 b      | integer | not null default nextval('mytable_b_seq'::regclass) | plain   |              |
Has OIDs: no
Distribute By: HASH(a)
Location Nodes: ALL DATANODES
Options: orientation=row, compression=no

In this example, a sequence named mytable_b_seq is automatically created. Strictly speaking, the serial type is not a real type. It is only a concept for setting a unique identifier in a table. When a serial type is created, a sequence is created and associated with the column.

It is equivalent to the following statement:

   
        CREATE TABLE mytable01(a int, b int) distribute by hash(a);
CREATE TABLE

CREATE SEQUENCE mytable01_b_seq owned by mytable.b;
CREATE SEQUENCE

ALTER SEQUENCE mytable01_b_seq owner to u1; --u1 is the owner of the mytable01 table. If the current user is the owner, you do not need to run this statement.
ALTER SEQUENCE

ALTER TABLE mytable01 alter b set default nextval('mytable01_b_seq'), alter b set not null;
ALTER TABLE

\d+ mytable01
                                            Table "dbadmin.mytable01"
 Column |  Type   |                       Modifiers                       | Storage | Stats target | Description
--------+---------+-------------------------------------------------------+---------+--------------+-------------
 a      | integer |                                                       | plain   |              |
 b      | integer | not null default nextval('mytable01_b_seq'::regclass) | plain   |              |
Has OIDs: no
Distribute By: HASH(a)
Location Nodes: ALL DATANODES
Options: orientation=row, compression=no

Common Usage of Sequences in Services

Sequences are often used to generate primary keys or unique columns during data import in data migration scenarios. Different migration tools or service import scenarios use different import methods. Common import methods are classified into copy and insert. For seqeunce, the processing in the two scenarios is slightly different.

Scenario 1: Insert Pushdown

      
       
         
         
           CREATE TABLE test1(a int, b serial) distribute by hash(a);
NOTICE:  CREATE TABLE will create implicit sequence "test1_b_seq" for serial column "test1.b"
CREATE TABLE

CREATE TABLE test2(a int) distribute by hash(a);
CREATE TABLE

EXPLAIN VERBOSE INSERT INTO test1(a) SELECT a FROM test2;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
  id |              operation              | E-rows | E-distinct | E-memory | E-width | E-costs
 ----+-------------------------------------+--------+------------+----------+---------+---------
   1 | ->  Streaming (type: GATHER)        |      1 |            |          |       4 | 16.34
   2 |    ->  Insert on dbadmin.test1      |     30 |            |          |       4 | 16.22
   3 |       ->  Seq Scan on dbadmin.test2 |     30 |            | 1MB      |       4 | 14.21

             RunTime Analyze Information
 ----------------------------------------------------
         "dbadmin.test2" runtime: 9.586ms, sync stats

      Targetlist Information (identified by plan id)
 ---------------------------------------------------------
   1 --Streaming (type: GATHER)
         Node/s: All datanodes
   3 --Seq Scan on dbadmin.test2
         Output: test2.a, nextval('test1_b_seq'::regclass)
         Distribute Key: test2.a

   ====== Query Summary =====
 -------------------------------
 System available mem: 1351680KB
 Query Max mem: 1351680KB
 Query estimated mem: 1024KB
 Parser runtime: 0.076 ms
 Planner runtime: 12.666 ms
 Unique SQL Id: 831364267
(26 rows)

          

        

      
     

In the INSERT scenario, nextval can be pushed down to DNs for execution. Therefore, nextval is pushed down to DNs for execution regardless of whether nextval with the default value is used or nextval is explicitly invoked. The execution plan in the preceding example also shows that nextval is pushed down to DNs for execution, the invoking of nextval is at the sequence layer, indicating that nextval is executed on DNs. In this case, DNs directly apply for sequence values from the GTM, and DNs execute the application concurrently. Therefore, the efficiency is relatively high.

Scenario 2: Copy Scenario
During service development, in addition to INSERT, COPY can be used to import data to the database. This method is used to copy file content to the database or use the CopyManager interface to import file content to the database. In addition, the CDM data synchronization tool imports data to the database in batches by copying data. If the target table to be copied uses the default value nextval, the process is as follows:

In the copy scenario, the CN applies for sequence values from the GTM. Therefore, when the cache value of sequence is small, the CN frequently establishes connections with the GTM and applies for nextval, causing a performance bottleneck. The Typical Optimization Scenarios Related to Sequences describes the service performance in this scenario and provides optimization methods.

Typical Optimization Scenarios Related to Sequences

Service scenario: In a service scenario, the CDM data synchronization tool is used to migrate data and import data from the source end to the target GaussDB (DWS). The import rate differs greatly from the empirical value. After the CDM concurrency is changed from 1 to 5, the synchronization rate still cannot be improved. Check the statement execution status. Except COPY, other services are executed properly without performance bottlenecks or resource bottlenecks. Therefore, it is preliminarily determined that the service has a bottleneck. Check the job waiting view related to COPY.

As shown in the preceding figure, five CDM jobs are executed concurrently. Therefore, you can see five COPY statements in the active view. Check the waiting view based on query_id corresponding to the five COPY statements. Among the five copies, only one copy is applying for a sequence value from the GTM at the same time, and other copies are waiting for a lightweight lock. Therefore, even if five concurrent jobs are enabled, the actual effect is not significantly improved compared with that of one concurrent job.

Cause: The serial type is used when the target table is created. By default, the cache of the created sequence is 1. As a result, when data is concurrently copied to the database, the CN frequently establishes connections with the GTM, and lightweight lock contention exists between multiple concurrent tasks, resulting in low data synchronization efficiency.

Solution: In this scenario, increase the cache value of the sequence to prevent bottlenecks caused by frequent GTM connection establishment. In this service scenario example, about 100,000 data records are synchronized each time. Based on service evaluation, change the cache value to 10000. (In practice, set a proper cache value based on services to ensure quick access and avoid sequence number waste.)

In cluster versions 8.2.1.100 and later, you can use ALTER SEQUENCE to change the cache value.

In clusters of 8.2.1 and earlier versions, the cache value of GaussDB (DWS) cannot be changed using ALTER SEQUENCE. You can change the cache value of an existing sequence as follows (the mytable table is used as an example):

Remove the association between the current sequence and the target table.

      
           ALTER SEQUENCE mytable_b_seq owned by none;
ALTER TABLE mytable alter b drop default;

Record the current sequence number as the start value of the new sequence.
1

SELECT nextval('mytable_b_seq');
Delete a sequence.
1

DROP SEQUENCE mytable_b_seq;

Create seqeunce and bind it to the target table. Replace xxx with the value of nextval obtained in the previous step.

      
           CREATE SEQUENCE mytable_b_seq START with xxx cache 10000 owned by mytable.b;
ALTER SEQUENCE mytable_b_seq owner to u1;--u1 is the owner of the mytable table. If the current user is the owner, you do not need to run this statement.
ALTER TABLE mytable alter b set default nextval('mytable_b_seq');