Best Practices of Database SEQUENCE

A sequence is a database object that generates unique integers. A sequence's value automatically adjusts according to certain rules. Typically, sequences serve as primary keys. In GaussDB(DWS), when a sequence is created, a metadata table with the same name is created to record sequence information. For example:

   
        CREATE SEQUENCE seq_test; 
CREATE SEQUENCE

SELECT * FROM seq_test;
 sequence_name | last_value | start_value | increment_by |      max_value      | min_value | cache_value | log_cnt | is_cycled | is_called |  uuid
---------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------+---------
 seq_test      |         -1 |           1 |            1 | 9223372036854775807 |         1 |           1 |       0 | f         | f         | 1400050
(1 row)

In the preceding information:

sequence_name indicates the name of the sequence.
last_value is meaningless.
start_value indicates the initial value of the sequence.
increment_by indicates the step of the sequence.
max_value indicates the maximum value of the sequence.
min_value indicates the minimum sequence value.
cache_value determines how many sequence values are preloaded for rapid access to subsequent values. (After this cache is set, the continuity of sequence values cannot be ensured, and unacknowledged sequences may be generated, causing waste of sequences.)
log_cnt indicates the number of sequence values recorded in WAL logs. In GaussDB(DWS), sequences are obtained and managed from GTM. Therefore, log_cnt is meaningless.
is_cycled indicates whether to continue the loop after the sequence reaches the minimum or maximum value.
is_called indicates whether the sequence has been called. (It only indicates whether the sequence has been called on the current instance. For example, after the sequence is called on cn1, the value of the filed on cn1 changes to t, and the value of the field on cn2 is still f.)
uuid indicates the unique ID of the sequence.

Creating a Sequence

In GaussDB(DWS), the Global Transaction Manager (GTM) generates and maintains the global unique information about a transaction, such as the global transaction ID, transaction snapshot, and sequence. The following figure shows the process of creating a sequence in GaussDB(DWS).

Figure 1 Creating a sequence

The specific process is as follows:

The CN that receives the SQL command applies for a UUID from the GTM.
The GTM returns a UUID.
The CN binds the obtained UUID to the sequenceName created by the user.
The CN delivers the binding relationship to other nodes, and other nodes create the sequence metadata table synchronously.
The CN sends the UUID and startID of the sequence to the GTM for permanent storage.

Therefore, sequence maintenance and request are actually completed on the GTM. When requesting nextval, each instance obtains a sequence value from the GTM using the sequence's UUID. The number of values requested correlates with the cache size. An instance will only request a new sequence value from the GTM once its cache is depleted. Thus, enlarging the sequence's cache minimizes the communication frequency between the CN/DN and the GTM.

Two Methods of Creating a Sequence

Method 1: Run the CREATE SEQUENCE statement to create a sequence and use nextval to invoke the sequence in the new table.

    
         CREATE SEQUENCE seq_test increment by 1 minvalue 1 no maxvalue start with 1;
CREATE SEQUENCE

CREATE TABLE table_1(id int not null default nextval('seq_test'), name text);
CREATE TABLE

Method 2: If the serial type is used during table creation, a sequence is automatically created and the default value of the column is set to nextval.

    
         CREATE TABLE mytable(a int, b serial) distribute by hash(a);
NOTICE:  CREATE TABLE will create implicit sequence "mytable_b_seq" for serial column "mytable.b"
CREATE TABLE

 \d+ mytable
                                            Table "dbadmin.mytable"
 Column |  Type   |                      Modifiers                      | Storage | Stats target | Description
--------+---------+-----------------------------------------------------+---------+--------------+-------------
 a      | integer |                                                     | plain   |              |
 b      | integer | not null default nextval('mytable_b_seq'::regclass) | plain   |              |
Has OIDs: no
Distribute By: HASH(a)
Location Nodes: ALL DATANODES
Options: orientation=row, compression=no

In this example, a sequence named mytable_b_seq is automatically created. Technically speaking, the serial type is not an actual data type but rather a method for assigning a unique identifier to a table column. Creating a serial involves generating a linked sequence for that specific column.

It is equivalent to the following statements:

   
        CREATE TABLE mytable01(a int, b int) distribute by hash(a);
CREATE TABLE

CREATE SEQUENCE mytable01_b_seq owned by mytable.b;
CREATE SEQUENCE

ALTER SEQUENCE mytable01_b_seq owner to u1; --u1 is the owner of the mytable01 table. The owner does not need to run this statement.
ALTER SEQUENCE

ALTER TABLE mytable01 alter b set default nextval('mytable01_b_seq'), alter b set not null;
ALTER TABLE

\d+ mytable01
                                            Table "dbadmin.mytable01"
 Column |  Type   |                       Modifiers                       | Storage | Stats target | Description
--------+---------+-------------------------------------------------------+---------+--------------+-------------
 a      | integer |                                                       | plain   |              |
 b      | integer | not null default nextval('mytable01_b_seq'::regclass) | plain   |              |
Has OIDs: no
Distribute By: HASH(a)
Location Nodes: ALL DATANODES
Options: orientation=row, compression=no

Common Usage of Sequences in Services

Sequences are commonly used to generate primary keys or unique columns during data import, a frequent practice in data migration scenarios. Different migration tools or service import scenarios use different import methods. Common import methods are classified into copy and insert. For sequences, the processing in the two scenarios is slightly different.

Scenario 1: Insert pushdown

      
       
         
         
           CREATE TABLE test1(a int, b serial) distribute by hash(a);
NOTICE:  CREATE TABLE will create implicit sequence "test1_b_seq" for serial column "test1.b"
CREATE TABLE

CREATE TABLE test2(a int) distribute by hash(a);
CREATE TABLE

EXPLAIN VERBOSE INSERT INTO test1(a) SELECT a FROM test2;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
  id |              operation              | E-rows | E-distinct | E-memory | E-width | E-costs
 ----+-------------------------------------+--------+------------+----------+---------+---------
   1 | ->  Streaming (type: GATHER)        |      1 |            |          |       4 | 16.34
   2 |    ->  Insert on dbadmin.test1      |     30 |            |          |       4 | 16.22
   3 |       ->  Seq Scan on dbadmin.test2 |     30 |            | 1MB      |       4 | 14.21

             RunTime Analyze Information
 ----------------------------------------------------
         "dbadmin.test2" runtime: 9.586ms, sync stats

      Targetlist Information (identified by plan id)
 ---------------------------------------------------------
   1 --Streaming (type: GATHER)
         Node/s: All datanodes
   3 --Seq Scan on dbadmin.test2
         Output: test2.a, nextval('test1_b_seq'::regclass)
         Distribute Key: test2.a

   ====== Query Summary =====
 -------------------------------
 System available mem: 1351680KB
 Query Max mem: 1351680KB
 Query estimated mem: 1024KB
 Parser runtime: 0.076 ms
 Planner runtime: 12.666 ms
 Unique SQL Id: 831364267
(26 rows)

          

        

      
     

During an INSERT operation, nextval is executed on the DNs. This occurs whether nextval is called with its default value or invoked explicitly. The execution plan confirms that nextval operates at the sequence layer on the DNs. In this scenario, DNs obtain sequence values directly from the GTM and execute the request simultaneously, resulting in a relatively high level of efficiency.

Scenario 2: Copy scenario
In service development, alongside the INSERT method, the COPY method is also for data import into the database. It allows for the direct copying of file contents or using the CopyManager interface for this purpose. Moreover, the CDM data synchronization tool facilitates batch data import by copying. If the target table to be copied uses the default value nextval, the process is as follows.

In the copy process, the CN requests sequence values from the GTM. If the sequence's cache size is too small, the CN must repeatedly connect with the GTM to request nextval, which can lead to a performance bottleneck. Typical Optimization Scenarios Related to Sequences describes the service performance in this scenario and provides optimization methods.

Typical Optimization Scenarios Related to Sequences

Service scenarios:

Service scenario: In a service scenario, the CDM data synchronization tool is used to transfer data and import data from the source to the target GaussDB(DWS). Despite changing the CDM concurrency from 1 to 5, the synchronization rate remains unchanged, and there is a significant difference between the import rate and the expected value. Apart from data copying, all other services run smoothly without any performance or resource issues. Thus, it is likely that a bottleneck exists within the service. You are advised to review the job queue specifically for the COPY operation.

As shown in the preceding figure, five CDM jobs are executed concurrently. You can see five COPY statements in the active view. Check the waiting view based on query_id corresponding to the five COPY statements. Out of the five COPY operations, only one requests a sequence value from the GTM concurrently, while the rests wait for a lightweight lock. As a result, enabling five concurrent jobs does not substantially enhance performance compared to just running a single job.

Causes:

The serial type is used when the target table is created. By default, the cache of the created sequence is 1. As a result, when data is concurrently copied to the database, the CN frequently establishes connections with the GTM, and lightweight lock contention exists between multiple concurrent jobs, resulting in low data synchronization efficiency.

Solutions:

In this scenario, increase the cache value of the sequence to prevent bottlenecks caused by frequent GTM connection establishment. In this service scenario example, about 100,000 data records are synchronized each time. Based on service evaluation, change the cache value to 10,000. (In practice, set a proper cache value based on services to ensure quick access and avoid sequence number waste.)

In cluster versions 8.2.1.100 and later, you can use ALTER SEQUENCE to change the cache value.

GaussDB(DWS) clusters of version 8.2.1 or earlier do not allow for the modification of cache values through ALTER SEQUENCE. To change the cache value of an existing sequence, follow these steps (the mytable table is used as an example):

Remove the association between the current sequence and the target table.

      
           ALTER SEQUENCE mytable_b_seq owned by none;
ALTER TABLE mytable alter b drop default;

Record the current sequence value as the start value of the new sequence.
1

SELECT nextval('mytable_b_seq');
Delete the sequence.
1

DROP SEQUENCE mytable_b_seq;

Create a sequence and bind it to the target table. Replace xxx with the value of nextval obtained in the previous step.

      
           CREATE SEQUENCE mytable_b_seq START with xxx cache 10000 owned by mytable.b;
ALTER SEQUENCE mytable_b_seq owner to u1;--u1 is the owner of the mytable table. The owner does not need to run this statement.
ALTER TABLE mytable alter b set default nextval('mytable_b_seq');