Help Center/ GaussDB/ Centralized_8.x/ Logical Replication/ Logical Decoding/ Logical Decoding of DDL Statements
Updated on 2024-06-07 GMT+08:00

Logical Decoding of DDL Statements

DDL statements can be properly executed on a GaussDB host and can be obtained using a logical decoding tool.

Table 1 Supported DDL statements

Table

Index

User-defined Function

User-defined Stored Procedure

Trigger

Sequence

View

Materialized View

Package

Schema

CREATE TABLE [PARTITION]

ALTER TABLE

DROP TABLE

CREATE INDEX

ALTER INDEX

DROP INDEX

REINDEX

CREATE FUNCTION

ALTER FUNCTION

DROP FUNCTION

CREATE PROCEDURE

ALTER PROCEDURE

DROP PROCEDURE

CREATE TRIGGER

ALTER TRIGGER

DROP TRIGGER

CREATE SEQUENCE

ALTER SEQUENCE

DROP SEQUENCE

CREATE VIEW

ALTER VIEW

DROP VIEW

CREATE MATERIALIZED VIEW

ALTER MATERIALIZED VIEW

DROP MATERIALIZED VIEW

CREATE PACKAGE

ALTER PACKAGE

DROP PACKAGE

CREATE SCHEMA

ALTER SCHEMA

DROP SCHEMA

Description

When DML statements are executed in the database, the storage engine generates DML logs for restoration. After decoding the DML logs, the storage engine restores the corresponding DML statements and generates logical logs. For DDL statements, the database does not record logs of original DDL statements. Instead, it records DML logs of system catalogs involved in DDL statements. DDL statements are of various types and complex syntax. It is difficult to support DDL statements in logical replication and decode original DDL statements based on DML logs of these system catalogs. DDL logs are added to record original DDL information. During decoding, original DDL statements can be obtained from DDL logs.

During the execution of a DDL statement, the SQL engine parser parses the syntax and lexicon of the original statement and generates a parsing tree. (Different DDL syntaxes generate different types of parsing trees, which contain all information about the DDL statement.) Then, the executor performs the corresponding operation based on the information to generate and modify the corresponding meta information.

In this document, DDL logs are added to support logical decoding of DDL statements. The content of DDL logs is generated based on the parser result (parsing tree) and executor result, and the logs are generated after the execution is complete.

DDL statements are reversely parsed from the syntax tree. In this way, DDL commands are converted to JSON statements and necessary information is provided to rebuild DDL commands in the target location. Compared with the original DDL command strings, the benefits of parsing DDL statements from the syntax tree include:

  1. Each parsed database object has a schema. Therefore, if different search paths are used, there is no ambiguity.
  2. Structured JSON statements and formatted output support heterogeneous databases. If you are using different database versions and some DDL syntaxes differ, you need to resolve these differences before applying them.

The output result of reverse parsing is in the normalized format. This result is equivalent to the user input but is not necessarily the same. For example:

Example 1: If the function body does not contain single quotation marks ('), the separator $$ of the function body is parsed as single quotation marks (').

Original SQL statement:

CREATE FUNCTION func(a INT) RETURNS INT AS
$$
BEGIN
a:= a+1;
CREATE TABLE test(col1 INT);
INSERT INTO test VALUES(1);
DROP TABLE test;
RETURN a;
END;
$$
LANGUAGE plpgsql;

Reverse parsing result:

CREATE  FUNCTION public.func ( IN a pg_catalog.int4 ) RETURNS pg_catalog.int4 LANGUAGE plpgsql VOLATILE CALLED ON NULL INPUT SECURITY INVOKER COST 100 AS '
BEGIN
a:= a+1;
CREATE TABLE test(col1 INT);
INSERT INTO test VALUES(1);
DROP TABLE test;
RETURN a;
END;
';

Example 2: "CREATE MATERIALIZED VIEW v46_4 AS SELECT a, b FROM t46 ORDER BY a OFFSET 10 ROWS FETCH NEXT 3 ROWS ONLY" will be reversely parsed as "CREATE MATERIALIZED VIEW public.v46_4 AS SELECT a, b FROM public.t46 ORDER BY a OFFSET 10 LIMIT 3".

Example 3: "ALTER INDEX "Alter_Index_Index" REBUILD PARTITION "CA_ADDRESS_SK_index2"" will be reversely parsed as "REINDEX INDEX public."Alter_Index_Index" PARTITION "CA_ADDRESS_SK_index2"".

Example 4: Create or modify a range partitioned table. The START END syntax is decoded and converted into the LESS THAN statement.
gaussdb=# CREATE TABLE test_create_table_partition2 (c1 INT, c2 INT) 
PARTITION BY RANGE (c2) (
    PARTITION p1 START(1) END(1000) EVERY(200) ,
    PARTITION p2 END(2000),
    PARTITION p3 START(2000) END(2500),
    PARTITION p4 START(2500),
    PARTITION p5 START(3000) END(5000) EVERY(1000) 
);

Reverse parsing result:

gaussdb=# CREATE TABLE test_create_table_partition2 (c1 INT, c2 INT) 
PARTITION BY RANGE (c2) (
    PARTITION p1_0 VALUES LESS THAN ('1'), PARTITION p1_1 VALUES LESS THAN ('201'), PARTITION p1_2 VALUES LESS THAN ('401'), PARTITION p1_3 VALUES LESS THAN ('601'), PARTITION p1_4 VALUES LESS THAN ('801'), PARTITION p1_5 VALUES LESS    THAN ('1000'), 
    PARTITION p2 VALUES LESS THAN ('2000'), 
    PARTITION p3 VALUES LESS THAN ('2500'), 
    PARTITION p4 VALUES LESS THAN ('3000'), 
    PARTITION p5_1 VALUES LESS THAN ('4000'), 
    PARTITION p5_2 VALUES LESS THAN ('5000')
);
Example 5: When adding a column to a table, use IF NOT EXISTS for judgment.
  • Original SQL statement:
    gaussdb=# CREATE TABLE IF NOT EXISTS tb5 (c1 int,c2 int) with (ORIENTATION=ROW, STORAGE_TYPE=USTORE);
    gaussdb=# ALTER TABLE IF EXISTS tb5 * ADD COLUMN IF NOT EXISTS c2 char(5) after c1; -- Can be decoded. Column c2 of the int type exists in the table and is skipped, and the type of column c2 in the reverse parsing result remains unchanged.

    Reverse parsing result:

    gaussdb=# ALTER TABLE IF EXISTS public.tb5 ADD COLUMN IF NOT EXISTS c2 pg_catalog.int4 AFTER c1;
  • Original SQL statement:
    gaussdb=# ALTER TABLE IF EXISTS tb5 * ADD COLUMN IF NOT EXISTS c2 char(5) after c1,  ADD COLUMN IF NOT EXISTS c3 char(5) after c1; -- Decoded. The type of the new column c3 in the reverse parsing result is correct.

    Reverse parsing result:

    gaussdb=# ALTER TABLE IF EXISTS public.tb5 ADD COLUMN IF NOT EXISTS c2 pg_catalog.int4 AFTER c1, ADD COLUMN IF NOT EXISTS c3 pg_catalog.bpchar(5) AFTER c1;
  • Original SQL statement:
    gaussdb=# ALTER TABLE IF EXISTS tb5 * ADD COLUMN c2 char(5) after c1,  ADD COLUMN IF NOT EXISTS c4 int after c1; -- Not decoded. An error occurs when the statement is executed.

Specification Restrictions

  • DDL specification restrictions for logical decoding:
    • When there is no DDL statement, the logical decoding performance in the DML-only scenario does not deteriorate.
    • The logical decoding performance in the DDL-only scenario is about 100 MB/s in the standard environment, and that in DDL/DML hybrid transaction scenario is about 100 MB/s in the standard environment.
    • After this function is enabled, the performance decrease of DDL statements is less than 15%.
  • General decoding restrictions (serial and parallel):
    • DDL statement decoding cannot be performed on local temporary objects, such as LOCAL temporary tables and temporary schemas.
    • The DEFAULT of ALTER TABLE ADD COLUMN does not support stable or volatile functions. The CHECK constraint expression of CREATE TABLE and ALTER TABLE regarding columns does not support stable or volatile functions. If ALTER TABLE has multiple clauses and one of them has the preceding two situations, the entire ALTER TABLE statement is not parsed reversely.

      gaussdb=# ALTER TABLE tbl_28 ADD COLUMN b1 TIMESTAMP DEFAULT NOW(); -- 's' NOT DEPARSE
      gaussdb=# ALTER TABLE tbl_28 ADD COLUMN b2 INT DEFAULT RANDOM(); -- 'v' NOT DEPARSE
      gaussdb=# ALTER TABLE tbl_28 ADD COLUMN b3 INT DEFAULT ABS(1); -- 'i' DEPARSE
    • DDL statement decoding does not support CREATE TABLE LIKE.

      For tables created by CREATE TABLE LIKE, the ALTER and DROP statements are still decoded.

    • If IF NOT EXISTS exists in the statement for creating an object and the object already exists, the statement is not decoded. If IF EXISTS exists in the statement for deleting an object but the object does not exist, the statement is not decoded.
    • The ALTER PACKAGE COMPILE statement is not decoded, but the DDL/DML statements contained in the instantiated content are decoded. If the package does not contain instantiated content involving DDL or DML statements, ALTER PACKAGE COMPILE will be ignored by logical decoding.
    • Only the commercial DDL syntax earlier than this version is supported. The following SQL statements do not support logical decoding:
      • Create a table in B-compatible mode, and add the ON UPDATE event to a column.

        Original SQL statement:

        -- Create a table in B-compatible mode, and add the ON UPDATE event to a column.
        gaussdb=# CREATE TABLE IF NOT EXISTS tb1 (c1 time without time zone ON UPDATE CURRENT_TIMESTAMP) with (ORIENTATION=ROW, STORAGE_TYPE=USTORE);
        -- Modify a table in B-compatible mode, and add the ON UPDATE event to a column.
        gaussdb=# ALTER TABLE IF EXISTS ONLY tb2 MODIFY COLUMN c2 time without time zone ON UPDATE LOCALTIMESTAMP;

        Reverse parsing result:

        gaussdb=# CREATE TABLE IF NOT EXISTS public.tb1 (c1 TIME) WITH (orientation = 'row', storage_type = 'ustore', compression = 'no') NOCOMPRESS;
        gaussdb=# ALTER TABLE IF EXISTS ONLY public.tb2 MODIFY COLUMN c2 TIME;
      • Create a row-store table and set the ILM policy.

        Original SQL statement:

        gaussdb=# CREATE TABLE IF NOT EXISTS tb3 (c1 int) with (storage_type=USTORE,ORIENTATION=ROW) ILM ADD POLICY ROW STORE COMPRESS ADVANCED ROW AFTER 7 day OF NO MODIFICATION;

        Reverse parsing result:

        gaussdb=# CREATE TABLE IF NOT EXISTS public.tb3 (c1 pg_catalog.int4) WITH (storage_type = 'ustore', orientation = 'row', compression = 'no') NOCOMPRESS;
      • When creating a table, add the IDENTITY constraint to a column.
        CREATE TABLE IF NOT EXISTS tb4 (c1 int GENERATED ALWAYS AS IDENTITY (INCREMENT BY 2 MINVALUE 10 MAXVALUE 20 CYCLE SCALE)); -- The entire statement is not parsed.
      • When creating or modifying a table in B-compatible mode, add comments to the table or columns.

        Original SQL statement:

        gaussdb=# CREATE TABLE IF NOT EXISTS tb6 (c1 integer comment 'MySQL compatible comment syntax') with (ORIENTATION=ROW, STORAGE_TYPE=USTORE);

        Reverse parsing result:

        gaussdb=# CREATE TABLE IF NOT EXISTS public.tb6 (c1 pg_catalog.int4) WITH (storage_type = 'ustore', orientation = 'row', compression = 'no') NOCOMPRESS;
    • Logical decoding does not support DDL/DCL/DML hybrid transactions. In hybrid transactions, DML statements after DDL statements cannot be decoded.
      -- No reverse parsing is performed. DCL statements are not supported and therefore are not parsed. DML statements after DCL statements are not parsed.
      gaussdb=# BEGIN;
      gaussdb=# GAINT ALL PRIVILEGES to u01;
      gaussdb=# INSERT INTO test1(col1) values(1);
      gaussdb=# COMMIT; 
      
      -- Only the first and third SQL statements are reversely parsed.
      gaussdb=# BEGIN;
      gaussdb=# CREATE TABLE mix_tran_t4(id int);
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# CREATE TABLE mix_tran_t5(id int);
      gaussdb=# COMMIT; 
      
      -- Only the first and second SQL statements are reversely parsed.
      gaussdb=# BEGIN;
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# CREATE TABLE mix_tran_t6(id int);
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# COMMIT; 
      
      -- Full reverse parsing
      gaussdb=# BEGIN;
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# CREATE TABLE mix_tran_t7(id int);
      gaussdb=# CREATE TABLE mix_tran_t8(id int);
      gaussdb=# COMMIT; 
      
      -- Only the first and third SQL statements are reversely parsed.
      gaussdb=# BEGIN;
      gaussdb=# CREATE TABLE mix_tran_t7(id int);
      gaussdb=# CREATE TYPE compfoo AS (f1 int, f2 text);
      gaussdb=# CREATE TABLE mix_tran_t8(id int);
      gaussdb=# COMMIT; 
      
      -- Full reverse parsing
      gaussdb=# BEGIN;
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# COMMIT; 
      
      -- Only the first SQL statement is reversely parsed.
      gaussdb=# BEGIN;
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# CREATE TYPE compfoo AS (f1 int, f2 text);
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# COMMIT; 
      
      -- Only the first and third SQL statements are reversely parsed.
      gaussdb=# BEGIN;
      gaussdb=# INSERT INTO mix_tran_t4 VALUES(111);
      gaussdb=# CREATE TYPE compfoo AS (f1 int, f2 text);
      gaussdb=# CREATE TABLE mix_tran_t9(id int);
      gaussdb=# COMMIT; 
    • For CREATE TABLE AS SELECT, SELECT INTO, and CREATE TABLE AS statements, only CREATE TABLE statements can be decoded and INSERT statements cannot be decoded.

      For tables created by using CREATE TABLE AS, the ALTER and DROP statements are still decoded.

      Example:

      Original SQL statement:

      CREATE TABLE IF NOT EXISTS tb35_2 (c1 int) with (storage_type=USTORE,ORIENTATION=ROW);
      INSERT INTO tb35_2 VALUES (6);
      CREATE TABLE tb35_1 with (storage_type=USTORE,ORIENTATION=ROW) AS SELECT * FROM tb35_2;

      Reverse parsing result of the last SQL statement:

       CREATE  TABLE  public.tb35_1 (c1 pg_catalog.int4) WITH (storage_type = 'ustore', orientation = 'row', compression = 'no') NOCOMPRESS;
    • When a stored procedure, function, or advanced package is executed, if the stored procedure, function, or advanced package contains a DDL/DML hybrid transaction or the stored procedure, function, or advanced package and other statements in the same transaction form a DDL/DML hybrid transaction, decoding is performed based on the hybrid transaction principle.
    • The ledger database function is not supported. In the current version, if there are DDL operations related to the ledger database function, for example, CREATE TABLE, the decoded DDL statements contain hash columns. As a result, the database table structures at the target and source ends are inconsistent.
  • DDL-specific restrictions for serial logical decoding:
    • The sql_ decoding plug-in does not support DDL statements in JSON format.

Decoding Format

  • JSON format

    When a DDL statement is input, the SQL engine parser parses the syntax and lexicon of the DDL statement and generates a parsing tree. The parsing tree contains all DDL information and the executor modifies system metadata based on the parsing tree content. After the execution is complete, you can obtain the search path of the DDL objects. After the executor is successfully executed, this feature reversely parses the parsing tree information and executor result to restore all information about the original DDL statement. In this way, the entire DDL statement can be parsed and a DDL statement in JSON format is output to adapt to heterogeneous databases.

    Upon lexical and syntax analysis on the CREATE TABLE statement, the corresponding CreateStmt parsing tree node is obtained, containing table information, column information, distribution information (DistributeBy structure), and partition information (PartitionState structure). After reverse parsing, the result is output in JSON format as follows:

    {"JDDL":{"fmt":"CREATE %{persistence}s TABLE %{if_not_exists}s %{identity}D %{table_elements}s %{with_clause}s %{compression}s","identity":{"object_name":"test_create_table_a","schema_name":"public"},"compression":"NOCOMPRESS","persistence":"","with_clause":{"fmt":"WITH (%{with:, }s)","with":[{"fmt":"%{label}s = %{value}L","label":{"fmt":"%{label}I","label":"orientation"},"value":"row"},{"fmt":"%{label}s = %{value}L","label":{"fmt":"%{label}I","label":"compression"},"value":"no"}]},"if_not_exists":"","table_elements":{"fmt":"(%{elements:, }s)","elements":[{"fmt":"%{name}I %{column_type}T","name":"a","column_type":{"typmod":"","typarray":false,"type_name":"int4","schema_name":"pg_catalog"}}]}}}

    The output JSON string contains the search path of the object. In the string, the identity key indicates that the schema is public and the table name is test_create_table_a. %{persistence}s corresponds to the following field in the SQL statement (This SQL statement does not contain this field and therefore is empty.):

    [ [ GLOBAL | LOCAL ] [ TEMPORARY | TEMP ] | UNLOGGED ]

    %{if_not_exists}s corresponds to a field in the SQL statement. (This SQL statement does not contain this field and therefore the field is empty.)

    [ IF NOT EXISTS ]

    %{identity}D corresponds to the following field in the SQL statement:

    table_name  

    %{table_elements}s corresponds to the following field in the SQL statement:

     (column_name data_type)

    %{with_clause}s corresponds to the following field in the SQL statement:

    [ WITH ( {storage_parameter = value} [, ... ] ) ]

    %{compression}s corresponds to the following field in the SQL statement:

    [ COMPRESS | NOCOMPRESS ]
  • SQL format

    The output in SQL format (text format) is as follows:

    {"TDDL":"CREATE  TABLE  public.test_create_table_a (a pg_catalog.int4) WITH (orientation = 'row', compression = 'no') NOCOMPRESS"}

    The statement also contains the schema name.

Interface Design

  • New control parameters
    1. Logical decoding control parameters are added to control the DDL statement reverse parsing process and output format. You can run the pg_recvlogical -o or pg_logical_slot_peek_changes command to enable them.
      • enable-ddl-decoding: The default value is false, indicating that logical decoding of DDL statements is disabled. The value true indicates that logical decoding of DDL statements is enabled.
      • enable-ddl-json-format: The default value is false, indicating that the DDL statement reverse parsing result is output in text format. The value true indicates that the DDL statement reverse parsing result is output in JSON format.
    2. A GUC parameter is added.
      • enable_logical_replication_ddl: The default value is ON. If the value is ON, logical replication of DDL statements is supported. Otherwise, logical replication of DDL statements is not supported. The DDL statement execution result is reversely parsed and WAL logs of the DDL statements are generated only when this parameter is set to ON. Otherwise, no reverse parsing is performed and no WAL log is generated.

        You can check the operation logs of enable_logical_replication_ddl and determine whether logical decoding of DDL statements is not supported because the parameter is modified by users.

  • New logs

    The xl_logical_ddl_message log is added for DDL statements. The log type is RM_LOGICALDDLMSG_ID. The definition is as follows:

    Name

    Type

    Description

    db_id

    Oid

    Database ID

    rel_id

    Oid

    Table ID

    csn

    CommitSeqNo

    CSN snapshot

    cid

    CommandId

    Command ID

    tag_type

    NodeTag

    DDL type

    message_size

    Size

    Length of the log content

    filter_message_size

    Size

    Length of log information filtered by whitelist

    message

    char *

    DDL content

Procedure

  1. For the logical decoding feature, set the wal_level GUC parameter to logical in advance. This parameter takes effect only after restart.

    gs_guc set -Z datanode -D $node_dir -c "wal_level = logical"

    In the preceding command, $node_dir indicates the database node path. Change it based on the actual situation.

  2. Log in to the primary node of the GaussDB database as a user with the REPLICATION permission and run the following command to connect to the database:

    gsql -U user1 -W password -d db1 -p 16000 -r

    In the preceding command, user1 indicates the username, password indicates the user password, db1 indicates the name of the database to be connected, and 16000 indicates the database port number. You can replace them as required.

  3. Create a logical replication slot named slot1.

    1
    2
    3
    4
    5
    db1=>SELECT * FROM pg_create_logical_replication_slot('slot1', 'mppdb_decoding');
     slotname | xlog_position 
    ----------+---------------
     slot1    | 0/3764C788
    (1 row)
    

    In terms of serial logical decoding, mppdb_ decoding and sql_ decoding are supported; and in terms of parallel logical decoding, only mppdb_ decoding is supported.

  4. Create a package in the database.

    1
    2
    3
    4
    5
    6
    db1=> CREATE OR REPLACE PACKAGE ldp_pkg1 IS
        var1 int:=1;  -- Public variable
        var2 int:=2;
        PROCEDURE testpro1(var3 int);  -- Public stored procedure, which can be called by external systems.
    END ldp_pkg1;
    /
    

  5. Read the decoding result of replication slot 1. You can run the pg_recvlogical -o or pg_logical_slot_peek_changes command to update the replication slot number.

    • For details about the logical decoding options, see Logical Decoding Options and new control parameters.
    • In parallel decoding, you can change the value of the decode_style parameter in pg_recvlogical to determine the decoding format.
      • -o decode_style='b': DDL statements are parsed into the bin format.
      • -o decode_style='j': DDL statements are parsed into the JSON format.
      • -o decode_style='t': DDL statements are parsed into the text format.
    1
    2
    3
    4
    5
    6
    7
    db1=> SELECT data FROM pg_logical_slot_peek_changes('slot1', NULL, NULL, 'enable-ddl-decoding', 'true', 'enable-ddl-json-format', 'false') WHERE data not like 'BEGIN%' AND data not like 'COMMIT%' AND data not like '%dbe_pldeveloper.gs_source%';
    
                                                                                                          data                  
    
    -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- {"TDDL":"CREATE OR REPLACE PACKAGE public.ldp_pkg1 AUTHID CURRENT_USER IS var1 int:=1; -- Public variable\n    var2 int:=2;\n    
    PROCEDURE testpro1(var3 int);  -- Public stored procedure, which can be called externally.\nEND  ldp_pkg1; \n /"}
    (1 row)
    

  6. Delete the logical replication slot slot1 and package ldp_pkg1.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    db1=> SELECT * FROM pg_drop_replication_slot('slot1');
     pg_drop_replication_slot
    --------------------------
    
    (1 row)
    
    gaussdb=# DROP PACKAGE ldp_pkg1;
    NOTICE:  drop cascades to function public.testpro1(integer)
    DROP PACKAGE