pgl_ddl_deploy

Introduction

There are many databases that require replicating data to other databases for various purposes. One of the most useful database technologies that is used to move data from point A to point B is called "logical replication". In database jargon, there are two categories of SQL statements: DML and DDL. For a number of reasons, DDL has to be handled separately. During the migration, the DBA is required to manually deploy the SQL in the correct order for all involved database clusters, manage locking contention, and add new tables to replication if necessary. Built on top of pglogical, pgl_ddl_deploy enables any DDL SQL statement to be directly propagated to subscribers. This solves the problem that pglogical cannot synchronize DDL statements.

For more information, see official pgl_ddl_deploy documentation.

Supported Versions

This extension is available to the latest minor versions of RDS for PostgreSQL 12 and later versions. You can run the following SQL statement to check whether your DB instance supports this extension:

SELECT * FROM pg_available_extension_versions WHERE name = 'pgl_ddl_deploy';

If this extension is not supported, upgrade the minor version of your DB instance or upgrade the major version using dump and restore.

For details about the extensions supported by RDS for PostgreSQL, see Supported Extensions.

Features

RDS for PostgreSQL supports the pgl_ddl_deploy extension, which is used to automatically synchronize DDL statements. In many cases, most DDL statements executed in application environments can be synchronized.

Any DDL statement can be synchronized to subscribers.
Tables can be automatically added to replication upon creation.
Filtering by regular expression or a specific set of tables is supported.
There is an option to deploy in a lock-safe way on subscribers.
There is an option to fail certain events on the subscriber to be retried later.
In some edge cases, alerting can be built around provided logging for the DBA to then handle possible manual deployments.
ALTER TABLE statements can be filtered by subcommand tags.
Support for automatically killing blocking processes that are preventing DDL execution on the subscriber system is optional.

Extension Installation and Uninstallation

Installing the extension

SELECT control_extension ('create', 'pgl_ddl_deploy');

Deleting the extension

SELECT control_extension ('drop', 'pgl_ddl_deploy');

For more information, see Installing and Uninstalling an Extension on the RDS Console and Installing and Uninstalling an Extension Using SQL Commands.

Basic Usage

This extension involves publication and subscription and depends on pglogical. You need to add and configure parameters.

wal_level = 'logical'
shared_preload_libraries = 'pglogical'

For details about how to modify the shared_preload_libraries parameter, see Modifying the shared_preload_libraries Parameter.

-- Configuring parameters on the provider
SELECT control_extension ('create', 'pglogical');
SELECT control_extension ('create', 'pgl_ddl_deploy');
CREATE TABLE foo (id INT PRIMARY KEY);
-- Creating a publication
CREATE PUBLICATION testpub FOR TABLE foo;
-- Configuring a replication set
INSERT INTO pgl_ddl_deploy.set_configs (set_name, include_schema_regex, driver) VALUES ('testpub', '.*', 'native'::pgl_ddl_deploy.driver);
-- Deploying the publication
SELECT pgl_ddl_deploy.deploy('testpub');
-- Adding roles for the user
SELECT pgl_ddl_deploy.add_role(oid) FROM pg_roles WHERE rolname='root';
-- Configuring parameters on the subscriber
SELECT control_extension ('create', 'pglogical');
SELECT control_extension ('create', 'pgl_ddl_deploy');
CREATE TABLE foo (id INT PRIMARY KEY);
-- Creating a subscription
CREATE SUBSCRIPTION testsub CONNECTION conninfo PUBLICATION testpub;
ALTER SUBSCRIPTION testsub REFRESH PUBLICATION;

After the configuration is complete, run the following DDL statements on the provider:

ALTET TABLE foo ADD COLUMN bla INT;
CREATE TABLE bra (id INT PRIMARY KEY);

You can verify the following on the subscriber:

\d foo
                Table "public.foo"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           | not null |
 bla    | integer |           |          |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)
\dt
          List of relations
 Schema |  Name   | Type  |  Owner
--------+---------+-------+----------
 public | bar     | table | root
 public | foo     | table | root
(2 rows)

Restrictions

This extension has some limitations. Although most DDL statements executed in application environments can be synchronized, it does not cover 100% of edge cases.

DDL Involving Multiple Tables

A single DDL SQL statement which alters both replicated and non-replicated tables cannot be supported. For example, if you set the include_schema_regex parameter to '^replicated.*':

DROP TABLE replicated.foo, notreplicated.bar;

The following message will be displayed on the provider:

WARNING: Unhandled deployment logged in pgl_ddl_deploy.unhandled
DROP TABLE

The replicated.foo table exists on the subscriber.

\d replicated.foo
              Table "replicated.foo"
Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 id     | integer |           | not null |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)

Similarly, if filtered replication is used, an error may occur when you run the following statement:

ALTER TABLE replicated.foo ADD COLUMN bar_id INT REFERENCES notreplicated.bar (id);

The statement is not synchronized to the subscriber.

\d replicated.foo
              Table "replicated.foo"
Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
id     | integer |           | not null |
Indexes:
    "foo_pkey" PRIMARY KEY, btree (id)

Unsupported Commands

CREATE TABLE AS and SELECT INTO are not supported to replicate DDL due to limitations on transactional consistency. That is, if a table is created from a set of data on the provider, running the same SQL on the subscriber will in no way guarantee consistent data. For example:

CREATE TABLE foo AS
SELECT field_1, field_2, now() AS refreshed_at
FROM table_1;

Similar to CREATE TABLE AS, the following message will be displayed for SELECT INTO:

WARNING:  Unhandled deployment logged in pgl_ddl_deploy.unhandled

Multi-Statement Client SQL Limitations

The complexities and limitations come when the client sends all SQL statements as one single string to PostgreSQL. Assume the following SQL statements:

CREATE TABLE foo (id serial primary key, bla text);
INSERT INTO foo (bla) VALUES ('hello world');

If this was in a file that was called using psql, it would run as two separate SQL command strings. However, if in Python or Ruby's ActiveRecord you create a single string as above and execute it, then it would be sent to PostgreSQL as one single SQL command string. The replication depends on the value of the allow_multi_statements parameter:

If the value is false, pgl_ddl_deploy will only auto-replicate a client SQL statement containing one command tag that matches the event trigger command tag. That is really safe, but it means you may have a lot more unhandled deployments.
If the value is true, pgl_ddl_deploy will only auto-replicate DDL that contains safe command tags to propagate. For example, mixed DDL and DML is forbidden. If a command contains more than two DDL statements and the statements are used on both replicated and non-replicated tables, the problem described in DDL Involving Multiple Tables occurs.

In any case that a SQL statement cannot be automatically run on the subscriber based on these analyses, instead it will be logged as a WARNING and put into the unhandled table for manual processing. For more details and solutions to problems that occur during replication, see official pgl_ddl_deploy documentation.

Parent Topic: Extension Management

Previous topic: pg_repack

Next topic: pgvector