PostgreSQL

DataArts Migration supports both RDS for PostgreSQL on the cloud and on-premises PostgreSQL data sources. It is also compatible with GaussDB, Greenplum, and Kingbase, so it can meet data synchronization requirements in different deployment environments.

Preparation and Constraints

Network requirements
The PostgreSQL data source must be able to communicate with CDM so that data can be transmitted. For details, see Enabling Network Connectivity.
Database connection permissions
- Database connection permissions: The CONNECT permission is required, which allows users to connect to a specified database.
- Network access permissions: In the pg_hba.conf configuration file, allow the IP address of DataArts Migration to access the database.
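
As an illustration of the pg_hba.conf requirement above, the following sketch builds a single `host` record that admits one client address. The database name, user name, IP address, and the `scram-sha-256` authentication method are assumptions chosen for the example; use the values and the authentication method that apply to your own deployment.

```python
import ipaddress


def pg_hba_host_line(database, user, client_ip, auth_method="scram-sha-256"):
    """Build one 'host' record for pg_hba.conf that admits a single client address.

    All argument values are supplied by the caller; nothing here is specific
    to DataArts Migration.
    """
    # ip_address() raises ValueError if the address is malformed.
    addr = ipaddress.ip_address(client_ip)
    # A /32 (IPv4) or /128 (IPv6) mask restricts the record to exactly one host.
    prefix = 32 if addr.version == 4 else 128
    return f"host    {database}    {user}    {addr}/{prefix}    {auth_method}"


# Hypothetical values for illustration only.
print(pg_hba_host_line("sales_db", "migration_user", "192.0.2.10"))
```

PostgreSQL must reload its configuration before a change to pg_hba.conf takes effect.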
Table operation permission requirements
- USAGE permission on schemas: To view objects in a schema, you must have the USAGE permission on the schema.
- Reading data from an on-premises PostgreSQL database: The account must have the read-only permission (SELECT) on the table to be synchronized so that data can be read securely and accurately.
- Writing data to an on-premises PostgreSQL database: The account must have write permissions (INSERT, DELETE, and UPDATE) on the table to be synchronized so that data can be correctly written to the destination table.
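
The permissions listed above map onto standard PostgreSQL GRANT statements. The sketch below generates them for one table; the function name, the `mode` parameter, and the database, schema, table, and role names are hypothetical and only illustrate the shape of the statements. Review the output before running it with your own SQL client.

```python
def quote_ident(name):
    """Double-quote a PostgreSQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def grant_statements(database, schema, table, role, mode="read"):
    """Return the GRANT statements for a read account or a write account.

    mode="read"  -> SELECT on the table
    mode="write" -> INSERT, UPDATE, DELETE on the table
    """
    if mode not in ("read", "write"):
        raise ValueError("mode must be 'read' or 'write'")
    privileges = "SELECT" if mode == "read" else "INSERT, UPDATE, DELETE"
    db, sch, tbl, r = (quote_ident(x) for x in (database, schema, table, role))
    return [
        f"GRANT CONNECT ON DATABASE {db} TO {r};",
        f"GRANT USAGE ON SCHEMA {sch} TO {r};",
        f"GRANT {privileges} ON TABLE {sch}.{tbl} TO {r};",
    ]


# Hypothetical names for illustration only.
for statement in grant_statements("sales_db", "public", "orders", "migration_user"):
    print(statement)
```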

Driver Selection

Driver Name	How to Obtain	Recommended Version
POSTGRESQL	PostgreSQL driver	42.3.4

Supported Data Types

The following table lists the field types of PostgreSQL 12 Community Edition, including their common variants, and indicates whether DataArts Migration can read and write each of them.

Category	Field Type	PostgreSQL Read	PostgreSQL Write
Integer	smallint	√	√
	int	√	√
	bigint	√	√
	smallserial	√	√
	serial	√	√
	bigserial	√	√
Floating point number	float	√	√
	DOUBLE PRECISION	√	√
	REAL	√	√
Numeric	decimal(p,s)	√	√
	NUMERIC	√	√
Character	char	√	√
	varchar	√	√
	text	√	√
Time	date	√	√
	timestamp	√	√
	timestamptz	√	√
	time	√	√
	timetz	√	√
	interval	√	√
Binary	BYTEA	√	√
Network	INET	x	x
Currency	money	√	√
Bit	bit	√	√
	varbit	√	√
Boolean	boolean	√	√
Others	int1 (GaussDB)	√	√
	JSONB	√	x
	UUID	x	x
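
Before configuring a job, it can help to check a table's column types against the support table above. The sketch below encodes that table as two sets and reports the columns whose type is not listed as supported. The helper names and the sample column definitions are hypothetical; a type that does not appear in the table is reported as unconfirmed rather than assumed to work.

```python
import re

# Types marked as supported for both read and write in the table above.
READ_AND_WRITE = {
    "smallint", "int", "bigint", "smallserial", "serial", "bigserial",
    "float", "double precision", "real", "decimal", "numeric",
    "char", "varchar", "text",
    "date", "timestamp", "timestamptz", "time", "timetz", "interval",
    "bytea", "money", "bit", "varbit", "boolean", "int1",
}
# JSONB is marked as readable but not writable; INET and UUID are neither.
READABLE = READ_AND_WRITE | {"jsonb"}
WRITABLE = READ_AND_WRITE


def base_type(declared):
    """Normalize a declared type: drop '(p,s)' style modifiers and lowercase it."""
    return re.sub(r"\s*\(.*?\)", "", declared).strip().lower()


def unconfirmed_columns(columns, direction):
    """Return the names of columns whose type is not listed as supported.

    columns   -- mapping of column name to declared type
    direction -- "read" or "write"
    """
    if direction not in ("read", "write"):
        raise ValueError("direction must be 'read' or 'write'")
    supported = READABLE if direction == "read" else WRITABLE
    return [name for name, declared in columns.items()
            if base_type(declared) not in supported]


# Hypothetical table definition for illustration only.
columns = {"id": "bigint", "price": "decimal(10,2)", "payload": "JSONB",
           "client": "inet", "token": "uuid"}
print(unconfirmed_columns(columns, "read"))
print(unconfirmed_columns(columns, "write"))
```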

Supported Migration Scenarios

DataArts Migration supports the following offline synchronization modes:

Single table synchronization
DataArts Migration supports table/file synchronization in data ingestion into a data lake or data migration to the cloud.
Database and table shard synchronization
DataArts Migration supports synchronization of data from multiple databases and tables in data ingestion into a data lake or data migration to the cloud.
Entire DB migration
DataArts Migration supports synchronization of data from an on-premises database in data ingestion into a data lake or data migration to the cloud. For details about the supported data source types, see the data source types supported by entire database synchronization.

Database and table shard synchronization and entire DB migration are not supported in all regions. The following table lists the supported PostgreSQL migration scenarios.

Supported Migration Scenario	PostgreSQL Single Table Read	PostgreSQL Single Table Write	PostgreSQL Database/Table Shard Read	PostgreSQL Database/Table Shard Write	PostgreSQL Entire DB Read	PostgreSQL Entire DB Write
Supported	√	√	√ (supported in some regions)	√	√ (supported in some regions)	x

Core Capabilities

Connection configuration

Configuration Item	Supported	Description
User/AK	√	User authentication ensures connection security.
SSL encryption	√	SSL encryption ensures secure data transmission. Currently, SSL authentication can be enabled only for RDS.
SSL authentication	√	Currently, SSL authentication can be enabled only for RDS. The standard Huawei Cloud CA certificate is used for authentication.
Private certificate	x	Private certificates are not supported.
Connection configuration optimization	√	Connection configuration such as connectTimeout can be optimized to improve connection performance.
Custom driver	√	Custom drivers are supported and provide better flexibility.
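
To illustrate the connection optimization row above, the sketch below assembles a PostgreSQL JDBC URL with driver properties appended as query parameters. `connectTimeout` and `socketTimeout` are properties of the PostgreSQL JDBC driver, both expressed in seconds. The host, port, database name, and values are hypothetical, and whether DataArts Migration expects such properties in the URL or in a separate attribute field is an assumption to verify in the connection configuration page.

```python
from urllib.parse import urlencode


def jdbc_url(host, port, database, properties=None):
    """Build a PostgreSQL JDBC URL, appending driver properties as a query string."""
    base = f"jdbc:postgresql://{host}:{port}/{database}"
    if not properties:
        return base
    # urlencode() percent-encodes values and joins the pairs with '&'.
    return f"{base}?{urlencode(properties)}"


# Hypothetical endpoint and values for illustration only.
print(jdbc_url("db.example.internal", 5432, "sales_db",
               {"connectTimeout": 10, "socketTimeout": 300}))
```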

Read capabilities

Configuration Item	Supported	Description
Shard concurrency	√	Data is sharded horizontally based on primary keys or common fields and extracted concurrently by multiple threads, which improves throughput and efficiency.
Dirty data processing	√	Abnormal data can be written to the dirty data bucket to prevent job failures caused by a small amount of abnormal data.
Custom fields	√	You can add computed columns, constant columns, or masking functions for tasks to meet personalized service requirements.
Incremental read	√	WHERE conditions and the SQL mode enable incremental data reading.
Stream and batch reading	Batch reading	Batch reading improves efficiency when there is a small or medium amount of data.
Optimization of the number of rows read	√	You can set Fetch Size in the connection to control the amount of data to be transmitted. This improves performance and prevents transmission delays or system overload when there is a large amount of data.
View read	√	Data can be read from views. This enables flexible data integration and processing.
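
The shard concurrency and incremental read rows above can be pictured with a small sketch. It splits an integer key range into contiguous WHERE predicates, one per shard, and optionally combines each with an extra filter such as an incremental condition. This is only an illustration of range-based horizontal sharding, not the algorithm that DataArts Migration uses internally; the function name, column names, and values are hypothetical.

```python
def shard_predicates(column, low, high, shards, extra_where=None):
    """Split the inclusive integer range [low, high] into WHERE predicates.

    Each predicate covers a contiguous, non-overlapping slice of the range,
    so separate threads can extract the slices concurrently.
    """
    if shards < 1:
        raise ValueError("shards must be at least 1")
    if high < low:
        raise ValueError("high must not be smaller than low")
    total = high - low + 1
    shards = min(shards, total)          # never create empty shards
    size, remainder = divmod(total, shards)
    predicates = []
    start = low
    for index in range(shards):
        # The first 'remainder' shards take one extra key each.
        end = start + size - 1 + (1 if index < remainder else 0)
        clause = f"{column} BETWEEN {start} AND {end}"
        if extra_where:
            clause = f"({clause}) AND ({extra_where})"
        predicates.append(clause)
        start = end + 1
    return predicates


# Hypothetical key range and incremental filter for illustration only.
for predicate in shard_predicates("id", 1, 10, 3, "updated_at >= '2024-01-01'"):
    print(predicate)
```

Range splitting of this kind works best when key values are evenly distributed; with skewed keys, some shards receive far more rows than others.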

Write capabilities

Configuration Item	Supported	Description
Data source optimization parameters	√	Optimization parameters such as batchSize and socketTimeout are supported for the data source. They improve write performance.
Dirty data processing	√	Abnormal data can be written to the dirty data bucket to prevent job failures caused by a small amount of abnormal data.
Conflict resolution	x	The conflict resolution mechanism is not supported.
Pre- and post-import processing	√	Operations such as preSql and delete can clean and process data before and after data import.
Concurrent write	√	Concurrent write improves efficiency.
Optimization of the number of written rows	x	This function is not supported for this data source. It would allow you to set the number of rows written by each request in the connection to control the amount of data to be transmitted, improving performance and preventing transmission delays or system overload when there is a large amount of data.
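
The dirty data processing row above describes diverting abnormal rows so that a few bad records do not fail the whole job. The sketch below shows that idea in isolation. The function names, the in-memory list standing in for the dirty data bucket, and the `max_dirty` threshold are assumptions made for the example and do not describe the product's actual implementation or settings.

```python
def write_with_dirty_rows(rows, write_row, max_dirty):
    """Write rows one by one, diverting rows that fail instead of aborting.

    rows      -- iterable of records to write
    write_row -- callable that writes one record and raises ValueError on bad data
    max_dirty -- hypothetical limit; the job fails once more rows than this are dirty
    Returns (number of rows written, list of (row, error message) pairs).
    """
    written = 0
    dirty = []
    for row in rows:
        try:
            write_row(row)
            written += 1
        except ValueError as exc:
            # Keep the rejected row and the reason so it can be inspected later.
            dirty.append((row, str(exc)))
            if len(dirty) > max_dirty:
                raise RuntimeError("too many dirty rows, aborting job") from exc
    return written, dirty


def demo_write(row):
    """Stand-in writer: rejects rows whose 'amount' is not an integer."""
    int(row["amount"])


# Hypothetical records for illustration only.
rows = [{"amount": "10"}, {"amount": "abc"}, {"amount": "7"}]
written, dirty = write_with_dirty_rows(rows, demo_write, max_dirty=1)
print(written, len(dirty))
```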