New Features in 9.1.0.x
The beta features discussed below are not available for commercial use. Contact technical support before using these features.
Version 9.1.0.210 (November 25, 2024)
Storage-compute decoupling
- You can use the explain warmup command to preload data into the local disk cache on either the hot or the cold end.
- The enhanced elastic VW feature offers more flexible service distribution: each CN can distribute services to either the primary VW or an elastic VW.
- Storage-compute decoupled tables support parallel insert operations, which improves data loading performance.
- Storage-compute decoupled tables provide a recycle bin, which allows quick recovery from misoperations such as dropping or truncating a table or partition.
- Both hot and cold tables can utilize disk cache and asynchronous I/Os to improve performance.
Real-time data warehouse
- Performance of LIMIT ... OFFSET pagination and IN-list operations has been significantly improved.
- The Binlog feature is now available for commercial use.
- Automatic partitioning now supports time columns of both integer and variable-length types.
Lakehouse
- Parquet/ORC read and write now support the zstd compression format.
- The create table like command now allows a table from an external schema to be used as the source table (see the sketch after this list).
- Foreign tables can be exported in parallel.
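A minimal sketch of create table like with an external schema as the source; the schema name ext_hive and the table names are hypothetical placeholders.
-- Copy the structure of a table exposed through an external schema into a local table.
CREATE TABLE orders_copy (LIKE ext_hive.orders);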
High availability
- Storage-compute decoupled tables and hot and cold tables support incremental backup and restoration.
- In storage-compute decoupling scenarios, parallel copy is used to increase backup speed.
Ecosystem compatibility
- MySQL's replace into syntax and the interval time type are now supported (see the sketch after this list).
- The pg_get_tabledef export function now includes comments in its output.
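A minimal sketch of the MySQL-compatible replace into syntax; the table and column names are illustrative, and the behavior follows MySQL semantics (a row with the same key is replaced rather than causing an error).
CREATE TABLE t_user (id int PRIMARY KEY, name varchar(32));
REPLACE INTO t_user (id, name) VALUES (1, 'alice');
-- Re-running with the same key replaces the existing row instead of failing.
REPLACE INTO t_user (id, name) VALUES (1, 'bob');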
O&M and stability improvement
- When disk usage is high, data can be dumped from the standby node to OBS.
- When the database is about to become read-only, statements that write to disk and create new tables or physical files are intercepted, so that disk space can be reclaimed quickly and other statements can continue to run.
- Audit logs can be dumped to OBS.
- The lightweight lock view pgxc_lwlocks is added.
- The common lock view now includes lock acquisition and wait timestamps.
- The global deadlock detection function is now enabled by default.
- A locking mechanism is added between VACUUM FULL and SELECT.
- The expiration time has been added to gs_view_invalid to assist O&M personnel in clearing invalid objects.
Constraints
- The maximum number of VWs supported is 256, with each VW supporting a maximum of 1,024 DNs. It is best to have no more than 32 VWs, with each VW containing no more than 128 DNs.
- OBS storage-compute decoupled tables do not support DR or fine-grained backup and restoration.
Behavior changes
- Enabling max_process_memory adaptation during the upgrade increases the memory available to DNs in active/standby mode.
- By default, data consistency check is enabled for data redistribution during scale-out, which increases the scale-out time by 10%.
- hstore_opt tables are now created with the Turbo engine enabled, and the compression level keeps the default value middle.
- By default, the OBS path of a storage-compute decoupled table is displayed as a relative path.
- To use the disk cache, enable the asynchronous I/O parameter.
- The interval for clearing indexes of column-store tables has been changed from 1 hour to 10 minutes to quickly clear the occupied index space.
- CREATE TABLE and ALTER TABLE do not support columns with the ON UPDATE expression as distribution columns.
- When Parquet data is queried, timestamp data stored in INT96 format is no longer adjusted by 8 hours.
- max_stream_pool is used to control the number of threads cached in the stream thread pool. The default value is changed from 65525 to 1024 to prevent idle threads from using too much memory.
- The track_activity_query_size parameter takes effect upon restart instead of dynamically.
- The logical replication function is no longer supported, and an error will be reported when related APIs are called.
Patch 9.1.0.102 (September 25, 2024)
This is a patch version that fixes known issues.
Upgrade
Fixed known issues
- Supported ALTER DATABASE xxx RENAME TO yyy in the storage-compute decoupling version.
- Fixed incorrect display of the space size of storage-compute decoupled tables in \d+ output.
- Fixed the issue where asynchronous sorting did not run after backup and restoration.
- Fixed the issue where the CREATE TABLE LIKE syntax could not be used after a bitmap index column was deleted.
- Fixed a performance regression in the Turbo engine's GROUP BY scenario caused by hash algorithm conflicts.
- The scheduler processes now handle failed tasks in the same manner as version 8.3.0.
- Fixed pg_stat_object space bloat in fault scenarios.
- Fixed the issue where DataArts Studio reported an error when delivering a VACUUM FULL job after an upgrade from 8.3.0 to 9.1.0.
- Fixed high CPU and memory usage during JSON field computation.
Enhanced functions
- ORC foreign tables support the ZSTD compression format.
- GIS supports the st_asmvtgeom, st_asmvt, and st_squaregrid functions.
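A minimal sketch of the newly supported GIS functions, using PostGIS-style syntax; the roads table and geom column are hypothetical.
-- Encode a tile layer as Mapbox Vector Tile (MVT) data.
SELECT ST_AsMVT(tile, 'roads_layer')
FROM (
    SELECT id,
           ST_AsMVTGeom(geom, ST_MakeEnvelope(0, 0, 4096, 4096, 3857)) AS geom
    FROM roads
) AS tile;
-- Tile a bounding box into square cells of the given size.
SELECT geom, i, j
FROM ST_SquareGrid(1000, ST_MakeEnvelope(0, 0, 10000, 10000, 3857));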
Version 9.1.0.100 (August 12, 2024)
Elastic architecture
- Architecture upgrade: The storage-compute decoupling architecture 3.0, based on OBS, introduces layered and elastic computing and storage, with on-demand storage charging to reduce costs and improve efficiency. Multiple virtual warehouses (VWs) can be deployed to enhance service isolation and resolve resource contention.
- The elastic VW feature, which is stateless and supports read/write acceleration, addresses issues like insufficient concurrent processing, unbalanced peak and off-peak hours, and resource contention for data loading and analytics. For details, see Elastically Adding or Deleting a Logical Cluster.
- Both auto scale-out and classic scale-out are supported when adding or deleting DNs. Auto scale-out does not redistribute data on OBS, while classic scale-out redistributes all data. The system automatically selects the scale-out mode based on the total number of buckets and DNs.
- The storage-compute decoupling architecture (DWS 3.0) enhances performance with disk cache and asynchronous I/O read/write. When the disk cache is fully utilized, performance matches that of the storage-compute integration architecture (DWS 2.0).
Real-time processing
- Launched the vectorized Turbo acceleration engine, doubling TPC-H 1000X performance.
- The upgraded version of hstore, hstore_opt, offers a higher compression ratio and, combined with the Turbo engine, reduces storage space by 40% compared with column storage.
- With Flink, you can connect directly to DNs to import data into the database. This results in linear performance improvement in batch data import scenarios. For details, see Real-Time Binlog Consumption by Flink.
- GaussDB(DWS) supports Binlog (currently in beta) and can be used in conjunction with Flink to enable incremental computing. For details, see Subscribing to Hybrid Data Warehouse Binlog.
- Full-column update performance is significantly improved while resource consumption is reduced.
- GaussDB(DWS) supports materialized views (currently in beta). For details, see CREATE MATERIALIZED VIEW.
- To improve coarse filtering, varchar/text columns now support bitmap indexes and bloom filters. They must be specified explicitly when a table is created. For details, see CREATE TABLE.
- To enhance performance in topK and join scenarios, the runtime filter feature is now supported. You can learn more about GUC parameters runtime_filter_type and runtime_filter_ratiox in Other Optimizer Options.
- GaussDB(DWS) supports asynchronous sorting to enhance the min-max coarse filtering effect of PCK columns.
- The performance in the IN scenario is greatly improved.
- ANALYZE supports incremental merging of partition statistics, collecting only statistics on changed partitions and reusing historical data, which improves execution efficiency. It collects statistics only on predicate columns.
- The CREATE TABLE syntax now includes the incremental_analyze parameter to control whether to enable incremental ANALYZE mode for partitioned tables (see the sketch after this list). For details, see CREATE TABLE.
- The enable_analyze_partition GUC parameter determines whether to collect statistics on a partition of a table. For details, see Other Optimizer Options.
- The enable_expr_skew_optimization GUC parameter controls whether to use expression statistics in the skew optimization policy. For details, see Optimizer Method Configuration.
- For details about the ANALYZE syntax, see ANALYZE | ANALYSE.
- GaussDB(DWS) supports large and wide tables, with a maximum of 5,000 columns.
- Create index/reindex supports parallel processing.
- The pgxc_get_cstore_dirty_ratio function is added to obtain the dirty page rate of CU, Delta, and CUDesc in the target table (only hstore_opt is supported).
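A minimal sketch of enabling incremental ANALYZE for a partitioned column-store table; the table definition, storage parameters, and partition bounds are illustrative and may need adjustment for a real deployment.
CREATE TABLE sales (
    sale_id   int,
    sale_date date,
    amount    numeric(18, 2)
)
WITH (orientation = column, incremental_analyze = on)
PARTITION BY RANGE (sale_date) (
    PARTITION p2024q1 VALUES LESS THAN ('2024-04-01'),
    PARTITION p2024q2 VALUES LESS THAN ('2024-07-01')
);
-- With incremental mode on, ANALYZE collects statistics only on changed
-- partitions and merges them with the historical statistics.
ANALYZE sales;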
Convergence and unification
- One-click lakehouse: You can use create external schema to connect to the HiveMetaStore metadata, avoiding complex create foreign table operations and reducing maintenance costs (see the sketch after this list). For details, see Accessing HiveMetaStore Across Clusters.
- GaussDB(DWS) allows for reading and writing in Parquet/ORC format, as well as overwriting, appending, and multi-level partition read and write.
- GaussDB(DWS) allows for reading in Hudi format.
- Foreign tables support concurrent execution of ANALYZE, significantly improving the precision and speed of statistics collection. However, foreign tables do not support AutoAnalyze capabilities, so it is recommended to manually perform ANALYZE after data import.
- Foreign tables can use the local disk cache for read acceleration.
- Predicates such as IN and NOT IN can be pushed down for foreign tables to enhance partition pruning.
- Foreign tables now support complex types such as map, struct, and array, as well as bytea and blob types.
- Foreign tables support data masking and row-level access control.
- GDS now supports the fault tolerance parameter compatible_illegal_char for exporting foreign tables.
- The read_foreign_table_file function is added to parse ORC and Parquet files, facilitating fault demarcation.
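A minimal usage sketch, assuming an external schema named ext_hive has already been created against the HiveMetaStore; the schema, table, and column names are hypothetical. Tables in the connected Hive database can then be queried directly, without defining foreign tables one by one.
SELECT region, count(*) AS order_cnt
FROM ext_hive.orders
GROUP BY region;
-- Data can also be pulled into a local table for further processing.
CREATE TABLE local_orders AS SELECT * FROM ext_hive.orders;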
High availability
- The fault recovery speed of the unlogged table is greatly improved.
- Backup sets support cross-version restoration. Fine-grained table-level restoration supports restoration of backup sets generated by clusters of earlier versions (8.1.3 and later versions).
- Fine-grained table-level restoration supports restoration to a heterogeneous cluster (the number of nodes, DNs, and CNs can be different).
- Fine-grained restoration supports permissions and comments. Cluster-level and schema-level physical fine-grained backups can back up permissions and comments, and table-level restoration and schema-level DR support them as well.
Space saving
- Column storage now supports JSONB and JSON types, allowing JSON tables to be created as column-store tables, unlike earlier versions which only supported row-store tables.
- Hot and cold tables support partition-level unusable indexes, saving local index space for cold partitions.
- The upgraded hstore_opt provides a higher compression ratio and, when used with the Turbo engine, saves 40% more space compared to column storage.
O&M and stability improvement
- The query filter is enhanced to support interception by SQL feature, type, source, and processed data volume. For details, see CREATE BLOCK RULE.
- GaussDB(DWS) now automatically frees up memory resources by reclaiming idle connections in a timely manner. You can specify the syscache_clean_policy parameter to set the policy for clearing the memory and number of idle DN connections. For details, see Connection Pool Parameters.
- The gs_switch_respool function is added for dynamic switching of the resource pool used by queryid and threadid. This enables dynamic adjustment of the resources used by SQL. For details, see Resource Management Functions.
- The pg_sequences view is added to display the attributes of sequences accessible to the current user (see the sketch after this list).
- Functions are added to query information about all memory chunks requested in a specified shared memory context.
- The pgxc_query_resource_info function is added to display the resource usage of the SQL statement corresponding to a specified query ID on all DNs. For details, see pgxc_query_resource_info.
- The pgxc_stat_get_last_data_access_timestamp function is added to return the last access time of a table. This helps the service to identify and clear tables that have not been accessed for a long time. For details, see pgxc_stat_get_last_data_access_timestamp.
- SQL hints support more hints that provide better control over the generation of execution plans. For details, see Configuration Parameter Hints.
- Performance fields are added to top SQL statements that are related to syntax parsing and disk cache. This makes it easier to identify performance issues. For details, see Real-time Top SQL.
- The preset data masking administrator has the authority to create, modify, and delete data masking policies.
- Audit logs can record objects that are deleted in cascading mode.
- Audit logs can be dumped to OBS.
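A minimal sketch of querying the new pg_sequences view; the column names shown here follow the PostgreSQL view of the same name and are assumptions.
SELECT schemaname, sequencename, last_value
FROM pg_sequences;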
Ecosystem compatibility
- if not exists can be included in the create schema, create index, and create sequence statements (see the sketch after this list).
- The merge into statement now allows for specified partitions to be merged. For details, see MERGE INTO.
- In Teradata-compatible mode, trailing spaces in strings can be ignored when comparing them.
- GUC parameters can be used to determine if the n in varchar(n) will be automatically converted to nvarchar2.
- PostGIS has been upgraded to version 3.2.2.
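A minimal sketch of the if not exists clauses; the object names are illustrative. Re-running the statements is a no-op instead of an error, which keeps deployment scripts idempotent.
CREATE SCHEMA IF NOT EXISTS analytics;
CREATE SEQUENCE IF NOT EXISTS analytics.order_seq;
CREATE TABLE IF NOT EXISTS analytics.orders (order_id int, order_date date);
CREATE INDEX IF NOT EXISTS idx_orders_date ON analytics.orders (order_date);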
Restrictions
- A maximum of 256 VWs are supported, each with up to 1,024 DNs. It is recommended to have no more than 32 VWs, each with no more than 128 DNs.
- OBS storage-compute decoupled tables do not support DR. Only full backup and restoration are available.
Behavior changes
- VACUUM FULL, ANALYZE, and CLUSTER are only supported for individual tables, not the entire database. Even though there are no syntax errors, the commands will not be executed.
- OBS tables with decoupled storage and compute do not support delta tables. If enable_delta is set to on, no error is reported, but delta tables do not take effect. If a delta table is required, use the hstore_opt table instead.
- By default, NUMA core binding is enabled and can be turned off dynamically using the enable_numa_bind parameter.
- Upgrading from version 8.3.0 Turbo to version 9.1.0 changes the numeric(38) data type in Turbo tables to numeric(39), without affecting the display width. Rolling back to the earlier version will not reverse this change.
- Due to the decoupling of storage and compute, the EVS storage space in DWS 3.0 is half that of DWS 2.0 by default. For example, purchasing 1 TB of EVS storage provides 500 GB in DWS 3.0 for active/standby mode, compared to 1 TB in DWS 2.0. When migrating data from DWS 2.0 to DWS 3.0, the EVS storage space required in DWS 3.0 is twice that of DWS 2.0.