Updated on 2024-09-12 GMT+08:00

8.x Versions

This section describes the 8.x kernel version updates of GaussDB.
Table 1 What's new in version 8.102.0 (release date: 2024-04)

For O compatibility, aggregation-related syntax, table updates based on views and subqueries, and comparison operators with spaces are supported, as sketched after the list below.

  1. Aggregate functions and ORDER BY can be used at the same time.
  2. Nested aggregate functions are supported.
  3. You can insert, update, and delete tables in subqueries and views.
  4. Spaces are allowed in comparison operators such as "> =" and "< =".
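
A minimal sketch of the syntax listed above, assuming a hypothetical staff table; the exact feature scope may differ:

  -- 1. An aggregate function used together with ORDER BY.
  SELECT listagg(name, ',') WITHIN GROUP (ORDER BY name) FROM staff;
  -- 2. Nested aggregate functions.
  SELECT max(count(*)) FROM staff GROUP BY dept_id;
  -- 3. DML on a subquery (views are handled similarly).
  UPDATE (SELECT id, salary FROM staff) SET salary = salary * 1.1 WHERE id = 1;
  -- 4. Spaces inside comparison operators.
  SELECT * FROM staff WHERE salary > = 5000;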

For O compatibility, system functions and system views are supported.

Based on the existing O compatibility, some system functions and system views are supported (usage examples follow the list), including:

  1. Two types of window function syntax are compatible.
  2. The sys_guid function is compatible.
  3. Time function compatibility is improved: the current time can be obtained instead of the transaction start time.
  4. The data sorting function array_sort is supported.
  5. The SQL syntax FROM DUAL is compatible.
  6. The all_directories view is supplemented.
  7. BULK COLLECT can be used together with EXECUTE.
  8. Syntax related to distributed aggregation (agg) is supported.
  9. The row construction syntax => is supported.
  10. The EXTRACTVALUE system function is supported.
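
Hedged examples of some of the newly compatible functions and syntax; the results in comments are illustrative, and EXTRACTVALUE may require an explicit xmltype cast:

  SELECT sys_guid() FROM DUAL;                      -- GUID generation, FROM DUAL syntax
  SELECT array_sort(ARRAY[3, 1, 2]);                -- {1,2,3}
  SELECT extractvalue('<a><b>x</b></a>', '/a/b');   -- 'x'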

For O compatibility, encoding exceptions and hybrid encoding of special characters are supported.

  1. The GBK extended character set zhs16gbk is added. In addition, client_encoding can be set to zhs16gbk for JDBC and the database.
  2. A transcoding function is provided for the new character set zhs16gbk. The supported conversions are: zhs16gbk to UTF8, UTF8 to zhs16gbk, GB18030 to zhs16gbk, and zhs16gbk to GB18030 (including the euro symbol).
  3. The JDBC driver provides the transcoding function to solve the transcoding problem of the extended character set zhs16gbk.
  4. To ensure that customer data containing special characters can be imported to and exported from the database, the import and export tools support the following functions when the source and target encodings are the same (see the sketch after this list):
    • Use \copy ... to to export data as a binary file. Transcoding and validation are not performed during the export, and the header of the binary file records the original encoding information.
    • Use gs_loader or \copy ... from to import the binary file. Transcoding and validation are not performed during the import.
  5. When encoding conversion is forced, a placeholder is output instead of an error being reported.
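
A sketch of the binary export/import flow with the new character set from gsql; the table name and file path are hypothetical:

  SET client_encoding = 'zhs16gbk';
  -- Binary export: no transcoding or validation; the file header records the original encoding.
  \copy t_gbk TO '/tmp/t_gbk.bin' WITH BINARY
  -- Binary import, also without transcoding or validation.
  \copy t_gbk FROM '/tmp/t_gbk.bin' WITH BINARY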

For O compatibility, stored procedures support synonyms, subtypes, dynamic anonymous blocks, and triggers, enhancing commercial capabilities.

In terms of O compatibility, the following capabilities are added (a sketch follows the list):

  1. Triggers support the CREATE OR REPLACE syntax and anonymous block syntax.
  2. Anonymous blocks of dynamic statements are enhanced.
  3. Stored procedures support SUBTYPE-related syntax.
  4. Synonyms are enhanced.
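
A sketch of the enhanced trigger and SUBTYPE syntax, assuming Oracle-style PL syntax; all object and column names are hypothetical:

  -- 1. CREATE OR REPLACE for a trigger, with an anonymous-block body.
  CREATE OR REPLACE TRIGGER trg_set_time
    BEFORE INSERT ON staff
    FOR EACH ROW
  BEGIN
    :new.created_at := sysdate;
  END;
  /
  -- 2. SUBTYPE declared inside a PL block.
  DECLARE
    SUBTYPE salary_t IS NUMBER;
    v salary_t := 4200;
  BEGIN
    NULL;
  END;
  /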

O compatibility supports cross-type integer comparison, bpchar fuzzy match, and system function matching policy optimization.

In O-compatible mode (a sketch follows the list):

  1. A cross-type index operator class is added for the integer type (int1/int2/int4/int8/int16/numeric).
  2. An index operator class for bpchar-to-bpchar fuzzy matching is added.
  3. The internal matching policy of system functions is optimized.
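
A sketch of how the cross-type operator classes help; the accounts table is hypothetical:

  CREATE TABLE accounts (id int4, balance numeric);
  CREATE INDEX idx_accounts_id ON accounts (id);
  -- The int8 comparison can now use idx_accounts_id instead of a sequential scan.
  EXPLAIN SELECT * FROM accounts WHERE id = 42::int8;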

For O compatibility, the XMLGEN, STATS, and DESCRIBE advanced packages are supported.

In O-compatible mode, some APIs are supported in the DBMS_XMLGEN, DBMS_STATS, and DBMS_DESCRIBE advanced packages.
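
A hedged sketch of DBMS_XMLGEN usage, assuming the Oracle-style newcontext/getxml/closecontext APIs are among the supported subset; the staff table is hypothetical:

  DECLARE
    ctx    INTEGER;
    result CLOB;
  BEGIN
    ctx := dbms_xmlgen.newcontext('SELECT id, name FROM staff');
    result := dbms_xmlgen.getxml(ctx);
    dbms_xmlgen.closecontext(ctx);
  END;
  /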

For M compatibility, commercial requirements such as data types and syntax functions are supported.

  1. Data type compatibility: The YEAR data type is supported, and the display width of integer(p) is supported.
  2. DDL syntax compatibility: The syntax for deleting primary key and foreign key constraints of a table is supported; the syntax for renaming tables is compatible; the syntax for adding partitions is compatible; auto-increment columns support distributed deployment; the CREATE TABLE LIKE syntax is compatible; and the CREATE TABLE syntax is compatible with the ENGINE=InnoDB clause of MySQL databases.
  3. DML syntax compatibility: The IGNORE keyword can be used to ignore errors in specific scenarios, and the LOAD DATA function can be used to import data.
  4. Operator and expression compatibility: The SET NAMES syntax is added to control the character set and collation used for LIKE matching. The source REGEXP pattern regular expression matching operator is supported to determine whether the source string matches the pattern string; its matching rules are the same as those of the existing ~* operator in GaussDB but may differ in some scenarios.
  5. Backquotes are supported; database object names quoted with backquotes are case-sensitive.
  6. The utf8_bin, utf8_general_ci, and utf8mb4_0900_ai_ci collations are supported.
  7. The following system functions are added for character string processing: strcmp(), log10(), log2(), bin(), elt(), field(), insert(), locate(), make_set(), quote(), space(), and default(). A sketch combining several of these capabilities follows.
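
A hedged sketch of the M-compatibility syntax above; the Orders table is hypothetical:

  CREATE TABLE `Orders` (                           -- backquoted, case-sensitive name
    id      INT AUTO_INCREMENT PRIMARY KEY,
    created YEAR
  ) ENGINE = InnoDB;                                -- ENGINE clause accepted for compatibility
  INSERT IGNORE INTO `Orders` (id, created) VALUES (1, 2024);
  SELECT strcmp('a', 'b'), elt(2, 'x', 'y', 'z');   -- returns -1 and 'y'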

For M compatibility, new frameworks and protocols are compatible for commercial use.

The new M-compatible framework is designed for full compatibility with MySQL database syntax in the future, avoiding the syntax isolation and forward-compatibility problems of the old framework, such as syntax and keyword occupation. Function and operator behavior is the same as in MySQL databases, and the MySQL protocol is supported.

For M compatibility, the existing syntax adapts to the new framework and supports commercial use.

The new M-compatible framework uses the hook mechanism to implement compatibility features in an independent extension, isolating them from the GaussDB main process to avoid forward-compatibility issues caused by intrusive modification. This feature synchronizes the existing 107 SQL commands to the new framework.

The JDBC driver supports the streaming read capability.

The GaussDB JDBC driver supports streaming reads. In streaming read mode, JDBC does not run out of memory (OOM) when processing large result sets.

The JDBC driver supports JDK 1.7 and the enhanced JDBC O&M capability.

  1. JDBC memory release is optimized to prevent client memory overflow in abnormal connection scenarios.
  2. The JDBC driver proactively collects metric data on both the client and database sides, providing data for demarcating database problems.
  3. JDBC adapts to JDK 1.7.
  4. executeUpdate can execute DQL statements and obtain query results.

The commercial performance of the default configuration parameters is not lower than 1 million tpmC.

The performance of the default GaussDB configuration parameters is optimized: measured by the standard TPC-C benchmark, it is improved to no less than 1 million tpmC. The capability of locating performance issues is also improved.

Based on ADIO, the performance is improved by 20% in typical large-capacity scenarios.

In large-capacity scenarios, the AIO-DIO technology and doublewrite removal function are used to fully utilize I/O resources to improve database performance by more than 20%. In addition, online switching from the BIO mode to the ADIO mode is supported.

The performance of large concurrent write transactions in centralized mode is improved by 50%.

  1. The performance of basic write transactions in the case of a large number of concurrent requests is optimized, including insertion, update with indexes, and update without indexes.
  2. Performance is optimized for highly concurrent scenarios where read operations outnumber write operations and hotspot pages exist.

Performance is improved by 15% based on stored procedures in typical batch processing scenarios.

The fixed overhead (noise floor) of stored procedures is reduced, and SQLBYPASS is supported.

Concurrent cursor query is supported, improving performance by more than 30% in typical scenarios.

Cursors can be queried concurrently, improving cursor usage efficiency and INSERT ... SELECT parallel performance in Ustore.

Based on window functions, performance is improved by six times in typical page-turning scenarios.

When the projection list of a subquery contains a window function and the parent query filters on that window function's result, this feature pushes the outer filter criteria down into the inner subquery, as in the example below.
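
A typical page-turning query that benefits from the pushdown; the staff table is hypothetical:

  -- The outer rn predicate is pushed into the subquery, so only the needed window rows are produced.
  SELECT * FROM (
    SELECT id, name, row_number() OVER (ORDER BY id) AS rn
    FROM staff
  ) t
  WHERE t.rn BETWEEN 11 AND 20;   -- page 2 at 10 rows per page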

Codegen is put into commercial use; it is heavily applied to expression evaluation in TPC-H calculations, improving typical query performance by 20%.

The commercial readiness of Codegen is improved. Codegen is enabled by default to solve the computation performance problems of complex query expressions.

Parallel scanning of predicate indexes is supported. In typical scenarios, the performance is 10% higher than that of PG16.

Parallel index scans with predicates (IndexScan and IndexOnlyScan) are supported to improve performance in typical scenarios, as sketched below.
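
A hedged sketch of triggering a parallel predicate index scan; the orders table is hypothetical, and query_dop is GaussDB's SMP degree-of-parallelism parameter:

  SET query_dop = 4;                  -- allow SMP parallelism for this session
  CREATE INDEX idx_orders_date ON orders (order_date);
  EXPLAIN SELECT count(*) FROM orders
  WHERE order_date >= '2024-01-01';   -- candidate for a parallel IndexScan/IndexOnlyScan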

Local partitioned indexes can be created offline and in parallel across Astore partitions.

Inter-partition parallelism is supported: during local partitioned index creation, steps such as scanning, sorting, and B-tree insertion are performed in parallel across partitions. When partition data is evenly distributed, overall performance is better than that of the existing intra-partition parallel creation solution.

SPM supports plan restoration for complex SQL statements.

Based on the plan management function supported by SPM, the following enhancements are made:

  1. The outline supports the physical operator control of the SMP plan.
  2. The outline supports controlling the pullup of any sublink in a subplan.
  3. Semi join can be controlled.
  4. When an SPM outline involves a join of more than 12 tables, the hint still takes effect and the plan can be quickly restored.

DR switchover stability achieves 99% in typical scenarios, ensuring service recovery within 5 minutes.

The internal implementation mechanism and performance are optimized based on typical DR scenarios, effectively improving the DR switchover performance and stability.

The arterial detection model for first aid is put into commercial use for the first time, supporting slow disk detection.

The arterial detection model is built to identify arterial subhealth problems and provide corresponding measures to improve database HA.

Client service statements can be terminated with socketTimeout.

When a client connection is dropped due to timeout, the GaussDB server detects the disconnection in a timely manner and terminates the running service statements associated with that connection. This prevents session resources from accumulating and avoids service loss caused by retries after socket timeouts on the service side.

Automatic repair based on physical bad blocks: Pages on the standby node can be repaired from the primary node in seconds.

  1. In primary/standby deployment mode, if a target page (data page, index page, undo slot page, or undo page) on the standby node is physically damaged, that is, a CRC mismatch occurs, the standby node can pull the target page from the primary node when reading the page during replay. If the primary node's copy is correct, it overwrites the damaged page, completing automatic recovery.
  2. This ensures that replay on the standby node is not interrupted and services are unaware of the repair. Backup and restoration or a full build is not required.
  3. When the page CRC check detects physical page damage, repair of the standby node from the primary node can be triggered automatically. This function is enabled by default.
  4. DFX is optimized, including log printing and page parsing after a page is damaged, and system functions for querying bad blocks.

PITR modular decoupling and key scenario locating and demarcation are improved.

  1. Barrier files can be repaired during startup.
  2. Archive log clearing is moved from the kernel side to the gs_roach side, eliminating the impact of the log clearing logic on the gaussdb kernel.
  3. Several system functions are added to query and operate archive-related files.

Automatic list/range partitioning is supported for commercial use.

  1. Automatic partitioning is supported: if inserted data matches no existing partition, a new partition is automatically created, as sketched below. Automatic transaction commit is supported in centralized mode.
  2. This behavior can also be triggered when a sparsely partitioned index is created. The automatic partitioning capability can be enabled or disabled online.
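
A sketch of automatic list partitioning, assuming Oracle-style AUTOMATIC syntax (the exact keyword may differ); the sales table is hypothetical:

  CREATE TABLE sales (region text, amount numeric)
  PARTITION BY LIST (region) AUTOMATIC
  (
    PARTITION p_east VALUES ('east')
  );
  -- No partition matches 'west', so a new partition is created automatically.
  INSERT INTO sales VALUES ('west', 100);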

Row-store compression supports page-level LMT.

After advanced compression is enabled and an ILM policy is specified for a table (see the sketch below), background scheduling periodically scans all rows. Previously, when data was frozen, the current timestamp was taken as the last modification time (LMT) of the frozen tuples for determining hot and cold data, so the recorded timestamp could differ from the tuples' actual last modification time. To indicate the LMT accurately, the timestamp corresponding to the LSN of the page where the tuples reside is now used as the tuples' LMT, and it serves as the time basis for deciding whether the tuples are cold or hot.
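
A hedged sketch of specifying an ILM policy, assuming Oracle-style ILM syntax (the exact clause may differ in GaussDB); t1 is hypothetical:

  -- Compress rows whose page-LSN-based LMT shows no modification for 7 days.
  ALTER TABLE t1 ILM ADD POLICY ROW STORE COMPRESS ADVANCED ROW
    AFTER 7 DAYS OF NO MODIFICATION;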

Based on stored procedures, the global compilation memory usage is reduced by 30% in typical scenarios with a large number of concurrent requests.

In scenarios with a large number of concurrent requests, stored procedures occupy a large amount of memory, so improper memory usage is optimized, mainly in structure arrays whose size depends on the number of parameters and in memory that can be shared. The optimized memory mostly holds the type descriptions of variables in stored procedures; reducing this usage improves the database's concurrent scale-out capability.

In typical scenarios with 4 vCPUs and 16 GB of memory, the CPU noise floor of the CM component decreases by 2.75% and the memory usage decreases by 46%.

The CPU and memory of the CM component are optimized for small-scale deployment in typical scenarios with 4 vCPUs and 16 GB of memory.

Ustore supports efficient storage of flexible fields.

Enhanced TOAST is a technology for processing oversized fields. It reduces redundant information in TOAST pointers to allow more oversized columns in a single table, and optimizes the mapping between the main table and out-of-line storage tables: pg_toast_index is no longer needed to store the relationship between main table data and out-of-line storage data, reducing storage space. Enhanced TOAST also makes split data self-linked, eliminating the dependency on the original chunk ID allocation process and greatly improving write performance.

Ustore supports large-scale commercial use of TOAST.

  1. TOAST table indexes can be upgraded to UB-tree indexes by performing the reindex, reindex concurrently, vacuum full, merge, and split operations.
  2. gs_parse_page_bypath and pagehack can be used to parse special fields in the TOAST table and indexes.
  3. Xlogs can be recorded by SELECT FOR SHARE or SELECT FOR UPDATE.

TDE supports index encryption and RLS supports expression indexes.

  1. TDE supports encrypted indexes and can convert non-encrypted tables to TDE tables, as sketched below.
  2. Index scans can take effect when row-level security (RLS) is enabled for tables and predicates contain system functions or LIKE operators.
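
A hedged sketch of TDE usage; the enable_tde storage parameter form is an assumption, and the table names are hypothetical:

  CREATE TABLE t_secret (id int, payload text) WITH (enable_tde = on);
  CREATE INDEX idx_secret_id ON t_secret (id);   -- the index is encrypted as well
  -- Convert an existing non-encrypted table (t_plain) to a TDE table.
  ALTER TABLE t_plain SET (enable_tde = on);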

Sensitive data discovery is put into commercial use for the first time, enhancing privacy protection and providing high security capabilities.

Sensitive data discovery is implemented through function calls: by calling different functions, you can specify the scan object and the sensitive data classifier to obtain the sensitive data of different levels corresponding to that scan object.

Tamper-proof Ustore is put into commercial use for the first time.

Ustore can use the tamper-proof ledger database function.

ABO supports feedback and multi-table cardinality estimation; in typical slow query scenarios, performance is improved by five times. Cost adaptation is supported; in scenarios where operator selection is inaccurate, performance is doubled.

The adaptive cost estimation function provides cost estimation based on the usual mixed model (UMM) and a cost parameter model. Load monitoring tracks model accuracy, implements fast and efficient load management and incremental model updates, and ensures estimation accuracy. The real-time, efficient predicate query feature helps identify the optimal cardinality estimation policy. This feature addresses distorted cost estimation and suboptimal plans when data and the execution environment change on the live network.

An exact line number is displayed when a compilation error is reported.

The logic for calculating line numbers is adjusted to fix two problems: the function header's line numbers were counted separately from the function body's, and line numbers were calculated incorrectly. As a result, the error line number is now reported accurately.

Hot patches can be installed for advanced packages.

This feature provides the capability of installing hot patches for advanced packages.

Built-in flame graphs support quick performance analysis and locating.

  1. Flame graph files can be generated after stacks are manually or automatically collected.
  2. Stacks can be automatically and periodically collected and flame graph files can be generated.

The duration for locating underlying storage exceptions is shortened from weeks to days to solve the problem of missing dirty pages.

The verification and DFX capabilities are added to check for missing dirty pages, improving the fault locating and demarcation efficiency when the underlying storage returns an incorrect version. The fault locating duration is shortened from weeks to days.

  1. This feature can be used to solve the problem of missing dirty pages inside the database. No Xlog of an incorrect version is generated.
  2. If the underlying storage system does not return the correct page version, you can use the new view of this feature to demarcate the fault.

In typical service scenarios, the read success rate on the standby node is 100%, and the time for locating standby-node read problems is shortened from weeks to days.

  1. Ultimate RTO of read on standby: The mechanism in earlier versions is optimized so that query cancellation no longer needs to be triggered in some scenarios, ensuring 100% read success in the following cases:
    • VACUUM after a large number of deletions or updates truncates useless data pages at the end of the file.
    • Cancellation is caused by relmap updates, such as reindex database.
  2. The replication slots of a read-only node or cascaded standby node are automatically cleared. After a read-only node switchover, the primary node automatically clears the corresponding replication slots. Previously, after a standby node was switched to a cascaded standby node, the original standby node's replication slots remained on the primary node, blocking Xlog recycling there until the Xlog size reached the value of max_size_for_xlog_prune (256 GB).
  3. Log information is supplemented to help locate read on standby data consistency faults, including snapshot information and key tuple information. GUC parameters are added to control the output information. Gray-box case monitoring is added for read on standby in DML, DDL, and fault scenarios.

The troubleshooting and demarcation duration of typical communication module problems is shortened from weeks/days to hours/minutes.

  1. This feature improves the ability to locate and demarcate typical communication problems of the GaussDB communication module on the live network and in test and development environments, for example: GaussDB server connections hang or are slow, pooler functions and views provide no valid information, and the time spent sending and receiving libcomm communication messages cannot be calculated. The following three points are involved:
    • The communication link DFX capability of the GaussDB server to respond to connections is enhanced.
    • The local and peer information is added to the communication views of the pooler communications library between CNs and between CNs and DNs.
    • The libcomm communications library supports the collection of socket statistics of persistent connections between primary DNs in the system view, and supports the display of libcomm communication time when explain performance is used to query performance information related to distributed stream plans.
  2. The preceding methods are used to enhance the capability of locating and demarcating typical problems of the GaussDB Kernel communication module.
    • Quick fault locating and demarcation of the communication module: from post-event reproduction to in-event recording or troubleshooting. The demarcation duration is shortened from weeks/days to hours/minutes.
    • Fault locating methods are enriched for the communication module. Three methods are added: key time points of the GaussDB server responding to communication links, libcomm persistent connection statistics, and explain performance, which can print the libcomm communication duration in distributed stream plans.

Memory-overcommitted session printing is supported.

A threshold is provided. When the memory usage of a single session or SQL statement exceeds the threshold, detailed memory information is printed (DFX information indicating that the memory usage of a single SQL statement exceeds the threshold).

The DFX performance view supports refined db_time/wait event duration statistics.

The wait event types of GaussDB modules are optimized to provide more comprehensive fault locating methods.

  1. The db_time time model is refined to include network time, and db_time is adjusted to locate and demarcate network time more accurately.
  2. Some wait events are nested. In earlier versions, nesting was not supported, so outer events could not be reported as wait events, their duration was lost, and long-running outer wait events could not be located. With nesting supported, outer events are reported and their time consumption is recorded.
  3. Wait events for heavy code blocks and large logic blocks are supplemented to improve the locating and demarcation capabilities of full SQL and ASP in wait events.

The storage space usage of WDR snapshots is reduced by 40% in typical scenarios.

The WDR snapshot storage space occupation and snapshot space control methods are optimized.

Astore supports commercial use of hash bucket-based online scale-out.

The online scale-out technology based on the hash bucket table (Astore) is supported. The segment-page database-level data sharding and dynamic log multi-stream technologies are used to implement the online cluster scale-out solution for physical file migration. Large-scale commercial use is supported.

In terms of segment-page, Astore supports commercial use of hash bucket-based online scale-out.

  1. Segment-page capability improvement:
    • Segment-page ultimate RTO supports read on standby (Astore).
    • Segment-page supports metadata page repair and file repair.
    • Segment-page execution statistics are included in the WDR.
    • The pg_xlogdump and pagehack functions are optimized in segment-page mode.
    • Astore flashback (DROP/TRUNCATE) is supported in segment-page mode.
    • The enable_segment parameter is supported.
    • The distributed system supports segment-page system views and functions (including manual residual data clearing functions that support hash bucket).
    • In segment-page mode, local temporary tables are stored independently.
    • The segment-page ultimate RTO replay performance is optimized.
  2. The undo segment-page DFX capability is enhanced.
    • Segment-page supports undo page repair and file repair.
    • The undo temporary/unlogged table function is enhanced.
    • Undo tslot/rdata disconnection protection is supported.
    • The run logs are enhanced for the undo segment-page key process.
    • Segment-page system views and functions support undo segment-page files.

Vector databases support efficient retrieval of hundreds of millions of vectors.

The vector database integrates the optimal disk-based vector retrieval algorithm into GaussDB, enabling database users to use native SQL statements to create and import vector data, build indexes, generate query plans, and efficiently retrieve vectors.

You can use vector types (FloatVector and BoolVector) as native types and create GsIVFFLAT or GsDiskANN indexes to accelerate TopK ANN queries. The GsIVFFLAT index uses a clustering algorithm to divide the high-dimensional vector space into buckets by distance; during retrieval, a candidate set is selected based on the distance between the query vector and each bucket's center point, significantly reducing retrieval cost compared with a full scan. The GsDiskANN index finds the nearest neighbor vectors of all vector points, constructs a sparse graph structure, and uses a relaxation coefficient to generate some "short-circuit" edges that accelerate queries. GaussDB optimizes the data deletion logic to implement real-time deletion and ensure performance with no accuracy loss during long-term running.

GaussDB vector indexing supports high-concurrency retrieval and modification and two underlying storage engines: Astore and Ustore. It also supports MVCC. Log types are introduced to the GaussDB vector database to support complete HA capabilities, such as primary/standby synchronization in a centralized system, parallel replay, and ultimate RTO. A hedged usage sketch follows.
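
A hedged sketch of vector retrieval; the floatvector type and GsDiskANN index name follow the description above, while the <-> distance operator and literal format are assumptions:

  CREATE TABLE doc_emb (id int, emb floatvector(3));
  CREATE INDEX idx_doc_emb ON doc_emb USING gsdiskann (emb);
  INSERT INTO doc_emb VALUES (1, '[0.1, 0.2, 0.3]');
  -- Top-10 approximate nearest neighbors of a query vector.
  SELECT id FROM doc_emb
  ORDER BY emb <-> '[0.3, 0.2, 0.1]'
  LIMIT 10;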

openGauss & PG keyword rectification

Medium- and low-risk items related to openGauss and PG keywords have been rectified, and related keyword descriptions have been deleted.

Resolved issues

The following issues are resolved:

  • The CN continuously experiences lightproxy core dumps.
  • During the switchover process, the base port cannot be connected when the CM performs zombie detection.
  • In the connection setup phase, the socket timeout configured for the connection does not take effect.
  • The service logical decoding task is slow.
  • The dbe_schedule advanced package ignores February 29 when starting a scheduled task in interval mode, and the solution is time-consuming.
  • After the specifications are scaled up, the Cgroup is not updated. As a result, the number of CPUs used by the kernel is limited in CPU resource management, and the performance is poor.
  • Hints of slow SQL control rules do not take effect for virtual tables.
  • In a DR cluster, when the database is normal, a DN exception alarm is reported.