Updated on 2025-10-23 GMT+08:00

Character Sets and Collations

A character set defines how characters are encoded, and a collation defines how strings are compared and sorted. This section describes the character sets and collations available in M-compatible databases. The character sets, collations, and syntax described here are supported only in M-compatible databases.

  • Character sets and collations in M-compatible databases follow these rules:
    • Each character set has one or more collations and has only one default collation.

    • Each collation has only one associated character set.

    • The same data may sort differently under different collations.

    • utf8mb4 and utf8 are the same character set.

    • You are advised to use the same character set for table columns as the database's server_encoding, to avoid the performance overhead of transcoding.

  • Character sets and collations in M-compatible databases provide the following capabilities:
    • Multiple character sets can be used to store character strings.

    • Collations can be used to compare character strings.

    • Database-level, schema-level, table-level, and column-level character sets and collations are supported.

A database cannot mix character sets: apart from the BINARY character set, only the character set of the current database can be used within it.
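The rules above can be sketched as a small lookup model. This is a minimal Python sketch under stated assumptions, not a GaussDB API: the two dictionaries are transcribed from Table 1 and Table 2, `canonical` and `resolve` are hypothetical helpers, and the rule that an unspecified level inherits from its enclosing level (column from table, table from schema, and so on) is assumed, as in MySQL, rather than stated in this section.

```python
from typing import Optional, Tuple

Pair = Tuple[str, str]  # (character set, collation)

# Default collation of each character set (Table 1).
DEFAULT_COLLATION = {
    "utf8": "utf8mb4_general_ci",
    "utf8mb4": "utf8mb4_general_ci",
    "gbk": "gbk_chinese_ci",
    "gb18030": "gb18030_chinese_ci",
    "binary": "binary",
}

# The single character set each collation belongs to (Table 2).
COLLATION_CHARSET = {
    "utf8_bin": "utf8", "utf8_general_ci": "utf8", "utf8_unicode_ci": "utf8",
    "utf8mb4_bin": "utf8mb4", "utf8mb4_general_ci": "utf8mb4",
    "utf8mb4_unicode_ci": "utf8mb4", "utf8mb4_0900_ai_ci": "utf8mb4",
    "gbk_bin": "gbk", "gbk_chinese_ci": "gbk",
    "gb18030_bin": "gb18030", "gb18030_chinese_ci": "gb18030",
    "binary": "binary",
}


def canonical(charset: str) -> str:
    """utf8 and utf8mb4 are the same character set, so treat them as one."""
    return "utf8mb4" if charset == "utf8" else charset


def resolve(charset: Optional[str], collation: Optional[str],
            parent: Pair, db_charset: str) -> Pair:
    """Hypothetical helper: effective (charset, collation) at one level."""
    if charset is not None and charset not in DEFAULT_COLLATION:
        raise ValueError(f"unsupported character set: {charset}")
    if collation is not None and collation not in COLLATION_CHARSET:
        raise ValueError(f"unsupported collation: {collation}")

    if charset is None and collation is None:
        result = parent  # assumption: inherit from the enclosing level
    elif collation is None:
        result = (charset, DEFAULT_COLLATION[charset])  # the set's default collation
    else:
        owner = COLLATION_CHARSET[collation]  # a collation has exactly one set
        if charset is not None and canonical(charset) != canonical(owner):
            raise ValueError(f"{collation} is not a collation of {charset}")
        result = (charset or owner, collation)

    # Only the database's own character set and binary may be used in one database.
    if canonical(result[0]) not in (canonical(db_charset), "binary"):
        raise ValueError(f"{result[0]} cannot be used in a {db_charset} database")
    return result


db = ("utf8mb4", "utf8mb4_general_ci")
table = resolve(None, "utf8mb4_bin", db, "utf8mb4")
column = resolve(None, None, table, "utf8mb4")
print(table, column)  # ('utf8mb4', 'utf8mb4_bin') ('utf8mb4', 'utf8mb4_bin')
```

Under this model, `resolve("gbk", None, db, "utf8mb4")` raises `ValueError`, because a gbk column cannot be used in a utf8mb4 database, while a `binary` column is accepted.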

Supported Character Sets

The following character sets are supported by M-compatible databases.

Table 1 Character sets supported by M-compatible databases

| Character Set | Description | Default Collation |
|---|---|---|
| utf8 | Variable-length Unicode encoding; each character occupies 1 to 4 bytes. | utf8mb4_general_ci |
| utf8mb4 | Same as utf8. | utf8mb4_general_ci |
| gbk | Extended GB (Chinese national standard) character set for Chinese characters. | gbk_chinese_ci |
| gb18030 | GB 18030 Chinese national standard character set. | gb18030_chinese_ci |
| binary | Binary pseudo-character set. | binary |

  • Currently, the database-level, schema-level, table-level, and column-level syntax supports only the character sets listed in Table 1.
  • The BINARY character set is implemented using the existing SQL_ASCII character set.
  • GaussDB converts between character sets differently from MySQL, so some special characters that convert successfully in M-compatible databases may fail to convert in MySQL. You are advised not to use such uncommon characters.
  • Currently, GaussDB does not strictly validate whether input characters are valid in the current character set. As a result, characters that do not belong to the current character set may be stored without an error.
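To illustrate the byte lengths described in Table 1, the following Python snippet encodes a few sample characters with Python's built-in `utf-8`, `gbk`, and `gb18030` codecs. It demonstrates the encodings themselves, not GaussDB's conversion logic, which, as noted above, can differ from other implementations.

```python
# Bytes per character in each encoding, using Python's built-in codecs.
samples = ["A", "é", "中", "😀"]  # U+0041, U+00E9, U+4E2D, U+1F600

# utf8 is variable-length: 1 to 4 bytes per character.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
print("utf8:", utf8_lengths)  # utf8: [1, 2, 3, 4]

# GBK and GB 18030 both encode common Chinese characters in 2 bytes.
print("gbk:", len("中".encode("gbk")), "gb18030:", len("中".encode("gb18030")))  # gbk: 2 gb18030: 2

# GB 18030 covers all of Unicode (using 4-byte sequences where needed); GBK does not.
print("gb18030 emoji:", len("😀".encode("gb18030")))  # gb18030 emoji: 4
try:
    "😀".encode("gbk")
except UnicodeEncodeError:
    print("gbk cannot encode U+1F600")
```

The failed GBK encoding is the kind of conversion error that occurs when a character has no mapping in the target character set.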

Supported Collations

The following collations are supported by M-compatible databases.

Table 2 Collations supported by M-compatible databases

| Collation | Character Set | Description | Blank Filling |
|---|---|---|---|
| utf8_bin | utf8 | Binary collation. | Supported |
| utf8_general_ci | utf8 | General collation. | Supported |
| utf8_unicode_ci | utf8 | Collation based on the Unicode Collation Algorithm (UCA). | Supported |
| utf8mb4_bin | utf8mb4 | Same as utf8_bin. | Supported |
| utf8mb4_general_ci | utf8mb4 | Same as utf8_general_ci. | Supported |
| utf8mb4_unicode_ci | utf8mb4 | Same as utf8_unicode_ci. | Supported |
| utf8mb4_0900_ai_ci | utf8mb4 | Collation based on the Unicode Collation Algorithm (UCA). | Supported |
| gbk_bin | gbk | Binary collation. | Supported |
| gbk_chinese_ci | gbk | Chinese collation (pinyin order). | Supported |
| gb18030_bin | gb18030 | Binary collation. | Supported |
| gb18030_chinese_ci | gb18030 | Chinese collation (pinyin order). | Supported |
| binary | binary | Binary collation. | Not supported |

  • A collation name starts with the name of its associated character set and is usually followed by one or more suffixes that indicate other sorting properties. For example, _bin indicates binary sorting, _ci indicates case-insensitive comparison, and _ai indicates accent-insensitive comparison.
  • If a collation supports blank filling, trailing spaces are ignored when strings are compared. For example, 'A ' = 'A' is true.
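The effect of the _bin and _ci suffixes and of blank filling can be approximated in Python. This is a sketch, not GaussDB's implementation: the functions are hypothetical, _ci is approximated with `str.casefold()` (real collations such as utf8mb4_general_ci and gbk_chinese_ci use their own weight tables, for example pinyin order), and blank filling is modeled as padding the shorter string with spaces before comparing.

```python
from functools import cmp_to_key
from typing import Tuple


def pad_space(a: str, b: str) -> Tuple[str, str]:
    """Blank filling: pad the shorter string with trailing spaces."""
    width = max(len(a), len(b))
    return a.ljust(width), b.ljust(width)


def _cmp(x, y) -> int:
    """Three-way comparison: -1, 0, or 1."""
    return (x > y) - (x < y)


def cmp_binary(a: str, b: str) -> int:
    """Like the binary collation: compare raw bytes, no blank filling."""
    return _cmp(a.encode("utf-8"), b.encode("utf-8"))


def cmp_bin_padded(a: str, b: str) -> int:
    """Like a *_bin collation with blank filling: pad, then compare bytes."""
    return cmp_binary(*pad_space(a, b))


def cmp_ci_padded(a: str, b: str) -> int:
    """Rough stand-in for a *_ci collation: case-fold, pad, then compare."""
    return _cmp(*pad_space(a.casefold(), b.casefold()))


words = ["b", "B", "a", "A"]
# Same data, different collations, different order.
print(sorted(words, key=cmp_to_key(cmp_binary)))     # ['A', 'B', 'a', 'b']
print(sorted(words, key=cmp_to_key(cmp_ci_padded)))  # ['a', 'A', 'b', 'B']

print(cmp_binary("A ", "A"))      # 1 (trailing space is significant)
print(cmp_bin_padded("A ", "A"))  # 0 ('A ' = 'A')
```

Padding, rather than stripping trailing spaces, only makes a difference for characters that sort below a space: under this model `cmp_bin_padded("a", "a\t")` returns 1, because the padded "a " is compared with "a\t".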