Updated on 2025-10-23 GMT+08:00

Character Sets and Collations

A character set defines how characters are encoded, and a collation defines how characters are compared and sorted. This section describes the character sets and collations available in M-compatible databases. The character sets, collations, and syntax described here are supported only in M-compatible databases.

  • The character sets and collations of M-compatible must meet the following requirements:
    • Each character set has one or more collations and has only one default collation.

    • Each collation has only one associated character set.

    • The sorting results of the same data using different collations may be different.

    • utf8mb4 and utf8 are the same character set.

    • You are advised to use the same character set for table columns as for server_encoding to avoid the performance loss caused by transcoding.

  • The character sets and collations of M-compatible provide the following capabilities:
    • Multiple character sets can be used to store character strings.

    • Collations can be used to compare character strings.

    • Database-level, schema-level, table-level, and column-level character sets and collations are supported.

Databases support mixed use of multiple character sets, except for databases whose character set is SQL_ASCII.
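
To illustrate why the same data can sort differently under different collations, here is a minimal Python sketch. It is not GaussDB code: sort_binary and sort_case_insensitive are hypothetical helpers that only approximate a _bin collation (byte order) and a _ci collation (case ignored during comparison). The real collation algorithms in the database may order some characters differently.

```python
def sort_binary(values):
    """Approximate a _bin collation: order strings by their UTF-8 bytes."""
    return sorted(values, key=lambda s: s.encode("utf-8"))


def sort_case_insensitive(values):
    """Roughly approximate a _ci collation: ignore case when ordering.

    Strings that differ only by case keep their input order (stable sort).
    """
    return sorted(values, key=str.casefold)


words = ["banana", "Apple", "apple", "Banana"]
binary_order = sort_binary(words)
ci_order = sort_case_insensitive(words)

print(binary_order)  # uppercase letters sort before lowercase in byte order
print(ci_order)      # words differing only by case sort next to each other
```

In byte order, every uppercase ASCII letter precedes every lowercase letter, so the two orderings disagree as soon as the data mixes cases.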

Supported Character Sets

The following character sets are supported by M-compatible.

Table 1 Character sets supported by M-compatible

| Character Set | Description | Default Collation |
|---|---|---|
| utf8 | Variable-length Unicode character code. The length of the character code ranges from 1 to 4 bytes. | utf8mb4_general_ci |
| utf8mb4 | The character set is the same as that of utf8. | utf8mb4_general_ci |
| gbk | Extended character set of Chinese characters in the GB standard. | gbk_chinese_ci |
| gb18030 | Character set of Chinese characters in the GB standard. | gb18030_chinese_ci |
| binary | Binary pseudo-character set. | binary |
| latin1 | Latin character set. | latin1_swedish_ci |

  • Currently, the database-level, schema-level, table-level, and column-level syntax supports only the preceding character sets.
  • The binary character set is implemented by using the existing character set SQL_ASCII.
  • The conversion logic between character sets in GaussDB is different from that in MySQL. Therefore, some special characters that can be converted in M-compatible mode may fail to be converted in MySQL. You are advised not to use these uncommon characters.
  • Currently, GaussDB does not strictly validate the encoding of characters that do not belong to the current character set. As a result, such invalid characters may be accepted on input.
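
As a rough illustration of how the character sets in Table 1 differ, the following Python sketch reports how many bytes a character occupies in several encodings. It relies on Python's built-in codecs ("utf-8", "gbk", "gb18030", "latin-1") as stand-ins for the database character sets; this is an assumption made for illustration, and the conversion logic inside GaussDB may differ. encoded_length is a hypothetical helper name.

```python
def encoded_length(ch, charset):
    """Return how many bytes ch occupies in charset, or None if it cannot be encoded."""
    try:
        return len(ch.encode(charset))
    except UnicodeEncodeError:
        return None


samples = ["A", "é", "中", "😀"]
charsets = ("utf-8", "gbk", "gb18030", "latin-1")

for ch in samples:
    # Map each stand-in codec to the byte length of this character.
    sizes = {cs: encoded_length(ch, cs) for cs in charsets}
    print(repr(ch), sizes)
```

The same character can take a different number of bytes, or not be representable at all, depending on the character set. This is why converting between character sets has a cost and why keeping column and server character sets aligned avoids it.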

Supported Collations

The following collations are supported by M-compatible databases.

Table 2 Collations supported by M-compatible databases

| Collation | Character Set | Description | Blank Filling |
|---|---|---|---|
| utf8_bin | utf8 | The binary collation is used. | Supported. |
| utf8_general_ci | utf8 | The general collation is used. | Supported. |
| utf8_unicode_ci | utf8 | The Unicode Collation Algorithm (UCA)-based collation is used. | Supported. |
| utf8mb4_bin | utf8mb4 | Same as utf8_bin. | Supported. |
| utf8mb4_general_ci | utf8mb4 | Same as utf8_general_ci. | Supported. |
| utf8mb4_unicode_ci | utf8mb4 | Same as utf8_unicode_ci. | Supported. |
| utf8mb4_0900_ai_ci | utf8mb4 | The Unicode Collation Algorithm (UCA)-based collation is used. | Supported. |
| gbk_bin | gbk | The binary collation is used. | Supported. |
| gbk_chinese_ci | gbk | The Chinese (pinyin) collation is used. | Supported. |
| gb18030_bin | gb18030 | The binary collation is used. | Supported. |
| gb18030_chinese_ci | gb18030 | The Chinese (pinyin) collation is used. | Supported. |
| binary | binary | The binary collation is used. | Not supported. |
| latin1_swedish_ci | latin1 | The Swedish collation is used. | Supported. |
| latin1_bin | latin1 | The binary collation is used. | Supported. |

  • Collation names start with the name of the associated character set and are usually followed by one or more suffixes that indicate other ordering features. For example, _bin indicates binary sorting rules, and _ci indicates case-insensitive comparison.
  • If a collation supports blank filling, spaces at the end of character strings are ignored during comparison. For example, 'A ' = 'A' is true.
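
The two notes above can be sketched in Python. This is a conceptual illustration only, not the database implementation: split_collation_name, equal_with_blank_filling, and equal_without_blank_filling are hypothetical helpers. The first splits a collation name into its character set prefix and suffixes; the other two contrast a comparison that ignores trailing spaces with an exact comparison.

```python
def split_collation_name(name):
    """Split a collation name into (character set prefix, list of suffixes)."""
    charset, *suffixes = name.split("_")
    return charset, suffixes


def equal_with_blank_filling(a, b):
    """Compare two strings while ignoring trailing spaces."""
    return a.rstrip(" ") == b.rstrip(" ")


def equal_without_blank_filling(a, b):
    """Compare two strings exactly, trailing spaces included."""
    return a == b


print(split_collation_name("utf8mb4_0900_ai_ci"))
print(split_collation_name("gbk_chinese_ci"))
print(equal_with_blank_filling("A ", "A"))
print(equal_without_blank_filling("A ", "A"))
```

Only trailing spaces are ignored under blank filling; leading spaces still make two strings differ, so equal_with_blank_filling(" A", "A") is false in this sketch.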