Character Sets and Collations
A character set provides character encoding rules, and a collation provides character sorting rules. This section describes the character sets and collations in M-compatible databases. The following character sets, collation rules, and syntax are supported only in M-compatible databases.
- The character sets and collations of M-compatible databases must meet the following requirements:
- The character sets and collations of M-compatible databases support the following functions:
Character sets other than the BINARY character set and the character sets of the current database cannot be used in the same database.
Supported Character Sets
The following character sets are supported by M-compatible databases.
| Character Set | Description | Default Collation |
|---|---|---|
| utf8 | Variable-length Unicode character code. The length of the character code ranges from 1 to 4 bytes. | utf8mb4_general_ci |
| utf8mb4 | The character set is the same as that of utf8. | utf8mb4_general_ci |
| gbk | Extended character set of Chinese characters in the GB standard. | gbk_chinese_ci |
| gb18030 | Character set of Chinese characters in the GB standard. | gb18030_chinese_ci |
| binary | Binary pseudo-character set. | binary |
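The variable-length encodings described in the table can be illustrated outside the database. The following is a minimal sketch that uses Python's built-in codecs and assumes that `utf-8`, `gbk`, and `gb18030` in Python correspond to the utf8/utf8mb4, gbk, and gb18030 character sets. The helper `encoded_length` and the sample list are hypothetical names introduced for illustration only.

```python
# Sample characters: ASCII letter, accented Latin letter, Chinese character, emoji.
SAMPLES = ["A", "é", "中", "😀"]


def encoded_length(ch: str, codec: str):
    """Return the encoded byte length, or None if the codec cannot encode ch."""
    try:
        return len(ch.encode(codec))
    except UnicodeEncodeError:
        return None


# Assumption: these Python codecs stand in for the database character sets.
for codec in ("utf-8", "gbk", "gb18030"):
    print(codec, [encoded_length(ch, codec) for ch in SAMPLES])
```

Under UTF-8, the four samples occupy 1, 2, 3, and 4 bytes respectively, which matches the 1-to-4-byte range given for utf8 and utf8mb4. The emoji cannot be encoded with the `gbk` codec, so the helper returns `None` for it.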
- Currently, the database-level, schema-level, table-level, and column-level syntax supports only character sets specified in Table 1.
- The BINARY character set is implemented by using the existing character set SQL_ASCII.
- The conversion logic between character sets in GaussDB is different from that in MySQL. Therefore, some special characters that can be converted in M-compatible databases may fail to be converted in MySQL. You are advised not to use these uncommon characters.
- Currently, GaussDB does not perform strict encoding logic verification on invalid characters that do not belong to the current character set. As a result, such invalid characters may be successfully entered.
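Because invalid characters may be accepted by the database, an application may want to check its data before sending it. The following is a minimal client-side sketch, not a GaussDB feature. It assumes that Python's strict decoders are an acceptable approximation of each character set; the mapping `PYTHON_CODECS` and the function `is_valid_for_charset` are hypothetical names introduced here.

```python
# Hypothetical mapping from M-compatible character set names to Python codecs.
# Assumption: the Python codecs may not match the server's conversion tables
# byte for byte, so treat the result as a pre-check only.
PYTHON_CODECS = {
    "utf8": "utf-8",
    "utf8mb4": "utf-8",
    "gbk": "gbk",
    "gb18030": "gb18030",
}


def is_valid_for_charset(data: bytes, charset: str) -> bool:
    """Return True if data decodes cleanly under the codec mapped to charset."""
    if charset == "binary":
        # Assumption: the binary pseudo-character set accepts any byte sequence.
        return True
    codec = PYTHON_CODECS[charset]
    try:
        data.decode(codec)  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_for_charset("中".encode("utf-8"), "utf8mb4"))
print(is_valid_for_charset("中".encode("utf-8")[:2], "utf8mb4"))  # truncated sequence
```

The second call passes a truncated multi-byte sequence, which the strict decoder rejects. Running such a check in the application avoids relying on the server to reject malformed input.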
Supported Collations
The following collations are supported by M-compatible databases.
| Collation | Character Set | Description | Blank Filling |
|---|---|---|---|
| utf8_bin | utf8 | The binary collation is used. | Supported. |
| utf8_general_ci | utf8 | The general collation is used. | Supported. |
| utf8_unicode_ci | utf8 | The Unicode Collation Algorithm (UCA)-based collation is used. | Supported. |
| utf8mb4_bin | utf8mb4 | Same as utf8_bin. | Supported. |
| utf8mb4_general_ci | utf8mb4 | Same as utf8_general_ci. | Supported. |
| utf8mb4_unicode_ci | utf8mb4 | Same as utf8_unicode_ci. | Supported. |
| utf8mb4_0900_ai_ci | utf8mb4 | The Unicode Collation Algorithm (UCA)-based collation is used. | Supported. |
| gbk_bin | gbk | The binary collation is used. | Supported. |
| gbk_chinese_ci | gbk | The Chinese (pinyin) collation is used. | Supported. |
| gb18030_bin | gb18030 | The binary collation is used. | Supported. |
| gb18030_chinese_ci | gb18030 | The Chinese (pinyin) collation is used. | Supported. |
| binary | binary | The binary collation is used. | Not supported. |
- Collation names start with the name of the associated character set and are usually followed by one or more suffixes that indicate other ordering features. For example, _bin indicates binary sorting rules, and _ci indicates case insensitivity.
- If a collation supports blank filling, spaces at the end of character strings are ignored during comparison. For example, 'A ' = 'A' is true.
- Character Sets and Collations of the Client Connection
- Database-level Character Sets and Collations
- Table-level Character Sets and Collations
- Column-level Character Sets and Collations
- Character Sets and Collations of Expressions of the String Type
- Rules for Combining Character Sets and Collations
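The notes on collation naming and blank filling can be modeled roughly in a few lines. The following is a simplified sketch, not the database's comparison logic: it assumes that case-insensitive comparison can be approximated with `str.casefold`, whereas real collations compare per-character weights. The names `CHARSETS`, `split_collation`, and `equal_under` are hypothetical.

```python
# Character sets listed in the Supported Character Sets table.
CHARSETS = ("utf8mb4", "utf8", "gb18030", "gbk", "binary")


def split_collation(name: str):
    """Split a collation name into (character set, list of suffixes)."""
    for charset in CHARSETS:
        if name == charset or name.startswith(charset + "_"):
            return charset, name[len(charset):].split("_")[1:]
    raise ValueError(f"unknown collation: {name}")


def equal_under(collation: str, a: str, b: str) -> bool:
    """Approximate string equality under the given collation."""
    _, suffixes = split_collation(collation)
    if collation != "binary":
        # Per the Blank Filling column, every listed collation except
        # binary ignores trailing spaces during comparison.
        a, b = a.rstrip(" "), b.rstrip(" ")
    if "ci" in suffixes:
        # Crude stand-in for case-insensitive collation weights.
        a, b = a.casefold(), b.casefold()
    return a == b


print(split_collation("utf8mb4_0900_ai_ci"))
print(equal_under("utf8mb4_bin", "A ", "A"))  # blank filling applies
print(equal_under("binary", "A ", "A"))       # no blank filling
```

In this model, `'A '` and `'A'` compare equal under `utf8mb4_bin` but not under `binary`, and `'a'` equals `'A'` only under collations whose names carry the `ci` suffix.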