Updated on 2024-06-03 GMT+08:00

Database-level Character Sets and Collations

When creating a database, you can specify the character set and collation of the database.

 CREATE DATABASE [IF NOT EXISTS] database_name
               [ ENCODING [=] encoding ] |
               [ LC_COLLATE [=] lc_collate ] |
               [ LC_CTYPE [=] lc_ctype ];

Syntax description:

  • database_name

    Specifies the database name.

    Value range: a string. It must comply with the identifier naming convention.

  • ENCODING [ = ] encoding

    Specifies the character encoding used by the database. The value can be a string (for example, 'SQL_ASCII') or an integer.

  • LC_COLLATE [ = ] 'lc_collate'

    Specifies the collation used by the new database. For example, set this parameter by using lc_collate = 'zh_CN.gbk'.

    This parameter affects the sort order of strings, for example the row order produced by ORDER BY and the ordering of indexes on text columns. By default, the collation of the template database is used.

    Value range: collations (locale names) supported by the OS.

  • LC_CTYPE [ = ] 'lc_ctype'

    Specifies the character classification used by the new database. For example, set this parameter by using lc_ctype = 'zh_CN.gbk'. This parameter affects how characters are classified, for example which characters are treated as uppercase letters, lowercase letters, and digits. By default, the character classification of the template database is used.

    Value range: character classes supported by the OS.

    • The database-level character set and collation syntax can be used in all modes. For details about the syntax, see CREATE DATABASE.
    • The LC_COLLATE/LC_CTYPE syntax does not support the collations specific to B-compatible mode. The valid values depend on the locales available in the local environment. You can run the locale -a command to list them.
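
The following is a minimal sketch of the syntax described above, not a definitive recipe. The database name demo_db is hypothetical, and the example assumes that the zh_CN.gbk locale appears in the output of locale -a on the server and that the GBK encoding is supported and compatible with that locale. The verification query assumes a PostgreSQL-style pg_database catalog with the datcollate and datctype columns and the pg_encoding_to_char function; adjust it if your system exposes this information differently.

```sql
-- Create a database with an explicit encoding, collation, and character classification.
-- demo_db is a hypothetical name; zh_CN.gbk must be available on the server OS.
CREATE DATABASE IF NOT EXISTS demo_db
    ENCODING = 'GBK'
    LC_COLLATE = 'zh_CN.gbk'
    LC_CTYPE = 'zh_CN.gbk';

-- Check the settings that were recorded for the new database
-- (assumes a PostgreSQL-style pg_database system catalog).
SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
FROM pg_database
WHERE datname = 'demo_db';
```

If any of the three clauses is omitted, the corresponding setting is expected to be taken from the template database, as described for each parameter above.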