Updated on 2025-10-23 GMT+08:00

Pattern Matching Operators

The database provides the [NOT] LIKE operator matching and regular expression pattern matching.

[NOT] LIKE

Description: Specifies whether a string matches the pattern string following LIKE. The LIKE expression returns true if the string matches the provided pattern. (As expected, the NOT LIKE expression returns false if the LIKE expression returns true, and vice versa.) Pattern matching can be performed on operands (string, binary, and bit string types) that contain 0 characters.

Matching rules:
  1. This operator can succeed only when its pattern matches the entire string. If you want to match a sequence in any position within the string, the pattern must begin and end with a percent sign (%).
  2. The underscore (_) represents (matches) any single character. The percent sign (%) indicates the wildcard character of any string.
  3. To match a literal underscore or percent sign, the respective character in pattern must be preceded by the escape character. The default escape character is one backslash but a different one can be selected by using the ESCAPE clause.
  4. To match with escape characters, enter two escape characters. For example, to write a pattern constant containing a backslash (\), you need to enter two backslashes in SQL statements.
  5. Whether a value is case-sensitive depends on the collation. For example, the default value utf8mb4_general_ci is case-insensitive.
  6. The operator ~~ is equivalent to LIKE and ! ~~ is equivalent to NOT LIKE.

Examples:

  • During string matching, the collation is utf8mb4_general_ci, which is case-insensitive.
    m_db=# INSERT INTO t VALUES ('ab^CD&eF','ab^CD&eF','ab^CD&eF','ab^CD&eF','ab^CD&eF','ab^CD&eF');
    m_db=# SELECT c FROM t WHERE c LIKE '%B^c%';
        c     
    ----------
     ab^CD&eF
    (1 row)
  • For fields of the BINARY or BLOB type, the collation is utf8mb4_bin, which is case-sensitive.
    m_db=# INSERT INTO t VALUES (b'0110001', 'ab^CD&eF', 'ab^CD&eF');
    m_db=# SELECT b FROM t WHERE b LIKE 'ab%';
         b
    ------------
     ab^CD&eF
    (1 row)
  • Use the escape character to match _ and %.
    m_db=# INSERT INTO t VALUES ('ab_CD%eF','ab_CD%eF','ab_CD%eF','ab_CD%eF','ab_CD%eF','ab_CD%eF');
    m_db=# SELECT c FROM t WHERE c LIKE '%B\_c%';
        c
    ----------
     ab_CD%eF
    (1 row)
  • Specify the escape character.
    m_db=# INSERT INTO t VALUES ('ab_CD%eF','ab_CD%eF','ab_CD%eF','ab_CD%eF','ab_CD%eF','ab_CD%eF');
    m_db=# SELECT c FROM t WHERE c LIKE '%B^_c%' ESCAPE '^';
        c
    ----------
     ab_CD%eF
    (1 row)

[NOT] REGEXP

Description: The REGEXP operator is used for regular expression matching. True or false is returned based on whether the pattern matches the given string. The following table describes the regular expression operators.

Operator Name

Description

Syntax

REGEXP

Specifies whether a string matches the regular expression.

expr REGEXP pat

NOT REGEXP

Specifies whether a string does not match the regular expression. The value is negative of REGEXP.

expr NOT REGEXP pat

Regular expression syntax:

  • ^: matches the start of a string.
  • $: matches the end of a string.
  • .: matches any character (including the carriage return and newline characters).
  • a*: matches zero or more "a" characters.
  • a+: matches one or more "a" characters.
  • a?: matches zero or one "a" character.
  • de | abc: matches the "de" or "abc" character.
  • (abc)*: matches zero or more "abc" characters.
  • a{1} or a{2,3}: a{1} indicates that character "a" is repeated once, and a{2,3} indicates that character "a" is repeated twice or three times.
  • [a-dX] or [^a-dX]: indicates range matching. [a-dX] matches a, b, c, d, or X. [^a-dX] matches characters that are not a, b, c, d, or X.
  • [.characters.]: matches the collation used to verify elements. The following table lists the allowed character names and matched characters.

    Name

    Character

    Name

    Character

    NUL

    0

    SOH

    1

    STX

    2

    ETX

    3

    EOT

    4

    ENQ

    5

    ACK

    6

    BEL

    7

    alert

    7

    BS

    10

    backspace

    \b'

    HT

    11

    tab

    \t'

    LF

    12

    newline

    \n'

    VT

    13

    vertical-tab

    \v'

    FF

    14

    form-feed

    \f'

    CR

    15

    carriage-return

    \r'

    SO

    16

    SI

    17

    DLE

    20

    DC1

    21

    DC2

    22

    DC3

    23

    DC4

    24

    NAK

    25

    SYN

    26

    ETB

    27

    CAN

    30

    EM

    31

    SUB

    32

    ESC

    33

    IS4

    34

    FS

    34

    IS3

    35

    GS

    35

    IS2

    36

    RS

    36

    IS1

    37

    US

    37

    space

    '

    exclamation-mark

    !'

    quotation-mark

    "'

    number-sign

    #'

    dollar-sign

    $'

    percent-sign

    %'

    ampersand

    &'

    apostrophe

    \''

    left-parenthesis

    ('

    right-parenthesis

    )'

    asterisk

    *'

    plus-sign

    +'

    comma

    ,'

    hyphen

    -'

    hyphen-minus

    -'

    period

    .'

    full-stop

    .'

    slash

    /'

    solidus

    /'

    zero

    0'

    one

    1'

    two

    2'

    three

    3'

    four

    4'

    five

    5'

    six

    6'

    seven

    7'

    eight

    8'

    nine

    9'

    colon

    :'

    semicolon

    ;'

    less-than-sign

    <'

    equals-sign

    ='

    greater-than-sign

    >'

    question-mark

    ?'

    commercial-at

    @'

    left-square-bracket

    ['

    backslash

    \\'

    reverse-solidus

    \\'

    right-square-bracket

    ]'

    circumflex

    ^'

    circumflex-accent

    ^'

    underscore

    _'

    low-line

    _'

    grave-accent

    `'

    left-brace

    {'

    left-curly-bracket

    {'

    vertical-line

    |'

    right-brace

    }'

    right-curly-bracket

    }'

    tilde

    ~'

    DEL

    177

    -

    -

  • [=character_class=]: specifies the equivalence class. A character matches all characters with the same sort proofreading value, including itself. For example, if o and (+) are of the same type, [[=o=]], [[=(+)=]], and [o(+)] are synonyms.
  • [:character_class:]: matches all character classes that belong to the class. The following table lists the standard class names.

    Character Class

    Description

    Character Range

    alnum

    Alphanumeric numeric character.

    0–9, a–z, and A–Z

    alpha

    Alphanumeric character.

    a–z and A–Z

    blank

    Blank character.

    Blank character[\t]

    cntrl

    Control character.

    [\x01-\x1F]

    digit

    Digit character.

    [0-9]

    graph

    Graphic character.

    [^\x01-\x20]

    lower

    Lowercase character.

    [a-z]

    print

    Graphic character.

    [^\x01-\x20]

    punct

    Punctuation character.

    [-!"#$%&'( )*+,./:;<=>?@[\\]^_`{|}~]

    space

    Spaces, tabs, new lines, and carriage returns.

    [\n\r\t\x0B]

    upper

    Uppercase character.

    [A-Z]

    xdigit

    Hexadecimal numeric character.

    [0-9a-fA-F]

  • [[:<:]] and [[:>:]]: matches the start and end of a string respectively.

Examples:

m_db=# SELECT 'fo\nfo' REGEXP '^fo$'; 
 ?column? 
----------
 f
(1 row)

m_db=# SELECT 'fofo' REGEXP '^fo';
 ?column? 
----------
 t
(1 row)

m_db=# SELECT 'Ban' REGEXP '^Ba*n';  
 ?column? 
----------
 t
(1 row)

m_db=# SELECT 'Bn' REGEXP '^Ba?n';
 ?column? 
----------
 t
(1 row)

m_db=# SELECT 'pi' REGEXP 'pi|apa';
 ?column? 
----------
 t
(1 row)

m_db=# SELECT 'aXbc' NOT REGEXP '^[a-dXYZ]$';
 ?column? 
----------
 t
(1 row)

When enable_escape_string is enabled for m_format_behavior_compat_options, the following escape characters are supported:

Escape Character

Description

\'

A single quotation mark.

\"

A double quotation mark.

\b

A backspace.

\n

A newline.

\r

A carriage return.

\t

A tab.

\Z

ASCII 26 (Control Z).

\\

A backslash.

\%

A % character.

\_

An underscore.

\0

GaussDB reports the error: invalid byte sequence for encoding "UTF8": 0x00.

When enable_escape_string is disabled in m_format_behavior_compat_options and standard_conforming_strings is set to 'off', escape_string_warning is set to 'off', backslash_quote is set to 'on', the following escape characters are supported:

Escape Character

Description

\'

A single quotation mark.

\b

Backspace.

\f

Form feed, for example, C language.

\n

Newline, for example, C language.

\r

Carriage return, for example, C language.

\t

Horizontal tab, for example, C language.

\uwxyz

Character whose hexadecimal value is 0xwxyz, where wxyz is four hexadecimal digits.

\xhhh

Character whose hexadecimal value is 0xhhh, where hhh is any sequence of hexadecimal digits.

\0

GaussDB reports the error: invalid byte sequence for encoding "UTF8": 0x00.

\xy

Character whose octal value is 0xy, where xy is two octal digits.

\xyz

Character whose octal value is 0xyz, where xyz is three octal digits.

The lexical syntax is parsed based on byte streams. If a multi-byte character contains code that is consistent with symbols such as '\', '\'', and '\\', the behavior of the multi-byte character is inconsistent with that in MySQL. In this case, you are advised to disable the escape character function temporarily.