Pattern Matching Operators
The database provides three independent methods for implementing pattern matching: the SQL LIKE operator, the SIMILAR TO operator, and POSIX-style regular expressions. In addition to these basic operators, functions are available to extract or replace matching substrings and to split a string at matching locations.
- LIKE
Description: Specifies whether the string matches the pattern string following LIKE. The LIKE expression returns true if the string matches the supplied pattern. (As expected, the NOT LIKE expression returns false if LIKE returns true, and vice versa.)
Matching rules:
- This operator succeeds only when its pattern matches the entire string. To match a sequence at any position within the string, the pattern must begin and end with a percent sign.
- The underscore (_) matches any single character. The percent sign (%) matches any sequence of zero or more characters.
- To match a literal underscore or percent sign, precede the character in the pattern with the escape character. The default escape character is the backslash, but a different one can be selected with the ESCAPE clause.
- To match the escape character itself, write two escape characters. For example, to write a pattern constant containing a backslash (\), you need to enter two backslashes in the SQL statement.
When standard_conforming_strings is set to off, any backslashes you write in literal string constants must be doubled. Writing a pattern that matches a single backslash therefore takes four backslashes in the statement. You can avoid this by selecting a different escape character with ESCAPE, so that the backslash is no longer special to LIKE. The backslash is still special to the string literal parser, however, so two backslashes are still required.
In MySQL-compatible mode, it is also possible to select no escape character by writing ESCAPE ''. This effectively disables the escape mechanism, which makes it impossible to turn off the special meaning of underscore and percent signs in the pattern.
- The keyword ILIKE can be used instead of LIKE to make the match case-insensitive.
- Operator ~~ is equivalent to LIKE, and operator ~~* corresponds to ILIKE.
Example:
gaussdb=# SELECT 'abc' LIKE 'abc' AS RESULT;
 result
-----------
 t
(1 row)

gaussdb=# SELECT 'abc' LIKE 'a%' AS RESULT;
 result
-----------
 t
(1 row)

gaussdb=# SELECT 'abc' LIKE '_b_' AS RESULT;
 result
-----------
 t
(1 row)

gaussdb=# SELECT 'abc' LIKE 'c' AS RESULT;
 result
-----------
 f
(1 row)
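The escape and case-insensitivity rules listed above can be illustrated with a few more queries. This is a minimal sketch rather than part of the original examples: the escape character # is an arbitrary choice that sidesteps backslash doubling, and the results in the comments are what the rules above imply.

```sql
-- '#' is an arbitrary escape character chosen for this sketch.
SELECT '10%' LIKE '10#%' ESCAPE '#' AS result;   -- t: '#%' matches a literal percent sign
SELECT '100' LIKE '10#%' ESCAPE '#' AS result;   -- f: the last character is not a literal '%'
SELECT 'a_c' LIKE 'a#_c' ESCAPE '#' AS result;   -- t: '#_' matches a literal underscore
SELECT 'abc' LIKE 'a#_c' ESCAPE '#' AS result;   -- f: 'b' is not a literal '_'

-- ILIKE and the operator spellings ~~ and ~~*.
SELECT 'ABC' ILIKE 'abc' AS result;              -- t: case-insensitive match
SELECT 'abc' ~~ 'a%' AS result;                  -- t: same as LIKE
SELECT 'ABC' ~~* 'a%' AS result;                 -- t: same as ILIKE
```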
- SIMILAR TO
Description: Returns true or false depending on whether the pattern matches the given string. It is similar to LIKE, except that it interprets the pattern using the SQL standard's definition of a regular expression.
Matching rules:
- Like LIKE, this operator succeeds only when its pattern matches the entire string. To match a sequence at any position within the string, the pattern must begin and end with a percent sign.
- The underscore (_) matches any single character. The percent sign (%) matches any sequence of zero or more characters.
- SIMILAR TO supports these pattern-matching metacharacters borrowed from POSIX-style regular expressions:
Metacharacter    Description
|                Specifies alternation (either of two alternatives).
*                Specifies repetition of the previous item zero or more times.
+                Specifies repetition of the previous item one or more times.
?                Specifies repetition of the previous item zero or one time.
{m}              Specifies repetition of the previous item exactly m times.
{m,}             Specifies repetition of the previous item m or more times.
{m,n}            Specifies repetition of the previous item at least m and at most n times.
()               Groups items into a single logical item.
[...]            Specifies a character class, just as in POSIX-style regular expressions.
- A preceding escape character disables the special meaning of any of these metacharacters. The rules for using escape characters are the same as those for LIKE.
Regular expressions:
The substring(string from pattern for escape) function extracts a substring that matches an SQL regular expression pattern.
Example:
gaussdb=# SELECT 'abc' SIMILAR TO 'abc' AS RESULT;
 result
-----------
 t
(1 row)

gaussdb=# SELECT 'abc' SIMILAR TO 'a' AS RESULT;
 result
-----------
 f
(1 row)

gaussdb=# SELECT 'abc' SIMILAR TO '%(b|d)%' AS RESULT;
 result
-----------
 t
(1 row)

gaussdb=# SELECT 'abc' SIMILAR TO '(b|c)%' AS RESULT;
 result
-----------
 f
(1 row)
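The repetition and grouping metacharacters, and the substring(string from pattern for escape) form, can be sketched as follows. These queries are illustrative additions. The results in the comments follow from the whole-string matching rule above. In the substring call, the escape character # followed by a double quote marks the start and end of the portion to return, which is the SQL-standard convention and is assumed to apply here.

```sql
-- Repetition and grouping; each pattern must match the entire string.
SELECT 'aaa'  SIMILAR TO 'a{3}'   AS result;   -- t: exactly three 'a'
SELECT 'aaaa' SIMILAR TO 'a{2,3}' AS result;   -- f: four 'a' exceeds the upper bound
SELECT 'abab' SIMILAR TO '(ab)+'  AS result;   -- t: group repeated twice
SELECT 'ac'   SIMILAR TO 'ab?c'   AS result;   -- t: 'b' is optional
SELECT 'abc'  SIMILAR TO '[a-c]+' AS result;   -- t: character class

-- Extract the part of the string delimited by #" ... #" in the pattern.
SELECT substring('foobar' from '%#"o_b#"%' for '#') AS result;   -- oob
```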
- POSIX-style regular expressions
Description: A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set). A string matches a regular expression if it is a member of the regular set that the expression describes. POSIX-style regular expressions provide a more powerful means for pattern matching than the LIKE and SIMILAR TO operators. Table 1 lists all available operators for pattern matching using POSIX-style regular expressions.
Table 1 Regular expression match operators
Operator    Description                                                      Example
~           Matches a regular expression, which is case-sensitive.           'thomas' ~ '.*thomas.*'
~*          Matches a regular expression, which is case-insensitive.         'thomas' ~* '.*Thomas.*'
!~          Does not match a regular expression, which is case-sensitive.    'thomas' !~ '.*Thomas.*'
!~*         Does not match a regular expression, which is case-insensitive.  'thomas' !~* '.*vadim.*'
Matching rules:
- Unlike LIKE patterns, a regular expression is allowed to match anywhere within a string, unless the regular expression is explicitly anchored to the beginning or end of the string.
- Besides the metacharacters mentioned above, POSIX-style regular expressions also support the following pattern matching metacharacters:
Metacharacter    Description
^                Matches at the beginning of the string.
$                Matches at the end of the string.
.                Matches any single character.
Regular expressions:
POSIX-style regular expressions support the following functions:
- The substring(string from pattern) function extracts a substring that matches the POSIX-style regular expression pattern.
- The regexp_count(string text, pattern text [, position int [, flags text]]) function returns the number of substrings that match the POSIX-style regular expression pattern.
- The regexp_instr(string text, pattern text [, position int [, occurrence int [, return_opt int [, flags text]]]]) function is used to obtain the position of a substring that matches a POSIX-style regular expression pattern.
- The regexp_substr(string text, pattern text [, position int [, occurrence int [, flags text]]]) function provides a method to extract a substring that matches a POSIX-style regular expression pattern.
- The regexp_replace(string, pattern, replacement [,flags ]) function replaces the substring that matches the POSIX-style regular expression pattern with the new text.
- The regexp_matches(string text, pattern text [, flags text]) function returns a text array consisting of all captured substrings that match a POSIX-style regular expression pattern.
- The regexp_split_to_table(string text, pattern text [, flags text]) function splits a string using a POSIX-style regular expression pattern as a delimiter.
- The regexp_split_to_array(string text, pattern text [, flags text ]) function behaves the same as regexp_split_to_table, except that regexp_split_to_array returns its result as an array of text.
The regular expression split functions ignore zero-length matches that occur at the beginning or end of a string or immediately after the previous match. This is contrary to the strict definition of regular expression matching, which is what regexp_matches implements, but it is usually the most convenient behavior in practice.
Example:
gaussdb=# SELECT 'abc' ~ 'Abc' AS RESULT;
 result
--------
 f
(1 row)

gaussdb=# SELECT 'abc' ~* 'Abc' AS RESULT;
 result
--------
 t
(1 row)

gaussdb=# SELECT 'abc' !~ 'Abc' AS RESULT;
 result
--------
 t
(1 row)

gaussdb=# SELECT 'abc' !~* 'Abc' AS RESULT;
 result
--------
 f
(1 row)

gaussdb=# SELECT 'abc' ~ '^a' AS RESULT;
 result
--------
 t
(1 row)

gaussdb=# SELECT 'abc' ~ '(b|d)' AS RESULT;
 result
--------
 t
(1 row)

gaussdb=# SELECT 'abc' ~ '^(b|c)' AS RESULT;
 result
--------
 f
(1 row)
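The functions listed above can be exercised with short calls like the following. This is a sketch, not output captured from a live system. The inputs are invented for illustration and avoid backslashes so that the standard_conforming_strings setting does not matter. Expected results are noted only where the behavior follows directly from the descriptions above. The Oracle-style functions are left without expected output because their results may depend on compatibility settings.

```sql
-- Extract the first substring that matches the pattern.
SELECT substring('foobar' from 'o.b') AS result;                     -- oob

-- Replace the matching substring with new text.
SELECT regexp_replace('Thomas', '.[mN]a.', 'M') AS result;           -- ThM

-- Return the captured groups as a text array.
SELECT regexp_matches('foobarbequebaz', '(bar)(beque)') AS result;   -- {bar,beque}

-- Split on runs of spaces, as an array and as a set of rows.
SELECT regexp_split_to_array('the quick brown', ' +') AS result;     -- {the,quick,brown}
SELECT regexp_split_to_table('the quick brown', ' +') AS result;     -- three rows: the, quick, brown

-- Count, locate, and extract matches (results not asserted here).
SELECT regexp_count('abcabc', 'b') AS result;
SELECT regexp_instr('abc123', '[0-9]+') AS result;
SELECT regexp_substr('abc123', '[0-9]+') AS result;
```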
Although most regular expression searches execute quickly, a regular expression can be contrived to take an arbitrary amount of time and memory to process. Avoid accepting regular expression search patterns from untrusted sources. If you must do so, you are advised to impose a statement timeout. Searches using SIMILAR TO patterns carry the same security risks, because SIMILAR TO provides many of the same capabilities as POSIX-style regular expressions. LIKE searches are much simpler than the other two options and are therefore safer to use with patterns from untrusted sources.
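One way to apply the timeout advice is to bound the session before running a search whose pattern comes from outside. The sketch below assumes that the statement_timeout parameter is available and is interpreted in milliseconds. The table messages, its column body, and the pattern are hypothetical stand-ins for a user-supplied search.

```sql
-- Assumption: statement_timeout is supported and measured in milliseconds.
SET statement_timeout = 5000;

-- Hypothetical table and pattern; the pattern stands in for untrusted input.
SELECT count(*) FROM messages WHERE body ~ '(user|admin)[0-9]+';

-- Restore the previous setting once the untrusted search has finished.
RESET statement_timeout;
```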