Updated on 2023-07-13 GMT+08:00

Regular Expression Metacharacters

Table 1 Metacharacter description

Metacharacter

Description

.

Matches any single character except \n. To match \n as well, use a pattern such as [\s\S].

^

Matches the start position of an input character string and does not match any characters. Use \^ to match the character itself.

$

Matches the end position of an input character string and does not match any characters. Use \$ to match the character itself.

*

Matches the preceding character or sub-expression zero or more times. * is equivalent to {0,}. For example, \^*b can match b, ^b, ^^b, and so on.

+

Matches the preceding character or sub-expression one or more times. + is equivalent to {1,}. For example, a+b can match ab, aab, aaab, and so on.

?

Matches the preceding character or sub-expression zero times or once. ? is equivalent to {0,1}. For example, a[cd]? can match a, ac, and ad. When this character follows any other quantifier such as *, +, ?, {n}, {n,}, or {n,m}, the matching mode is non-greedy. Non-greedy mode matches the shortest possible string, whereas the default greedy mode matches the longest possible string. For example, in the string oooo, o+? matches only a single o, while o+ matches all four o characters.

|

Performs a logical OR between two matching conditions. For example, the regular expression (him|her) finds a match in "it belongs to him" and in "it belongs to her", but not in "it belongs to them".

\

Marks the next character as a special character, a literal character, a backreference, or an octal escape character. For example, n matches the character n, \n matches the newline character, \\ matches \, and \( matches (.

\w

Matches a letter, digit, or underscore (_).

\W

Matches any character that is not a letter, digit, or underscore (_).

\s

Matches any blank character, such as a space, tab character, or form feed. It is equivalent to [ \f\n\r\t\v].

\S

Matches any character except blank characters and is equivalent to [^ \f\n\r\t\v].

\d

Matches any digit and is equivalent to [0-9].

\D

Matches any non-digit character and is equivalent to [^0-9].

\b

Matches a word boundary (the position between a word and a space) and does not match any characters. For example, er\b matches er in never but does not match er in verb.

\B

Matches a non-word boundary and does not match any characters. For example, er\B matches er in verb but does not match er in never.

\f

Matches a form feed and is equivalent to \x0c and \cL.

\n

Matches a linefeed and is equivalent to \x0a and \cJ.

\r

Matches a carriage return character and is equivalent to \x0d and \cM.

\t

Matches a tab character and is equivalent to \x09 and \cI.

\v

Matches a vertical tab character and is equivalent to \x0b and \cK.

\cx

Matches the control character indicated by x. For example, \cM matches Control-M, that is, a carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as the literal character c.

{n}

The value n is a non-negative integer and refers to the exact number of matching times. For example, o{2} does not match the single o in Bob, but matches the two o characters in food.

{n,}

The value n is a non-negative integer and refers to the minimum number of matching times. For example, o{2,} does not match o in Bob but matches all o in foooood. o{1,} is equivalent to o+, and o{0,} is equivalent to o*.

{n,m}

The values n and m (n≤m) are non-negative integers, where n refers to the minimum number of matching times and m refers to the maximum number of matching times. For example, o{1,3} matches the first three o characters in fooooood, ba{1,3} can match ba, baa, or baaa, and o{0,1} is equivalent to o?. Note that a space cannot be inserted between the comma and the digits.

x|y

Matches x or y. For example, z|food matches z or food, and (z|f)ood matches zood or food.

[xyz]

Refers to a character set and matches any single character in the set. For example, [abc] matches a in plain.

[^xyz]

Refers to a negated character set and matches any single character that is not x, y, or z. For example, [^abc] matches p in plain.

[a-z]

Refers to a character range and matches any single character in the specified range. For example, [a-z] matches any lowercase letter from a to z.

[^a-z]

Refers to a negated character range and matches any single character that is not in the specified range. For example, [^a-z] matches any character that is not a lowercase letter from a to z.

( )

Defines the expression between ( and ) as a group and saves the characters that match the expression to a temporary area. A maximum of nine groups can be saved in a regular expression, and they can be referenced by the symbols \1 to \9.

(pattern)

Matches pattern and captures sub-expressions of the match. You can use the $0–$9 attribute to retrieve captured matches from the result match set.

(?:pattern)

Matches pattern but does not capture sub-expressions of the match. That is, it is a non-capture match and does not store matches for future use. This is useful for combining parts of a pattern with the or character (|). For example, industr(?:y|ies) is a simpler expression than industry|industries.

(?=pattern)

Refers to a non-capture match and indicates a positive lookahead (forward positive pre-check): the match succeeds only at a position where the string that follows matches pattern. The lookahead text is not captured for future use. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000", but does not match "Windows" in "Windows3.1". A lookahead does not consume characters. That is, after a match occurs, the next search starts immediately after the matched text, not after the characters checked by the lookahead.

(?!pattern)

Refers to a non-capture match and indicates a negative lookahead (forward negative pre-check): the match succeeds only at a position where the string that follows does not match pattern. The lookahead text is not captured for future use. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1", but does not match "Windows" in "Windows2000".
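Several rows of Table 1 can be checked with a short script. The following is a minimal sketch that uses Python's re module as a stand-in engine. It assumes that the engine used by CodeArts PerfTest behaves the same way for these constructs, which this page does not confirm. All variable names are illustrative.

```python
import re

# "." does not match a newline, but [\s\S] does.
dot_newline = re.search(r"a.b", "a\nb")            # no match
any_newline = re.search(r"a[\s\S]b", "a\nb")       # match

# Greedy and non-greedy quantifiers on the string "oooo".
greedy = re.search(r"o+", "oooo").group()          # longest possible match
lazy = re.search(r"o+?", "oooo").group()           # shortest possible match

# {n,m}: o{1,3} takes at most three o characters.
bounded = re.search(r"o{1,3}", "fooooood").group()

# \b: er\b matches at the end of "never" but not inside "verb".
boundary_never = re.search(r"er\b", "never") is not None
boundary_verb = re.search(r"er\b", "verb") is not None

# Capturing group with a backreference: (o)\1 matches "oo" in "food".
backref = re.search(r"(o)\1", "food").group()

# Non-capturing group: industr(?:y|ies) covers both spellings.
noncap = [bool(re.fullmatch(r"industr(?:y|ies)", w))
          for w in ("industry", "industries")]

# Positive lookahead: matches "Windows" only when a listed version follows,
# and the version itself is not consumed (the match ends at index 7).
pos = re.search(r"Windows(?=95|98|NT|2000)", "Windows2000")

# Negative lookahead: matches "Windows" only when no listed version follows.
neg_2000 = re.search(r"Windows(?!95|98|NT|2000)", "Windows2000")
neg_31 = re.search(r"Windows(?!95|98|NT|2000)", "Windows3.1")
```

Not every row carries over to every engine. For example, Python's re module does not support \cx, so such rows should be verified in the target tool.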

To match a special character literally, add \ before it. For example, to match the special characters ^, $, (, ), [, ], {, }, ., ?, +, *, and |, use \^, \$, \(, \), \[, \], \{, \}, \., \?, \+, \*, and \|.
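The escaping rule above can be sketched in Python as follows. This is an illustration under the assumption that the target engine uses the same backslash escapes. re.escape is a helper of Python's re module, not of CodeArts PerfTest, and the sample strings are made up.

```python
import re

# Each special character matches itself once it is prefixed with a backslash.
specials = "^$()[]{}.?+*|"
all_escaped_ok = all(re.fullmatch("\\" + ch, ch) for ch in specials)

# Escaping by hand: match a literal dollar sign and a literal dot.
price = re.search(r"\$\d+\.\d{2}", "Total: $19.99 (incl. tax)").group()

# Escaping a whole literal string with Python's re.escape helper.
literal = "a+b (x|y)?"
escaped_ok = re.fullmatch(re.escape(literal), literal) is not None

# Without escaping, the same text is read as a pattern and no longer
# matches itself as a literal string.
unescaped_ok = re.fullmatch(literal, literal) is not None
```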