Updated on 2023-07-13 GMT+08:00

Regular Expression Metacharacters

**Table 1** Metacharacter description
Metacharacter	Description
.	Matches any single character except \n. To include \n, use a pattern such as [\s\S].
^	Matches the start of the input string without consuming any characters. Use \^ to match the ^ character itself.
$	Matches the end of the input string without consuming any characters. Use \$ to match the $ character itself.
*	Matches the preceding character or sub-expression zero or more times. * is equivalent to {0,}. For example, \^*b can match b, ^b, ^^b, and so on.
+	Matches the preceding character or sub-expression one or more times, equivalent to {1,}. For example, a+b can match ab, aab, aaab, and so on.
?	Matches the preceding character or sub-expression zero times or once, equivalent to {0,1}. For example, a[cd]? can match a, ac, and ad. When this character follows any other quantifier, such as *, +, ?, {n}, {n,}, or {n,m}, the matching mode is non-greedy. Non-greedy mode matches the shortest possible string, whereas the default greedy mode matches the longest possible string. For example, in the string oooo, o+? matches only a single o, while o+ matches all four o characters.
|	Performs a logical OR between two matching conditions. For example, the regular expression (him|her) matches "it belongs to him" and "it belongs to her", but does not match "it belongs to them".
\	Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, n matches the character n, \n matches the newline character, \\ matches \, and \( matches (.
\w	Matches a letter, digit, or underscore (_).
\W	Matches any character that is not a letter, digit, or underscore (_).
\s	Matches any whitespace character, such as a space, tab character, or form feed. It is equivalent to [ \f\n\r\t\v].
\S	Matches any character except whitespace characters and is equivalent to [^ \f\n\r\t\v].
\d	Matches any digit and is equivalent to [0-9].
\D	Matches any non-digit character and is equivalent to [^0-9].
\b	Matches a word boundary (the position between a word and a space) and does not match any characters. For example, er\b matches er in never but does not match er in verb.
\B	A non-word boundary match. For example, er\B matches er in verb but does not match er in never.
\f	Matches a form feed and is equivalent to \x0c and \cL.
\n	Matches a linefeed and is equivalent to \x0a and \cJ.
\r	Matches a carriage return character and is equivalent to \x0d and \cM.
\t	Matches a tab character and is equivalent to \x09 and \cI.
\v	Matches a vertical tab character and is equivalent to \x0b and \cK.
\cx	Matches the control character indicated by x. For example, \cM matches Control-M, that is, a carriage return character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as the literal character c.
{n}	Matches exactly n times, where n is a non-negative integer. For example, o{2} does not match o in Bob, but matches the two o characters in food.
{n,}	Matches at least n times, where n is a non-negative integer. For example, o{2,} does not match o in Bob but matches all o characters in foooood. o{1,} is equivalent to o+, and o{0,} is equivalent to o*.
{n,m}	Matches at least n and at most m times, where n and m (n≤m) are non-negative integers. For example, o{1,3} matches the first three o characters in fooooood, ba{1,3} can match ba, baa, or baaa, and o{0,1} is equivalent to o?. Note that no space can be inserted between the comma and the digits.
x|y	Matches x or y. For example, z|food matches z or food, and (z|f)ood matches zood or food.
[xyz]	A character set that matches any one of the enclosed characters. For example, [abc] matches a in plain.
[^xyz]	A negated character set that matches any character other than x, y, and z. For example, [^abc] matches p in plain.
[a-z]	A character range that matches any character in the specified range. For example, [a-z] matches any lowercase letter from a to z.
[^a-z]	A negated character range that matches any character not in the specified range. For example, [^a-z] matches any character that is not a lowercase letter from a to z.
( )	Defines the expression between ( and ) as a group and saves the characters that match the expression to a temporary area. A maximum of nine sub-matches can be saved in a regular expression, and they can be referenced by the symbols \1 to \9.
(pattern)	Matches pattern and captures sub-expressions of the match. You can use the $0–$9 attribute to retrieve captured matches from the result match set.
(?:pattern)	Matches pattern but does not capture sub-expressions of the match. That is, it is a non-capture match and does not store matches for future use. This is useful for combining parts of a pattern with the or character (|). For example, industr(?:y|ies) is a more compact expression than industry|industries.
(?=pattern)	A non-capture match that performs a forward positive pre-check (positive lookahead): it matches at any position where the text that follows matches pattern. The match is not captured for future use. For example, "Windows(?=95|98|NT|2000)" matches "Windows" in "Windows2000", but does not match "Windows" in "Windows3.1". A pre-check does not consume characters. That is, after a match occurs, the next search starts immediately after the last match, not after the characters that made up the pre-check.
(?!pattern)	A non-capture match that performs a forward negative pre-check (negative lookahead): it matches at any position where the text that follows does not match pattern. The match is not captured for future use. For example, "Windows(?!95|98|NT|2000)" matches "Windows" in "Windows3.1", but does not match "Windows" in "Windows2000".
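
The following Python sketch exercises several rows of Table 1 with the standard re module. Python is an assumption here: this page does not state which regular expression engine CodeArts PerfTest uses, and engines differ in places (Python's re, for instance, does not support \cx). All variable names are illustrative.

```python
import re

# . does not match a newline, while [\s\S] matches every character.
dot_chars = re.findall(r".", "a\nb")        # ['a', 'b']
any_chars = re.findall(r"[\s\S]", "a\nb")   # ['a', '\n', 'b']

# Greedy (o+) versus non-greedy (o+?) matching on the string oooo.
greedy = re.search(r"o+", "oooo").group()   # 'oooo'
lazy = re.search(r"o+?", "oooo").group()    # 'o'

# {n,m}: o{1,3} takes at most three o characters per match.
bounded = re.search(r"o{1,3}", "fooooood").group()  # 'ooo'

# \b requires a word boundary after er; \B requires that there is none.
er_b_never = re.search(r"er\b", "never") is not None   # True
er_b_verb = re.search(r"er\b", "verb") is not None     # False
er_B_verb = re.search(r"er\B", "verb") is not None     # True

# Alternation inside a non-capturing group.
words = ["industry", "industries", "industrial"]
industry_hits = [w for w in words if re.fullmatch(r"industr(?:y|ies)", w)]

# A capturing group referenced again with \1: finds a doubled character.
doubled = re.search(r"(\w)\1", "food").group()  # 'oo'

# Positive (?=...) and negative (?!...) pre-checks (lookaheads).
versions = ["Windows2000", "Windows3.1"]
positive = [v for v in versions if re.search(r"Windows(?=95|98|NT|2000)", v)]
negative = [v for v in versions if re.search(r"Windows(?!95|98|NT|2000)", v)]

print(greedy, lazy, bounded, doubled)
print(industry_hits, positive, negative)
```

In both lookahead cases the matched text is only Windows; the version suffix is inspected but not consumed.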

To match a special character literally, add \ before it. For example, to match the special characters ^, $, (, ), [, ], {, }, ., ?, +, *, and |, use \^, \$, \(, \), \[, \], \{, \}, \., \?, \+, \*, and \|.
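
As a minimal sketch of this escaping rule, again assuming Python's re module rather than the engine PerfTest itself uses, the snippet below escapes each special character by hand and also uses re.escape, which Python provides for the same purpose. The names specials, manual_pattern, and literal_price are illustrative.

```python
import re

# The special characters that need a leading backslash to be matched literally.
specials = "^$()[]{}.?+*|"

# Escape every character by hand: ^ becomes \^, $ becomes \$, and so on.
manual_pattern = "".join("\\" + ch for ch in specials)
manual_ok = re.fullmatch(manual_pattern, specials) is not None

# re.escape builds a pattern that matches the given text literally.
literal_price = re.escape("$9.99 (approx.)")
price_ok = re.fullmatch(literal_price, "$9.99 (approx.)") is not None

# Unescaped, + is a quantifier; escaped, it is an ordinary plus sign.
unescaped = re.search(r"1+1", "1+1=2")   # no match
escaped = re.search(r"1\+1", "1+1=2")    # matches the text 1+1

print(manual_ok, price_ok, unescaped is None, escaped.group())
```

Without the backslash, 1+1 is read as one or more 1 characters followed by another 1, so it does not match the text 1+1.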