Studying note for regular expression
Definition
Regular expression describes a pattern of matching string
Can be used to detect/replace/take out specific substring
Syntax
| symbol | description | example |
|---|---|---|
| ^ | match the beginning of he line | |
| [ABC] | match all char in […] | [aeiou]: google runoob taobao |
| [^ABC] | match all except those in […] | |
| [A-Z] | describe an interval | [a-z] matchs all lower case char |
| (…) | set groups | |
| \1…\n | match the same elements as nth group | |
| {n} | the front element repeats n times | |
| {n,} | the front element repeats at least n times | |
| {n,m} | the front element repeats at least n and at most m times | |
| . | match any char except (\n,\r), same with [^\n\r] | |
| \s\S | match all, \s:match all space char, \S:match all non space char, ‘return’ not included | |
| \w | match letter,number,underline, equal to [A-Za-z0-9 ] | |
| \cx | match the control char indicated by x(A-Z/a-z) | \cM: Control-M or return char |
| \f | match a page change char | |
| \n\r | match a return symbol | |
| \t | match a tab | |
| \v | match a vertical tab | |
| $ | match the end of the string, to match $ itself, use $ | |
| * | match the front subexpression multiple or zero times, use \* to match * | |
| + | match the front subexpression multiple or one times, use \+ | |
| . | match any single char except \n | |
| [ | mark the beginning of a []expression | |
| ? | match the front subexpression one or zero times, or indicate a non-greedy qualifier | |
| | | logic or |
Application in Cpp
In cpp, we use std::regex to express regular expression, supporting ECMAScripts as default.
Match
Use regex_match() to match xml (or html) format:
1 | std::regex reg("<.*>.*</.*>"); |
Search
Use std::regex_search.
As long as there exists targets in the string, it will return.
1 | std::regex reg("<(.*)>(.*)</(\\1)>"); |



