Regular expressions (regex): symbols representing a text pattern, interpreted by a regex engine
The engine is used for matching, searching, and replacing text
Common flags: g = global, i = case insensitive, m = multiline
Regex engines are eager (they return the first match found, scanning left to right) and greedy (a quantifier matches as much as possible before handing control to the next part of the expression); a lazy quantifier matches as little as possible
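The flags above can be tried out with a short sketch. It assumes JavaScript regex literals; the sample text and variable names are made up for illustration.

```javascript
// Illustrative sample text (not from the notes).
const text = "Cat\ncat\nCAT";

// No flags: case sensitive, stops at the first match.
const first = text.match(/cat/);            // matches "cat" at index 4

// g + i: every match, ignoring case.
const all = text.match(/cat/gi);            // ["Cat", "cat", "CAT"]

// Without m, ^ only matches at the start of the whole string.
const stringStart = text.match(/^cat/gi);   // ["Cat"]

// With m, ^ also matches at the start of each line.
const lineStarts = text.match(/^cat/gim);   // ["Cat", "cat", "CAT"]

console.log(first.index, all, stringStart, lineStarts);
```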
Metacharacters have special meaning, some are:
. any character except new line
\ escape next character
\t tab character
\r, \n, \r\n line breaks (carriage return, line feed, and the CR+LF pair)
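A minimal sketch of these metacharacters, assuming JavaScript; the sample strings are illustrative only.

```javascript
// . matches any character except a line break.
const dotAny = /c.t/.test("cut");          // true
const dotNewline = /c.t/.test("c\nt");     // false

// \ escapes the next character, so \. is a literal dot.
const literalDot = /3\.5/.test("3x5");     // false
const unescapedDot = /3.5/.test("3x5");    // true

// \t matches a tab character.
const hasTab = /\t/.test("a\tb");          // true

// Split on any of the three line-break styles (\r\n is tried first).
const lines = "a\r\nb\nc".split(/\r\n|\r|\n/); // ["a", "b", "c"]

console.log(dotAny, dotNewline, literalDot, unescapedDot, hasTab, lines);
```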
Character Set matches one of several characters
Ex. [aeiou] – matches any one vowel
Most metacharacters are literal inside a character set and need no escaping
Exceptions: ] - ^ \
Character Ranges match any one character between two endpoints, e.g. [a-z] (the - is a metacharacter only inside a character set)
^ negates the set when it is the first character: [^aeiou] matches any one character that is not a vowel
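The character-set rules above, sketched in JavaScript with made-up sample strings:

```javascript
// A set matches exactly one of the listed characters.
const vowels = "regex".match(/[aeiou]/g);    // ["e", "e"]

// A range covers every character between its endpoints.
const digits = "a1b2".match(/[0-9]/g);       // ["1", "2"]

// A leading ^ negates the set.
const nonDigits = "a1b2".match(/[^0-9]/g);   // ["a", "b"]

// Inside a set the dot is literal, so it does not match "a".
const dotInSet = /[.]/.test("a");            // false

// An escaped hyphen is a literal hyphen, not a range.
const hyphen = /[a\-z]/.test("-");           // true
const notRange = /[a\-z]/.test("m");         // false

console.log(vowels, digits, nonDigits, dotInSet, hyphen, notRange);
```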
Shorthand Character Sets
\d digit [0-9]
\w word character [0-9a-zA-Z_]
\s whitespace [ \t\r\n]
\D not digit [^0-9]
\W not word [^0-9a-zA-Z_]
\S not whitespace [^ \t\r\n]
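A quick sketch of the shorthand sets in JavaScript; the sample string is an assumption chosen to contain each kind of character.

```javascript
// Illustrative sample: letters, underscore, digits, spaces, punctuation.
const sample = "id_7 = 42;";

const digitChars = sample.match(/\d/g);   // ["7", "4", "2"]
const wordRuns = sample.match(/\w+/g);    // ["id_7", "42"]
const spaceChars = sample.match(/\s/g);   // [" ", " "]
const nonWord = sample.match(/\W/g);      // [" ", "=", " ", ";"]

console.log(digitChars, wordRuns, spaceChars, nonWord);
```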
Repetition Metacharacters
* match preceding item zero or more times
+ match preceding item one or more times
? match preceding item zero or one time
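The three repetition metacharacters can be compared side by side. This is a small JavaScript sketch; the patterns are anchored with ^ and $ only so that the whole string must match.

```javascript
// * : zero or more b's between a and c.
const star = /^ab*c$/;
// + : at least one b is required.
const plus = /^ab+c$/;
// ? : the u is optional.
const optional = /^colou?r$/;

console.log(star.test("ac"), star.test("abbbc"));             // true true
console.log(plus.test("ac"), plus.test("abbc"));              // false true
console.log(optional.test("color"), optional.test("colour")); // true true
console.log(optional.test("colouur"));                        // false
```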
Quantified Repetition Metacharacters
{ start quantified repetition of preceding item
} end quantified repetition of preceding item
Ex. \d{4,8} matches numbers with 4 to 8 digits
Ex. \d{4} matches numbers with exactly 4 digits
Ex. \d{4,} matches numbers with 4 or more digits
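A sketch of quantified repetition in JavaScript. Note that JavaScript does not accept whitespace inside the braces: without the u flag, a pattern such as \d{4, 8} is read as one digit followed by the literal text "{4, 8}". The variable names are illustrative.

```javascript
const exactly4 = /^\d{4}$/;     // exactly 4 digits
const from4to8 = /^\d{4,8}$/;   // 4 to 8 digits
const atLeast4 = /^\d{4,}$/;    // 4 or more digits

// With a space the braces are not a quantifier, so this fails on digits.
const withSpace = /^\d{4, 8}$/;

console.log(exactly4.test("2024"), exactly4.test("20245"));    // true false
console.log(from4to8.test("123"), from4to8.test("12345678"));  // false true
console.log(atLeast4.test("1234567890"));                      // true
console.log(withSpace.test("12345"));                          // false
```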
Lazy Expressions
? placed after a quantifier makes it lazy (*?, +?, ??, {m,n}?)
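Greedy and lazy matching differ most visibly when a delimiter occurs more than once. A minimal JavaScript sketch with a made-up HTML fragment:

```javascript
// Illustrative input with two <b> elements.
const html = "<b>one</b> and <b>two</b>";

// Greedy: .+ runs to the end, then backtracks to the LAST </b>.
const greedy = html.match(/<b>.+<\/b>/)[0];
// Lazy: .+? stops at the FIRST </b> it can reach.
const lazy = html.match(/<b>.+?<\/b>/)[0];

console.log(greedy); // "<b>one</b> and <b>two</b>"
console.log(lazy);   // "<b>one</b>"
```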
Grouping Metacharacters
( start grouped expression
) end grouped expression
Ex. (abc)+ matches abc and abcabcabc
(expression) is a Capturing Group: it captures the matched text for use in matching and replacing
(?:expression) is a Non-Capturing Group
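Capturing and non-capturing groups, sketched in JavaScript. The date string and the $1/$2/$3 replacement are illustrative assumptions, not part of the notes.

```javascript
// A group lets a quantifier apply to the whole sequence.
const repeated = /^(abc)+$/.test("abcabcabc");   // true

// Capturing groups are numbered from 1 in the match result.
const date = "2024-05-17".match(/(\d{4})-(\d{2})-(\d{2})/);
// date[1] is "2024", date[2] is "05", date[3] is "17"

// Captures can be reused in a replacement as $1, $2, ...
const swapped = "2024-05-17".replace(/(\d{4})-(\d{2})-(\d{2})/, "$3/$2/$1");

// (?:...) groups without capturing, so only one capture is stored.
const nonCapturing = "2024-05".match(/(?:\d{4})-(\d{2})/);

console.log(repeated, date[1], swapped, nonCapturing[1]);
// true "2024" "17/05/2024" "05"
```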
Alternation Metacharacter
| match either the expression on its left or the expression on its right
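Alternation applies to everything on each side of the |, so grouping is usually needed to limit its scope. A short JavaScript sketch with illustrative strings:

```javascript
// Either alternative may match.
const pets = "cat, dog, bird".match(/cat|dog/g);   // ["cat", "dog"]

// Grouped: only the a/e choice alternates.
const grouped = /^gr(a|e)y$/;
// Ungrouped: this means "^gra" OR "ey$", which is rarely what is wanted.
const ungrouped = /^gra|ey$/;

console.log(pets);
console.log(grouped.test("grey"), grouped.test("grab"));   // true false
console.log(ungrouped.test("grab"));                       // true
```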
Start and End Anchors
^ start of string (or of each line with the m flag)
$ end of string (or of each line with the m flag)
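A small JavaScript sketch of the anchors; the two-line sample string is an assumption for illustration.

```javascript
const twoLines = "first line\nsecond line";

// Without m, ^ matches only at the very start of the string.
const startOfString = twoLines.match(/^\w+/g);    // ["first"]
// With m, ^ and $ match at every line start and end.
const startOfLines = twoLines.match(/^\w+/gm);    // ["first", "second"]
const endOfLines = twoLines.match(/\w+$/gm);      // ["line", "line"]

// Using both anchors forces the whole string to match.
const onlyDigits = /^\d+$/;

console.log(startOfString, startOfLines, endOfLines);
console.log(onlyDigits.test("123"), onlyDigits.test("12a"));  // true false
```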
Word Boundaries
\b word boundary (start/end of word)
\B not a word boundary
\b matches a zero-width position, not a character: a space is not a word boundary (the boundaries are on either side of each word)
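Word boundaries are positions rather than characters, which the following JavaScript sketch makes visible by inserting a marker at each one. The sample strings are illustrative.

```javascript
const phrase = "cat concat cats";

// \bcat\b matches only the standalone word.
const wholeWord = phrase.match(/\bcat\b/g);   // ["cat"]
// \Bcat matches "cat" only when it does not start a word (inside "concat").
const insideWord = phrase.match(/\Bcat/g);    // ["cat"]

// Mark every boundary position: they sit on either side of each word.
const marked = "two words".replace(/\b/g, "|");

console.log(wholeWord.length, insideWord.length); // 1 1
console.log(marked);                              // "|two| |words|"
```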
Back References
\1 through \9 backreferences to capturing groups 1 to 9 (each matches the text stored by the corresponding (expression))
Ex. <(i|em)>.+?</\1> matches <i>hello</i> and <em>Hello</em>
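The tag example above can be checked in JavaScript, along with a second, assumed use of backreferences: spotting a doubled word. The / in the closing tag is escaped only because it sits inside a regex literal.

```javascript
// \1 must repeat whatever group 1 captured (i or em).
const tagPair = /<(i|em)>.+?<\/\1>/;

console.log(tagPair.test("<i>hello</i>"));    // true
console.log(tagPair.test("<em>Hello</em>"));  // true
console.log(tagPair.test("<i>hello</em>"));   // false: tags differ

// Illustrative extra: a word followed by a space and the same word.
const doubled = "this is is a test".match(/\b(\w+) \1\b/);
console.log(doubled[0]);                      // "is is"
```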
Assertions
Lookahead checks whether an expression matches at the current position but doesn’t include it in the match (lookbehind, (?<=regex) and (?<!regex), is supported in JS only since ES2018)
Positive Lookahead – (?=regex)
Ex. sea(?=shore) matches “sea” in “seashore” but not “seaside”
Negative Lookahead – (?!regex)
Ex. sea(?!shore) matches “sea” in “seaside” but not “seashore”
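Both lookahead examples can be verified in JavaScript. The price string at the end is an illustrative assumption showing that the looked-ahead text stays out of the match.

```javascript
const positive = /sea(?=shore)/;   // "sea" only if "shore" follows
const negative = /sea(?!shore)/;   // "sea" only if "shore" does not follow

console.log("seashore".match(positive)[0]);  // "sea"
console.log(positive.test("seaside"));       // false
console.log(negative.test("seaside"));       // true
console.log(negative.test("seashore"));      // false

// The lookahead is checked but not consumed: " USD" is not in the result.
const usdAmounts = "100 USD, 200 EUR".match(/\d+(?= USD)/g);
console.log(usdAmounts);                     // ["100"]
```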
Unicode Metacharacter
\uXXXX matches a Unicode character by its four-digit hexadecimal code, e.g. \u0065 matches e (U+0065)
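A minimal JavaScript sketch of the Unicode escape. The euro-sign example is an added illustration; the sample string is written with the same escape so the source stays ASCII.

```javascript
// \u0065 is the letter e (U+0065).
const matchesE = /\u0065/.test("e");            // true
const noMatch = /\u0065/.test("E");             // false: E is U+0045

// \u20AC is the euro sign.
const price = "Price: \u20AC5";
const hasEuro = /\u20AC/.test(price);           // true
const euroIndex = price.search(/\u20AC/);       // 7

console.log(matchesE, noMatch, hasEuro, euroIndex);
```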