i-Lab Guide To: Regular Expressions

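Regular expressions are widely used, for example in XML Schema and in Linux commands such as sed and awk. Details vary between tools. For example, sed only treats + ? | ( ) and { } as special characters without a backslash when it is run with the -E option.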

Syntax    Description

^ Indicates the beginning of a string - e.g. ^a will only match 'a' at the beginning of a string
$ Indicates the end of a string - e.g. a$ will only match 'a' at the end of a string
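. Matches any single character (in most engines, any character except a newline)
+ Matches one or more instances of the preceding item - e.g. ab+ matches "ab" and "abbb"
* Matches zero or more instances of the preceding item
? Matches zero or one instance of the preceding item - e.g. colou?r matches "color" and "colour"
{n} Matches exactly n instances of the preceding item
{n,} Matches n or more instances of the preceding item
{n,m} Matches between n and m instances of the preceding item (inclusive)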
(ab) Round brackets group a sequence so that it is treated as a unit - e.g. (ab)+ matches "ababab"
a(b|c) | indicates a choice between alternatives - e.g. a(b|c) matches "ab" or "ac"
[abc] Matches any one character from the set - here "a", "b" or "c"
[^abc] Matches any one character that is not in the set
[a-z] Matches any one character in a range - here any lower case letter from a to z
\ Escapes one of the above special characters - e.g. \* matches a literal asterisk
\w Matches any word character (equivalent to [a-zA-Z0-9_] for ASCII text)
\s Matches any whitespace character (spaces, tabs, newlines and other line breaks)
\d Matches any digit (equivalent to [0-9])
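\l Matches any lower case letter (equivalent to [a-z]); only some tools, such as Vim, support this, so elsewhere use [a-z] or the POSIX class [[:lower:]]
\u Matches any upper case letter (equivalent to [A-Z]); as with \l, use [A-Z] or [[:upper:]] in tools that do not support it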
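The rows above can be tried out with Python's standard re module. The following is a small illustrative sketch and is not part of the original guide. The test strings are invented. Python's re does not support \l or \u as letter classes, so the last example uses [A-Z] and [a-z] instead. Each entry pairs a pattern with a string and records whether re.search is expected to find a match. The helper name check is hypothetical.

```python
import re

# Each entry: (pattern, text, expected result of re.search).
# The texts are made-up examples chosen to exercise one row of the table each.
EXAMPLES = [
    (r"^a", "apple", True),           # ^ anchors to the start of the string
    (r"^a", "banana", False),         # 'a' occurs, but not at the start
    (r"a$", "banana", True),          # $ anchors to the end of the string
    (r"^c.t$", "cat", True),          # . matches exactly one character
    (r"^c.t$", "ct", False),          # ... so it cannot match nothing
    (r"^ab+$", "abbb", True),         # + one or more b's
    (r"^ab+$", "a", False),
    (r"^ab*$", "a", True),            # * zero or more b's
    (r"^colou?r$", "color", True),    # ? zero or one u
    (r"^\d{3}$", "1234", False),      # {n} exactly three digits
    (r"^\d{2,}$", "12345", True),     # {n,} two or more digits
    (r"^\d{2,4}$", "12345", False),   # {n,m} between two and four digits
    (r"^(ab)+$", "ababab", True),     # the group repeats as a unit
    (r"^(ab)+$", "abba", False),
    (r"^a(b|c)$", "ac", True),        # | chooses between alternatives
    (r"^[abc]$", "b", True),          # one character from the set
    (r"^[^abc]$", "b", False),        # one character NOT in the set
    (r"^[a-z]+$", "Hello", False),    # 'H' is outside the range a-z
    (r"^2\*3$", "2*3", True),         # \* is a literal asterisk
    (r"^2*3$", "2*3", False),         # unescaped * means "zero or more 2s"
    (r"^\w+$", "user_42", True),      # \w covers letters, digits and _
    (r"^a\sb$", "a\tb", True),        # \s matches the tab character
    (r"^[A-Z][a-z]+$", "London", True),  # stands in for \u\l+ (unsupported in re)
]


def check(examples):
    """Return the examples whose actual result differs from the expected one."""
    failures = []
    for pattern, text, expected in examples:
        found = re.search(pattern, text) is not None
        if found != expected:
            failures.append((pattern, text, expected, found))
    return failures


if __name__ == "__main__":
    for pattern, text, _ in EXAMPLES:
        result = "match" if re.search(pattern, text) else "no match"
        print(f"{pattern:16} {text!r:10} {result}")
    print("mismatches:", check(EXAMPLES))
```

Running the script prints each pattern, its test string and the result. The helper check returns an empty list when every example behaves as the table describes.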

© 2006 i-Lab Limited