Regex Subexpressions

Sometimes we want to split our regex up. We can do this with subexpressions, also referred to as groups. Subexpressions allow us to pull out specific sections of text (for example, just the domain name from a website URL) or look for repetitions of a pattern.

We can specify a group to match with parentheses – (). Whatever is in the parentheses is our subexpression to compare against.

(foobar) Capture group with a subexpression of 'foobar'.

To match a certain number of repetitions of a group, append a quantifier:

(foo){n} Match n repetitions of 'foo'.
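As a minimal sketch of a quantified group, the snippet below uses Python's re module with n set to 3. The sample strings are made up for illustration.

```python
import re

# (foo){3} matches 'foo' repeated exactly three times in a row.
pattern = re.compile(r"(foo){3}")

match = pattern.search("xx foofoofoo yy")
whole = match.group(0)   # the full text matched by the pattern
last = match.group(1)    # a repeated capture group keeps only its last repetition

# Two repetitions are not enough, so search() finds nothing.
too_short = pattern.search("foofoo")

print(whole, last, too_short)
```

Note that the group itself only remembers the final 'foo'. The full run of repetitions is available as the overall match.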


We want to match IP (version 4) addresses in a hosts file. We won't worry about checking that the octets fall between 0 and 255 though.

Text to Search In          localhost        desktop        server

Regular Expression


(\d{1,3}\.){3} Look for 1 to 3 digits followed by a literal dot, repeated 3 times. The dot is escaped because an unescaped . matches any character.

\d{1,3} Look for a final number made up of 1 to 3 digits.

Output localhost desktop server
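The two parts of the pattern can be tried together in Python. This is only a sketch: the hosts-file content below is hypothetical, since the addresses in the example are not shown here.

```python
import re

# Hypothetical hosts-file content; the addresses are made up for illustration.
hosts = """127.0.0.1 localhost
192.168.0.10 desktop
10.0.0.5 server
"""

# Three groups of "1 to 3 digits plus a literal dot", then a final 1 to 3 digits.
ipv4 = re.compile(r"(\d{1,3}\.){3}\d{1,3}")

# finditer() with group(0) returns each whole match. findall() would return
# only the last repetition of the capture group instead.
addresses = [m.group(0) for m in ipv4.finditer(hosts)]
print(addresses)
```

As noted in the example, this pattern does not check that each octet is between 0 and 255.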


We can use alternation within a regular expression (or subexpression) to say “match on this OR that”. We use the vertical bar, or pipe symbol, (|) to delineate the two parts of our OR statement. We can also use it to match one subexpression or another.

x|y Match x or y.


We want to match IP (version 4) addresses which match with 10.x.x.x or 192.168.x.x.

Text to Search In          localhost        desktop          server            google

Regular Expression


Output localhost desktop server google
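The regular expression for this example is not shown above, so the pattern below is only one possible way to write it, not necessarily the original. It is a sketch in Python, and the sample addresses are hypothetical.

```python
import re

# Hypothetical hosts-file content; the addresses are made up for illustration.
hosts = """127.0.0.1 localhost
192.168.0.10 desktop
10.0.0.5 server
8.8.8.8 google
"""

# First group: either "10." plus one octet, or the fixed prefix "192.168".
# Second group: two more ".octet" parts. \b stops matches starting mid-number.
private = re.compile(r"\b(10\.\d{1,3}|192\.168)(\.\d{1,3}){2}\b")

matches = [m.group(0) for m in private.finditer(hosts)]
print(matches)
```

The alternation sits inside a group so that the | only chooses between the two prefixes rather than splitting the whole pattern in two.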


We can use backreferences in regular expressions to refer back to a capture group. A backreference looks for a repetition of the actual text that matched the group. For example, we could use one in a spell checker to make sure that an author doesn't accidentally repeat the same word. Capture groups are numbered in order (the first group is 1, the second is 2, and so on).

To use a backreference, we use a backslash followed by the group number. For example, \1 refers to the first capture group.

\n Backreference to group n.
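The spell-checker idea mentioned above could look something like this in Python. The pattern and the sample sentence are illustrative assumptions, not taken from the original.

```python
import re

# (\w+) captures a word, \s+ matches the gap, and \1 requires the same text again.
# The \b anchors keep the match on whole words.
repeated = re.compile(r"\b(\w+)\s+\1\b")

text = "This is is a test of the the checker."
doubled = [m.group(1) for m in repeated.finditer(text)]
print(doubled)
```

The backreference compares against the text the group actually matched, so 'This is' is not reported even though 'This' ends in 'is'.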


In Markdown syntax, enclosing a phrase in double asterisks or double underscores applies the 'strong' HTML tag. We want a regular expression to highlight instances where this occurs. We can use a backreference to make sure that we don't match cases which start with asterisks and end with underscores or vice versa.

Text to Search In

**Apply Strong** don't apply strong __apply strong __normal text __not correct strong syntax**

Regular expression


(\*\*|__) Capture group one: match a double asterisk or a double underscore. The asterisks are escaped because * is otherwise a quantifier.

.+? Match one or more of any character, lazily (?).

\1 Look for another instance of whatever matched against capture group 1.


**Apply Strong** don't apply strong __apply strong __ normal text __not correct strong syntax.**
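Putting the three parts together, a sketch in Python might look like the following. It uses the search text from the example, with the asterisks in the pattern escaped so that they are treated as literal characters.

```python
import re

text = ("**Apply Strong** don't apply strong __apply strong __normal text "
        "__not correct strong syntax**")

# Group 1 matches the opening delimiter, .+? lazily matches the content,
# and \1 insists that the closing delimiter is the same as the opening one.
strong = re.compile(r"(\*\*|__).+?\1")

matches = [m.group(0) for m in strong.finditer(text)]
print(matches)
```

The final phrase opens with underscores and closes with asterisks, so the backreference rejects it.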


We can more precisely locate the text we are looking for by using 'lookaround'. It lets us look ahead or look behind for a match without actually returning that match. Here we will focus on positive lookahead and positive lookbehind, but there is also a negative syntax which can be used. Check your implementation for what lookaround behaviour is supported.

If we want to match some text only where it is followed by something else, we use a lookahead: a group that starts with a question mark (?) and an equals sign (=), followed by the subexpression that must come next, e.g. (?=xyz). Despite the parentheses, a lookaround group does not capture anything.

foo(?=bar) Match on 'foo', where it is followed by 'bar'.

If we want to match something that follows a pattern, we use a lookbehind: we check the characters that come before the text we want to match and return. We specify this with a group that starts with a question mark (?), less than (<) and equals (=), followed by the subexpression to look behind for, e.g. (?<=xyz).

(?<=foo)bar Match on 'bar', where it is preceded by 'foo'.
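To see that the lookaround text is checked but not returned, the sketch below prints the position of each match in Python. The sample string is made up for illustration.

```python
import re

text = "foobar foobaz bazbar"

# Lookahead: match 'foo' only where 'bar' follows; 'bar' is not part of the match.
ahead = [m.span() for m in re.finditer(r"foo(?=bar)", text)]

# Lookbehind: match 'bar' only where 'foo' comes before; 'foo' is not part of the match.
behind = [m.span() for m in re.finditer(r"(?<=foo)bar", text)]

print(ahead, behind)
```

Each span is three characters wide, covering only 'foo' or only 'bar'. The 'foo' in 'foobaz' and the 'bar' in 'bazbar' are skipped.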


We want to return the comments from a C program. They have been written using the double forward-slash (//) notation. We don't want to include the slashes at the start in our result though.

Text to Search In

this is the start of the code
// This is a comment
this is some code
this is some more code

Regular Expression

(?<=// ).+

Note that in some implementations forward slash has special meaning and will need to be escaped with a backslash.


this is the start of the code

// This is a comment

this is some code

this is some more code
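The same lookbehind can be run in Python, where the forward slash has no special meaning and needs no escaping. This is a sketch using the search text from the example.

```python
import re

code = """this is the start of the code
// This is a comment
this is some code
this is some more code
"""

# (?<=// ) requires "// " immediately before the match without returning it.
# .+ then takes the rest of the line, since . does not match a newline by default.
comments = re.findall(r"(?<=// ).+", code)
print(comments)
```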

Non-Capturing Groups

Often the data that we are interested in will be a subset of what we want to match on. We can include groups which we want to use as part of the pattern to match but which we don't want returned as their own group. We do this by starting the group with ?: followed by the subexpression to look for.

(?:foo) Match on 'foo' but don't return it as a group.


We have a list of domains which include 'www.' at the start, and we just want to return the domain without the 'www.'.

Text to Search In

Regular Expression



Group 1:

Full Match:
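The domain list and regular expression for this example are not shown above, so the following Python sketch uses hypothetical domains and one possible pattern. It is an assumption about what the example might look like, not the original.

```python
import re

# Hypothetical domain list; these names are placeholders for illustration.
domains = ["www.example.com", "www.example.org"]

# (?:www\.) must be present but is not numbered, so the domain becomes group 1.
pattern = re.compile(r"(?:www\.)(.+)")

results = []
for domain in domains:
    m = pattern.fullmatch(domain)
    # group(0) is the full match, group(1) is the part after 'www.'.
    results.append((m.group(0), m.group(1)))

print(results)
```

Because the 'www.' group is non-capturing, the full match still includes it while group 1 holds only the domain.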

