Often we want to match a certain type of character multiple times. For example, if we’re validating a phone number we might want to match a pattern of 10 digits without writing \d ten times. Furthermore we might want a flexible number of repetitions such as seeing a certain character (or class of characters) repeated between x and y times. We can use regex quantifiers to do this.

Zero or more times

Sometimes we want a regex to match regardless of whether a certain character class is present. We can do this by appending the zero-or-more quantifier, an asterisk (*), to the class.

* Match the preceding character zero or more times

Example

You label errors with the word ERROR which is sometimes followed by an alphanumeric description of the problem. Create a regex to highlight ERROR lines.

Text to Search In

This line is ok

ERROR

ERROR with a description 123

This line is also ok

Regular Expression

ERROR[\w ]*

Match ‘ERROR’ followed by any number of spaces, letters or numbers.

Output

This line is ok

ERROR

ERROR with a description 123

This line is also ok
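As a rough illustration, the example above could be run in Python with the standard `re` module. The variable names here are made up for this sketch and are not part of the original example.

```python
import re

# Sample lines taken from the example above.
lines = [
    "This line is ok",
    "ERROR",
    "ERROR with a description 123",
    "This line is also ok",
]

# 'ERROR' followed by zero or more word characters or spaces.
error_pattern = re.compile(r"ERROR[\w ]*")

# Keep only the lines in which the pattern is found.
error_lines = [line for line in lines if error_pattern.search(line)]
print(error_lines)  # ['ERROR', 'ERROR with a description 123']
```

Because * allows zero repetitions, the bare ERROR line matches just as well as the one with a description.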

At least one time

When validating forms, we often want to check that a user has entered something, but we might not care how long it is. Or we may expect to see 1 or more characters after a given prefix or between two other characters. To do this, we append a plus to what we want to match.

+ Match the preceding character one or more times.

Example

You want to do basic validation of an email address: confirm it is some alphanumeric characters, followed by the ‘at’ symbol, followed by more characters, a ‘dot’, and then some lower case letters. Note that this isn’t perfect but it demonstrates the use of this quantifier!

Text to Search In

[email protected]

[email protected]

[email protected]

aliceatexample.com

Regular Expression

\w+@\w+\.[a-z]+

Note the escaping of the dot.

Output

[email protected]

[email protected]

[email protected]

aliceatexample.com
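A minimal Python sketch of this check follows. It assumes the pattern described above is `\w+@\w+\.[a-z]+`. Apart from `aliceatexample.com`, the sample strings are hypothetical, since the addresses in the example are not visible here.

```python
import re

# Hypothetical inputs for illustration only.
candidates = [
    "alice@example.com",
    "bob123@mail.org",
    "aliceatexample.com",  # no 'at' symbol
    "@example.com",        # nothing before the 'at' symbol
]

# One or more word characters, '@', one or more word characters,
# a literal dot, then one or more lowercase letters.
email_pattern = re.compile(r"\w+@\w+\.[a-z]+")

# fullmatch requires the whole string to fit the pattern.
valid = [c for c in candidates if email_pattern.fullmatch(c)]
print(valid)  # ['alice@example.com', 'bob123@mail.org']
```

The last candidate is rejected because + demands at least one character before the ‘at’ symbol, where * would have accepted none.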

Zero or once

There is also a quantifier for matching something either zero times or once. We do this by appending a question mark to the character or class that we want to match.

? Match the preceding character 0 or 1 time.

Example

You want to match a phone number which may or may not begin with a plus.

Text to Search In

012345678

+112345678

Regular Expression

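\+?\d*

\+? An optional plus sign. Note the escaping of the plus, since an unescaped + is itself a quantifier.

\d* Any number of digits.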

Output

012345678

+112345678
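The optional plus can be tried out in Python along these lines. This is a sketch that assumes the pattern `\+?\d*` (an escaped, optional plus followed by digits). The third input is a hypothetical extra case added to show a rejection.

```python
import re

# Optional literal plus, then any number of digits.
phone_pattern = re.compile(r"\+?\d*")

numbers = ["012345678", "+112345678", "++112345678"]

# Map each input to whether the whole string fits the pattern.
results = {n: bool(phone_pattern.fullmatch(n)) for n in numbers}
print(results)
# {'012345678': True, '+112345678': True, '++112345678': False}
```

The ? allows at most one plus, so the input with two leading plus signs fails.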

Repeating a given number of times

We can specify the number of times a certain pattern should be repeated, for example when we want a field to contain an exact number of characters. Other times, we may wish to match a number of repetitions in a given range, for example ensuring that a phone number is between 7 and 15 digits. Finally, we might want to specify a lower or upper bound on how many times we expect to see a certain character: at least ‘n’ times, or no more than ‘n’ times (which can be written as {0,n}). We do this by specifying the number of matches in ‘curly braces’ after the character that we want to match.

{n} Exactly ‘n’ times.

{m,n} Between ‘m’ and ‘n’ times (inclusive).

{n,} At least n times.

Example

You want to check that the domain portion of an email address is between 2 and 63 characters.

Text to Search In

[email protected]

[email protected]

[email protected]

[email protected]

Regular Expression

\w*@\w{2,63}\.[a-z.]{2,20}

\w* Any number of alphanumeric characters.

@ An at symbol

\w{2,63} Between 2 and 63 alphanumeric characters

\. A dot/period/full stop

[a-z.]{2,20} Between 2 and 20 characters which are lowercase letters or periods.

Note the escaping of the dot. Unescaped, a dot matches almost any character rather than a literal period.

Output

[email protected]

[email protected]

[email protected]

[email protected]
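The counted quantifiers can be sketched in Python as below. The addresses are hypothetical stand-ins, since the ones in the example are not visible here, and the pattern assumes the dot is escaped as `\.`. The `\d{10}` check is an extra illustration of {n}, based on the ten-digit phone number mentioned in the introduction.

```python
import re

# Exactly ten digits.
ten_digits = re.compile(r"\d{10}")
print(bool(ten_digits.fullmatch("0123456789")))  # True
print(bool(ten_digits.fullmatch("012345678")))   # False (only nine digits)

# Domain name of 2 to 63 word characters, then a literal dot,
# then 2 to 20 lowercase letters or periods.
domain_pattern = re.compile(r"\w*@\w{2,63}\.[a-z.]{2,20}")

# Hypothetical addresses for illustration only.
samples = [
    "alice@example.com",
    "alice@example.co.uk",
    "alice@x.com",               # domain name too short
    "alice@" + "a" * 64 + ".com" # domain name too long
]

accepted = [s for s in samples if domain_pattern.fullmatch(s)]
print(accepted)  # ['alice@example.com', 'alice@example.co.uk']
```

Using `fullmatch` matters here: with `search`, a long domain could still produce a partial match somewhere inside the string.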

Greedy vs Lazy Regex Quantifiers

When looking for a match, there may be more than one way of matching a pattern. By default, quantifiers are greedy and will try to match the longest possible string. Their lazy versions instead find the smallest possible match.

Greedy Matching: match as many characters as possible.

Lazy Matching: match as few characters as possible.

Sometimes we want to match greedily but other times we might need to force part of our regex to match lazily. We can make a greedy quantifier lazy by appending a question mark.

*? Lazy version of matching zero or more characters (*)

+? Lazy version of matching one or more characters (+)

{n,}? Lazy version of matching n or more characters ({n,})

Greedy vs Lazy Example

In markdown syntax, we can mark some text for emphasis by putting it between underscores. In this example, we want to pull out all emphasized text.

Text to Search In

_emphasize this_ don't emphasize this _emphasize this_

Greedy Regular Expression

_.+_

_ an underscore

.+ one or more of any character

_ another underscore

Greedy Output

_emphasize this_ don't emphasize this _emphasize this_

(The whole line is a single match, running from the first underscore to the last.)

Lazy Regular Expression

_.+?_

_ an underscore

.+? one or more of any character, matched lazily (match the minimum number of characters possible)

_ another underscore

Lazy Output

_emphasize this_ don't emphasize this _emphasize this_

(There are two separate matches, one for each _emphasize this_.)
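The difference is easy to see by running both patterns over the same text. This is a small Python sketch using the standard `re` module, with variable names chosen for this illustration.

```python
import re

text = "_emphasize this_ don't emphasize this _emphasize this_"

# Greedy: .+ grabs as much as it can, so the match runs
# from the first underscore to the last one.
greedy = re.findall(r"_.+_", text)

# Lazy: .+? stops at the first closing underscore it reaches.
lazy = re.findall(r"_.+?_", text)

print(greedy)  # ["_emphasize this_ don't emphasize this _emphasize this_"]
print(lazy)    # ['_emphasize this_', '_emphasize this_']
```

The lazy version returns the two emphasized phrases separately, which is what we wanted here.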

Other Related Skills

Subexpressions let us split a regular expression up into smaller groups which we can use for many things!
We can specify where we want to match with a regular expression using word and string boundaries.