Matching a Certain Number of Repetitions with Regex

Often we want to match a specific type of character multiple times. For example, if we're validating a phone number, we might want to look for a pattern of 10 digits without writing \d ten times. Furthermore, we might want a flexible number of repetitions, such as seeing a given character (or class of characters) repeated between x and y times. We can use regex quantifiers to do this.

Zero or more times

Sometimes we want a regex to match regardless of whether a particular character or character class is present. We can do this by appending the zero-or-more quantifier, an asterisk (*), to the character or class.

* Match the preceding character or class zero or more times.

Example

You label errors with the word ERROR which is sometimes followed by an alphanumeric description of the problem. Create a regex to highlight ERROR lines.

Text to Search In

This line is ok

ERROR

ERROR with a description 123

This line is also ok

Regular Expression

ERROR[\w ]*

Match 'ERROR' followed by any number of spaces, letters or numbers.

Output

This line is ok

ERROR

ERROR with a description 123

This line is also ok
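The example above can be tried out in code. This is a minimal sketch, assuming Python's built-in re module; the helper name find_error_text is illustrative and not part of the lesson. Note that in Python \w also matches the underscore.

```python
import re

# Pattern from the example: the literal word ERROR followed by
# zero or more word characters or spaces.
error_pattern = re.compile(r"ERROR[\w ]*")

lines = [
    "This line is ok",
    "ERROR",
    "ERROR with a description 123",
    "This line is also ok",
]


def find_error_text(line):
    """Return the matched text, or None when the line has no ERROR label."""
    match = error_pattern.search(line)
    return match.group(0) if match else None


for line in lines:
    print(repr(line), "->", find_error_text(line))
```

Because * allows zero repetitions, the bare ERROR line matches as well as the line with a description.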

At least one time

When validating forms, we often want to check that a user has entered something, but we might not care how long it is. Or we may expect to see one or more characters after a given prefix or between two other characters. To do this, we append a plus to what we want to match.

+ Match the preceding character one or more times.

Example

You want to do basic validation of an email address – confirm it is some alphanumeric characters followed by the 'at' symbol followed by more characters and then a 'dot' followed by some lower case letters. Note that this isn't perfect, but it demonstrates the use of this quantifier!

Text to Search In

jemimah@thing

tom@123.456

email@example.com

aliceatexample.com

Regular Expression

\w+@\w+\.[a-z]+

Note the escaping of the dot: an unescaped . matches any character, whereas \. matches only a literal dot.

Output

jemimah@thing

tom@123.456

email@example.com

aliceatexample.com
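A short sketch of the same check, assuming Python's re module. The function name looks_like_email is hypothetical, and re.fullmatch is used here so that the whole string has to fit the pattern; that choice is an assumption of this sketch rather than something the lesson prescribes.

```python
import re

# One or more word characters, an @, one or more word characters,
# a literal dot, then one or more lowercase letters.
email_pattern = re.compile(r"\w+@\w+\.[a-z]+")

candidates = [
    "jemimah@thing",
    "tom@123.456",
    "email@example.com",
    "aliceatexample.com",
]


def looks_like_email(text):
    """Return True when the whole string fits the basic email pattern."""
    return email_pattern.fullmatch(text) is not None


for candidate in candidates:
    print(candidate, "->", looks_like_email(candidate))
```

Only email@example.com passes: the first candidate has no dot after the @, the second ends in digits rather than lowercase letters, and the last has no @ at all.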

Zero or once

There is also a quantifier for matching something zero or one time, which makes it optional. We do this by appending a question mark to the character or class that we want to match.

? Match the preceding character 0 or 1 time.

Example

You want to match a phone number which may or may not begin with a plus.

Text to Search In

012345678

+112345678

Regular Expression

\+?\d+

\+? An optional plus sign (escaped, because + on its own is a quantifier)

\d+ One or more digits

Output

012345678

+112345678
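To see the ? quantifier at work, here is a minimal sketch assuming Python's re module and the pattern \+?\d+ (an optional plus followed by one or more digits). The helper name is_phone_number is illustrative only.

```python
import re

# \+? : an optional literal plus (zero or one time)
# \d+ : one or more digits
phone_pattern = re.compile(r"\+?\d+")


def is_phone_number(text):
    """Return True when text is digits with at most one leading plus."""
    return phone_pattern.fullmatch(text) is not None


for candidate in ["012345678", "+112345678", "++112345678", "+"]:
    print(candidate, "->", is_phone_number(candidate))
```

Both numbers from the example pass. A doubled plus fails because ? allows the plus at most once, and a lone plus fails because \d+ requires at least one digit.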

Repeating a given number of times

We can specify the number of times a particular pattern should be repeated. For example, we might want a field to contain an exact number of characters. Other times, we may wish to match a number of repetitions in a given range, for example ensuring that a phone number is between 7 and 15 digits. Finally, we might want to specify a lower or upper bound on how many times we expect to see a specific character: at least 'n' times or no more than 'n' times. We do this by specifying the number of repetitions in curly braces after the character or class that we want to match.

{n} Exactly 'n' times.

{m,n} Between 'm' and 'n' times (inclusive).

{n,} At least 'n' times.

{0,n} No more than 'n' times.
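The curly-brace forms can be compared side by side. The following is a sketch assuming Python's re module; the matches helper is a hypothetical convenience wrapper around re.fullmatch, and the digit strings are made-up test values.

```python
import re


def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# {n}: exactly ten digits, without writing \d ten times
print(matches(r"\d{10}", "0123456789"))   # True
print(matches(r"\d{10}", "012345678"))    # False (only nine digits)

# {m,n}: between 7 and 15 digits, inclusive
print(matches(r"\d{7,15}", "1234567"))    # True
print(matches(r"\d{7,15}", "123456"))     # False (only six digits)

# {n,}: at least 7 digits, with no upper limit
print(matches(r"\d{7,}", "1" * 20))       # True
```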

Example

You want to check that the domain portion of an email address is between 2 and 63 characters.

Text to Search In

abcd@e.com

abcde@example.org.uk

alice@example.com

bob@example.c8m

Regular Expression

\w*@\w{2,63}\.[a-z.]{2,20}

\w* Any number of alphanumeric characters.

@ An at symbol

\w{2,63} Between 2 and 63 alphanumeric characters

\. A dot/period/full stop (escaped, because an unescaped . matches any character)

[a-z.]{2,20} Between 2 and 20 characters which are lowercase letters or periods.

Output

abcd@e.com

abcde@example.org.uk

alice@example.com

bob@example.c8m
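The sketch below runs this example in Python's re module, using the pattern with the dot escaped as \. so that it matches only a literal dot. The function name find_address is illustrative, and the second pattern is included only to show what an unescaped dot would do.

```python
import re

# Dot escaped: it must be a literal "." between the two parts of the domain.
domain_pattern = re.compile(r"\w*@\w{2,63}\.[a-z.]{2,20}")

# Same pattern with the dot left unescaped, for comparison only.
unescaped_pattern = re.compile(r"\w*@\w{2,63}.[a-z.]{2,20}")


def find_address(pattern, text):
    """Return the part of text that the pattern matches, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


addresses = [
    "abcd@e.com",
    "abcde@example.org.uk",
    "alice@example.com",
    "bob@example.c8m",
]

for address in addresses:
    print(address, "->", find_address(domain_pattern, address))

# The unescaped dot lets the "e" of "example" stand in for the dot.
print(find_address(unescaped_pattern, "bob@example.c8m"))
```

With the escaped dot, abcd@e.com is rejected because a single character is below the {2,63} minimum, and bob@example.c8m is rejected because of the digit in the last part. With the unescaped dot, the last address partially matches as bob@example.c.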

Greedy vs Lazy Regex Quantifiers

When looking for a match, there may be more than one way of matching a pattern. In most regex flavors, quantifiers are greedy by default and will try to match the longest possible string, whilst lazy quantifiers match as little as they can.

Greedy Matching: match as many characters as possible.

Lazy Matching: match as few characters as possible.

Sometimes we want to match greedily, but other times we might need to force part of our regex to match lazily. We can make a greedy quantifier lazy by appending a question mark.

*? Lazy version of matching zero or more characters (*)

+? Lazy version of matching one or more characters (+)

{n,}? Lazy version of matching n or more characters ({n,})

Greedy vs Lazy Example

In markdown syntax, we can mark some text for emphasis by putting it between underscores. In this example, we want to pull out all the emphasized text.

Text to Search In

_emphasize this_ don't emphasize this _emphasize this_

Greedy Regular Expression

_.+_

_ an underscore

.+ one or more of any character

_ another underscore

Greedy Output

One match, covering the whole line from the first underscore to the last: _emphasize this_ don't emphasize this _emphasize this_

Lazy Regular Expression

_.+?_

_ an underscore

.+? one or more of any character, matched lazily (match the minimum number of characters possible)

_ another underscore

Lazy Output

Two separate matches, _emphasize this_ and _emphasize this_; the text between them (don't emphasize this) is not matched.
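The difference between the two patterns is easiest to see by listing the matches. This is a minimal sketch assuming Python's re module; the variable names and the capturing-group variant at the end are additions for illustration.

```python
import re

text = "_emphasize this_ don't emphasize this _emphasize this_"

# Greedy: .+ grabs as much as it can, so the match runs from the
# first underscore to the last one on the line.
greedy_matches = re.findall(r"_.+_", text)

# Lazy: .+? stops at the first underscore that completes a match.
lazy_matches = re.findall(r"_.+?_", text)

# Capturing group variant: return only the text between the underscores.
emphasized = re.findall(r"_(.+?)_", text)

print(greedy_matches)  # one match: the whole line
print(lazy_matches)    # two matches: each emphasized phrase
print(emphasized)      # the two phrases without their underscores
```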

