Matching a Certain Number of Repetitions with Regex
Lesson
Often we want to match a specific type of character multiple times. For example,
if we're validating a phone number, we might want to look for a pattern of 10 digits
without writing \d ten times. Furthermore, we might want a flexible number of
repetitions, such as seeing a given character (or class of characters) repeated
between x and y times. We can use regex quantifiers to do this.
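As a minimal sketch of the idea, assuming Python's built-in re module (the lesson itself is not tied to any language), the snippet below compares writing \d ten times with the quantifier form \d{10}. The helper name is_ten_digits and the sample inputs are made up for illustration.

```python
import re

# Two equivalent patterns for exactly ten digits.
LONGHAND = re.compile(r"\d\d\d\d\d\d\d\d\d\d")
WITH_QUANTIFIER = re.compile(r"\d{10}")


def is_ten_digits(text):
    """Return True if text consists of exactly ten digits (hypothetical helper)."""
    # fullmatch succeeds only if the whole string matches the pattern.
    return WITH_QUANTIFIER.fullmatch(text) is not None


for candidate in ["0123456789", "01234", "01234567890"]:
    # The longhand and quantifier patterns should always agree.
    assert bool(LONGHAND.fullmatch(candidate)) == is_ten_digits(candidate)
    print(candidate, is_ten_digits(candidate))
```

Only the first sample passes: the second is too short and the third is one digit too long.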
Zero or more times
Sometimes we want a regex to match whether or not a particular character or
character class is present. We can do this by appending the zero-or-more quantifier,
an asterisk (*), to the character or class.
*
Match the preceding character zero or more times.
Example
You label errors with the word ERROR, which is sometimes followed by an alphanumeric description of the problem. Create a regex to highlight ERROR lines.
Text to Search In
This line is ok
ERROR
ERROR with a description 123
This line is also ok
Regular Expression
ERROR[\w ]*
Match 'ERROR' followed by any number of spaces, letters, digits or underscores (\w also matches the underscore).
Output
This line is ok
ERROR
ERROR with a description 123
This line is also ok
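The example above can be reproduced with a short script. This is a sketch assuming Python's re module; the variable names are illustrative only. It runs ERROR[\w ]* against each line and collects the text that would be highlighted.

```python
import re

lines = [
    "This line is ok",
    "ERROR",
    "ERROR with a description 123",
    "This line is also ok",
]

# 'ERROR' followed by zero or more word characters or spaces.
pattern = re.compile(r"ERROR[\w ]*")

matched = []
for line in lines:
    found = pattern.search(line)
    if found:
        # group(0) is the full text matched by the pattern.
        matched.append(found.group(0))

print(matched)  # ['ERROR', 'ERROR with a description 123']
```

Because * allows zero repetitions, the bare ERROR line matches as well as the line with a description.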
At least one time
When validating forms, we often want to check that a user has entered something, but we might not care how long it is. Or we may expect to see one or more characters after a given prefix or between two other characters. To do this, we append a plus to what we want to match.
+
Match the preceding character one or more times.
Example
You want to do basic validation of an email address – confirm it is some alphanumeric characters followed by the 'at' symbol followed by more characters and then a 'dot' followed by some lower case letters. Note that this isn't perfect, but it demonstrates the use of this quantifier!
Text to Search In
jemimah@thing
tom@123.456
email@example.com
aliceatexample.com
Regular Expression
\w+@\w+\.[a-z]+
Note the escaping of the dot: an unescaped . matches any character, whereas \. matches only a literal dot.
Output
jemimah@thing
tom@123.456
email@example.com
aliceatexample.com
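To see which of the sample lines the pattern actually matches, here is a small sketch assuming Python's re module (the list and variable names are illustrative). As the lesson notes, this is a demonstration of +, not a real email validator.

```python
import re

candidates = [
    "jemimah@thing",
    "tom@123.456",
    "email@example.com",
    "aliceatexample.com",
]

# One or more word characters, '@', one or more word characters,
# a literal dot, then one or more lowercase letters.
pattern = re.compile(r"\w+@\w+\.[a-z]+")

matched = [text for text in candidates if pattern.search(text)]
print(matched)  # ['email@example.com']
```

The first line has no dot after the domain, the second has digits where lowercase letters are required, and the last has no @ symbol, so only email@example.com matches.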
Zero or once
There is also a quantifier for matching something zero or one time, which makes it optional. We do this by appending a question mark to the character or class that we want to match.
?
Match the preceding character 0 or 1 time.
Example
You want to match a phone number which may or may not begin with a plus.
Text to Search In
012345678
+112345678
Regular Expression
\+?\d+
\+? An optional plus (escaped, because + on its own is a quantifier)
\d+ One or more digits
Output
012345678
+112345678
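The optional plus can be checked with a short sketch, assuming Python's re module and the pattern \+?\d+ (an optional literal plus followed by one or more digits). The helper name looks_like_phone is hypothetical, and real phone validation would need more rules than this.

```python
import re

# An optional literal '+', then one or more digits.
PHONE = re.compile(r"\+?\d+")


def looks_like_phone(text):
    """Return True if the whole string is an optional '+' and then digits."""
    return PHONE.fullmatch(text) is not None


for candidate in ["012345678", "+112345678", "++112345678"]:
    print(candidate, looks_like_phone(candidate))
```

The first two samples pass. The third fails because ? allows at most one plus sign.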
Repeating a given number of times
We can specify the number of times a particular pattern should be repeated. For example, we might want a field to contain an exact number of characters. Other times, we may wish to match a number of repetitions in a given range/interval, for example ensuring that a phone number is between 7 and 15 digits. Finally, we might want to specify a lower or upper bound on how many times we expect to see a specific character: at least 'n' times or no more than 'n' times (an upper bound alone can be written {0,n}). We do this by specifying the number of repetitions in curly braces after the character or class that we want to match.
{n}
Exactly 'n' times.
{m,n}
Between 'm' and 'n' times (inclusive).
{n,}
At least n times.
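The three forms can be tried side by side. The sketch below assumes Python's re module; the helper name matches and the digit strings are made up for illustration.

```python
import re


def matches(pattern, text):
    """Return True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# {n}: exactly four digits.
print(matches(r"\d{4}", "2024"))        # True
print(matches(r"\d{4}", "20245"))       # False

# {m,n}: between 7 and 15 digits, inclusive.
print(matches(r"\d{7,15}", "1234567"))  # True
print(matches(r"\d{7,15}", "123456"))   # False

# {n,}: at least three digits.
print(matches(r"\d{3,}", "123456789"))  # True
print(matches(r"\d{3,}", "12"))         # False
```

Using fullmatch rather than search matters here: with search, \d{4} would still find four digits inside a longer run of digits.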
Example
You want to check that the domain portion of an email address is between 2 and 63 characters.
Text to Search In
abcd@e.com
abcde@example.org.uk
alice@example.com
bob@example.c8m
Regular Expression
\w*@\w{2,63}\.[a-z.]{2,20}
\w* Any number of alphanumeric characters
@ An at symbol
\w{2,63} Between 2 and 63 alphanumeric characters
\. A dot/period/full stop (escaped so that it matches only a literal dot)
[a-z.]{2,20} Between 2 and 20 characters which are lowercase letters or periods
Output
abcd@e.com
abcde@example.org.uk
alice@example.com
bob@example.c8m
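The sketch below runs this example with Python's re module, assuming the pattern with the dot escaped, \w*@\w{2,63}\.[a-z.]{2,20}. The variable names are illustrative only.

```python
import re

candidates = [
    "abcd@e.com",
    "abcde@example.org.uk",
    "alice@example.com",
    "bob@example.c8m",
]

# Local part, '@', a 2-63 character domain label, a literal dot,
# then 2-20 lowercase letters or dots.
pattern = re.compile(r"\w*@\w{2,63}\.[a-z.]{2,20}")

matched = [text for text in candidates if pattern.search(text)]
print(matched)  # ['abcde@example.org.uk', 'alice@example.com']
```

The first address is rejected because its domain label "e" is shorter than two characters, and the last because "c8m" supplies only one lowercase letter before the digit.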
Greedy vs Lazy Regex Quantifiers
When looking for a match, there may be more than one way of matching a pattern. By default, quantifiers are greedy and will try to match the longest possible string, whilst lazy quantifiers find the shortest possible match.
Greedy Matching: match as many characters as possible.
Lazy Matching: match as few characters as possible.
Sometimes we want to match greedily, but other times we might need to force part of our regex to match lazily. We can make a greedy quantifier lazy by appending a question mark.
*?
Lazy version of matching zero or more characters (*)
+?
Lazy version of matching one or more characters (+)
{n,}?
Lazy version of matching n or more characters ({n,})
Greedy vs Lazy Example
In markdown syntax, we can mark some text for emphasis by putting it between underscores. In this example, we want to pull out all the emphasized text.
Text to Search In
_emphasize this_ don't emphasize this _emphasize this_
Greedy Regular Expression
_.+_
_ an underscore
.+ one or more of any character
_ another underscore
Greedy Output
_emphasize this_ don't emphasize this _emphasize this_
Lazy Regular Expression
_.+?_
_ an underscore
.+? one or more of any character, matched lazily (match the minimum number of characters possible)
_ another underscore
Lazy Output
_emphasize this_ don't emphasize this _emphasize this_
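The difference between the two outputs is easier to see when the matches are listed. This is a sketch assuming Python's re module; the variable names are illustrative.

```python
import re

text = "_emphasize this_ don't emphasize this _emphasize this_"

# Greedy: .+ grabs as much as it can, so the match runs from the
# first underscore to the last one.
greedy = re.findall(r"_.+_", text)

# Lazy: .+? stops at the first underscore that completes a match.
lazy = re.findall(r"_.+?_", text)

print(greedy)  # one match covering the whole line
print(lazy)    # ['_emphasize this_', '_emphasize this_']
```

The greedy pattern returns a single match spanning the entire line, including the text that should not be emphasized, while the lazy pattern returns the two emphasized phrases separately.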