Custom Sets in Regular Expressions
Sets of Characters in Regex
Sometimes the predefined sets, or classes, of characters (alphanumeric \w, digit \d and whitespace \s) aren’t specific enough for the case that we want to match. For example, what if we need to match just lowercase letters, or only vowels? Where we need to match a more specific pattern, we can define our own sets using square brackets []. Include all of the characters that you do want to match inside the square brackets. This acts as an OR: the regular expression will match any single character that appears inside the square brackets.
Example
Text to Search In
abcdefghijk
Regular Expression
[aeiou]
Output
abcdefghijk (matches: a, e, i)
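The same example can be tried in code. The sketch below assumes Python and its standard re module, which the lesson itself does not prescribe; the variable names are only illustrative.

```python
import re

# Text and pattern taken from the example above.
text = "abcdefghijk"

# [aeiou] matches any single character that is one of the five vowels.
vowels = re.findall(r"[aeiou]", text)

print(vowels)  # ['a', 'e', 'i']
```

Each match is a single character, because a set on its own always stands for exactly one character of the text.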
Character Ranges
To make life easier, we can specify the set/class of characters to match on by using a range, indicated with a dash (-). We don’t have to match a ‘full’ range, such as the whole alphabet. We could write a regex to check for only the first half of the alphabet, [a-m], or a specific range of numbers, [3-8]. Common ranges used in regular expressions are shown below. Note that some are equivalent to the standard character classes we have previously defined.
[a-z]
Match All Lowercase Letters.
[A-Z]
Match All Uppercase Letters.
[a-zA-Z]
Match All Letters (uppercase and lowercase).
[0-9]
Match All Numbers (equivalent to \d).
[a-zA-Z0-9]
Match Letters or Numbers (close to \w, although \w also matches the underscore _).
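As a rough illustration of the ranges listed above, the following sketch again assumes Python's re module. The sample strings are made up for the demonstration. It also shows one place where a range and a predefined class differ: \w accepts the underscore, whereas [a-zA-Z0-9] does not.

```python
import re

# A partial range: only the first half of the lowercase alphabet.
first_half = re.findall(r"[a-m]", "alphabet")
print(first_half)  # ['a', 'l', 'h', 'a', 'b', 'e']

# A partial numeric range: only the digits 3 to 8.
mid_digits = re.findall(r"[3-8]", "0123456789")
print(mid_digits)  # ['3', '4', '5', '6', '7', '8']

# Several ranges can sit side by side in one set.
alnum = re.findall(r"[a-zA-Z0-9]", "a_1")
print(alnum)  # ['a', '1']

# \w matches the same characters here, plus the underscore.
word_chars = re.findall(r"\w", "a_1")
print(word_chars)  # ['a', '_', '1']
```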
Complex Sets
We can build up complex sets of characters to match on, depending on our needs.
We can even include the predefined classes (such as \d) in a custom set. For example, we could have a class which matches on any uppercase letter, any digit, the lowercase letters e and l or the ampersand sign.
Example
Text to Search In
Hello & Welcome 123456789
Regular Expression
[A-Z\del&]
Output
Hello & Welcome 123456789 (matches: H, e, l, l, &, W, e, l, e and each of the digits 1 to 9)
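To see which characters this complex set picks out, here is a small sketch, assuming Python's re module. Joining the matches back together is just a convenient way to display them and is not part of the lesson.

```python
import re

# Text and pattern taken from the example above.
text = "Hello & Welcome 123456789"

# One set combining a range (A-Z), a predefined class (\d),
# and three literal characters (e, l and &).
matches = re.findall(r"[A-Z\del&]", text)

# Join the single-character matches so they are easy to read.
joined = "".join(matches)
print(joined)  # Hell&Wele123456789
```

The letters o, c and m and the spaces are skipped, since none of them appears in the set.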
Not a Specific Set
Finally, we can choose to match on the inverse of a particular class of characters which we define. To do this, prefix the contents of the class with the caret (^) symbol. It is essential to place the caret at the very beginning, just after the left-hand square bracket. If you put it anywhere else, it will be treated as just another character to match. Some common examples are shown below.
[^a-z]
Match any character which isn’t a lowercase letter.
[^A-Z]
Match any character which isn’t an uppercase letter.
[^0-9]
Match any character which isn’t a digit (equivalent to \D).
[^a-zA-Z0-9]
Match any character which isn’t a letter or number (close to \W, except that this set also matches the underscore _, which \W does not).
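The negated sets above can be checked with a short sketch, again assuming Python's re module and some made-up sample strings. It also illustrates the point about caret placement, and the underscore difference between [^a-zA-Z0-9] and \W.

```python
import re

# Caret first: match anything that is NOT a lowercase letter.
not_lower = re.findall(r"[^a-z]", "aB1 c")
print(not_lower)  # ['B', '1', ' ']

# Caret first: match anything that is NOT a digit.
not_digit = re.findall(r"[^0-9]", "a1b2")
print(not_digit)  # ['a', 'b']

# Caret NOT first: it is an ordinary character in the set.
literal_caret = re.findall(r"[a^]", "a^b")
print(literal_caret)  # ['a', '^']

# The underscore is not a letter or digit, so the negated set matches it...
not_alnum = re.findall(r"[^a-zA-Z0-9]", "a_b!")
print(not_alnum)  # ['_', '!']

# ...whereas \W treats the underscore as a word character and skips it.
non_word = re.findall(r"\W", "a_b!")
print(non_word)  # ['!']
```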
Putting it All Together
We can use all of the things that we have learnt in combination to create very specific regular expressions which will only match on the strict criteria that we specify.
Example
Find instances where there is a vowel after an ‘r’ or ‘R’.
Text to Search In
Roses are red, violets are blue.
Regular Expression
[rR][aeiou]
Output
Roses are red, violets are blue. (matches: “Ro” in “Roses”, “re” in the first “are”, “re” in “red” and “re” in the second “are”)
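The combined pattern can be run as follows. This is a minimal sketch assuming Python's re module. The use of finditer to report positions is an addition for illustration only.

```python
import re

# Text and pattern taken from the example above.
text = "Roses are red, violets are blue."

# Two sets in sequence: an 'r' or 'R', immediately followed by a vowel.
pattern = re.compile(r"[rR][aeiou]")

pairs = pattern.findall(text)
print(pairs)  # ['Ro', 're', 're', 're']

# finditer also reports where each match starts in the text.
positions = [m.start() for m in pattern.finditer(text)]
print(positions)  # [0, 7, 10, 24]
```

Each set still matches a single character, so every match here is exactly two characters long.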