Custom Sets in Regular Expressions

Sets of Characters in Regex

Sometimes the predefined sets, or classes, of characters (word characters \w, digits \d and whitespace \s) aren’t specific enough for the case that we want to match. For example, what if we need to match just lowercase letters, or only vowels? Where we need to match a more specific pattern, we can define our own sets using square brackets []. Include all of the characters that you want to match inside the square brackets. This acts as an OR: the regular expression will match any single character that appears inside the square brackets.

Example

Text to Search In

abcdefghijk

Regular Expression

[aeiou]

Output

abcdefghijk (matches: a, e, i)
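The same search can be sketched in Python. This is only an illustration using the standard re module; the variable names are made up for the example.

```python
import re

text = "abcdefghijk"

# [aeiou] matches any single character that is one of the listed vowels.
vowels = re.findall(r"[aeiou]", text)
print(vowels)  # ['a', 'e', 'i']
```

Each match is a single character, because a set on its own only ever consumes one character at a time.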

Character Ranges

To make life easier, we can specify the set/class of characters to match on by using a range, written as the first and last character separated by a dash (-). We don’t have to match a ‘full’ range, such as the whole alphabet. We could write a regex to check for only the first half of the alphabet [a-m] or a specific range of digits [3-8]. Common ranges used in regular expressions are shown below. Note that some are equivalent, or nearly equivalent, to the standard character classes we have previously defined.

[a-z] Match All Lowercase Letters.

[A-Z] Match All Uppercase Letters.

[a-zA-Z] Match All Letters (uppercase and lowercase).

[0-9] Match All Digits (equivalent to \d).

[a-zA-Z0-9] Match Letters or Digits (similar to \w, but \w also matches the underscore, so \w is equivalent to [a-zA-Z0-9_]).
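A minimal Python sketch of ranges, assuming the standard re module and some made-up sample strings. It also shows that \w accepts the underscore while [a-zA-Z0-9] does not.

```python
import re

sample = "bed and zoo 2 5 9"

# [a-m] covers only the first half of the lowercase alphabet.
first_half = re.findall(r"[a-m]", sample)
print(first_half)  # ['b', 'e', 'd', 'a', 'd']

# [3-8] covers the digits 3, 4, 5, 6, 7 and 8 only.
mid_digits = re.findall(r"[3-8]", sample)
print(mid_digits)  # ['5']

# \w also accepts the underscore, which [a-zA-Z0-9] does not.
letters_digits = re.findall(r"[a-zA-Z0-9]", "a_1")
word_chars = re.findall(r"\w", "a_1")
print(letters_digits)  # ['a', '1']
print(word_chars)      # ['a', '_', '1']
```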

Complex Sets

We can build up complex sets of characters to match on, depending on our needs. We can even include the predefined classes (such as \d) in a custom set. For example, we could have a class which matches on any uppercase letter, any digit, the lowercase letters e and l, or the ampersand sign (&). Inside the brackets, \d is read as the digit class, and the e, l and & that follow it are ordinary literal characters.

Example

Text to Search In

Hello & Welcome 123456789

Regular Expression

[A-Z\del&]

Output

Hello & Welcome 123456789 (matches: H, e, l, l, &, W, e, l, e and each of the digits 1 to 9)
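As a rough check of this example, the following Python sketch (standard re module, illustrative variable names) collects every character the set matches and joins them together.

```python
import re

text = "Hello & Welcome 123456789"

# Uppercase letters, digits (\d), the literals e and l, and the ampersand.
matches = re.findall(r"[A-Z\del&]", text)
print("".join(matches))  # Hell&Wele123456789
```

The characters o, c, m and the spaces are absent from the result because none of them is in the set.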

Not a Specific Set

Finally, we can choose to match on the inverse of a particular class of characters which we define. To do this, prefix the entire class with the caret (^) symbol. It is essential to place the caret at the beginning (just after the left-hand square bracket) - if you put it anywhere else then it will just be treated as another character to match. Some common examples are shown below.

[^a-z] Match any character which isn’t a lowercase letter.

[^A-Z] Match any character which isn’t an uppercase letter.

[^0-9] Match any character which isn’t a digit (equivalent to \D).

[^a-zA-Z0-9] Match any character which isn’t a letter or digit (similar to \W, except that this set also matches the underscore; \W is equivalent to [^a-zA-Z0-9_]).
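The negated sets can be tried out with a short Python sketch. The sample strings here are invented for illustration, and the last line shows a caret that is not in first position being treated as a literal character.

```python
import re

text = "Room 42_B!"

# Everything that is not a lowercase letter.
not_lower = re.findall(r"[^a-z]", text)
print(not_lower)  # ['R', ' ', '4', '2', '_', 'B', '!']

# Everything that is not a letter or digit (the underscore is included).
not_alnum = re.findall(r"[^a-zA-Z0-9]", text)
print(not_alnum)  # [' ', '_', '!']

# \W excludes the underscore, because \w accepts it.
non_word = re.findall(r"\W", text)
print(non_word)  # [' ', '!']

# A caret that is not first in the set is just another character to match.
caret_literal = re.findall(r"[a^]", "a^b")
print(caret_literal)  # ['a', '^']
```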

Putting it All Together

We can use all of the things that we have learnt in combination to create very specific regular expressions which will only match on the strict criteria that we specify.

Example

Find instances where there is a vowel after an ‘r’ or ‘R’.

Text to Search In

Roses are red, violets are blue.

Regular Expression

[rR][aeiou]

Output

Roses are red, violets are blue. (matches: Ro, then the re in “are”, “red” and the second “are”)
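A small Python sketch of this combined pattern, again using the standard re module with illustrative names. It lists each two-character match along with the position where it starts.

```python
import re

text = "Roses are red, violets are blue."

# An 'r' or 'R' immediately followed by a lowercase vowel.
pattern = re.compile(r"[rR][aeiou]")

pairs = pattern.findall(text)
print(pairs)  # ['Ro', 're', 're', 're']

# Start index of each match within the text.
starts = [m.start() for m in pattern.finditer(text)]
print(starts)  # [0, 7, 10, 24]
```

Placing two sets side by side means each match is two characters long: one character from the first set followed directly by one from the second.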

