Custom Sets in Regular Expressions

Sets of Characters in Regex

Sometimes the predefined sets, or classes, of characters (alphanumeric \w, digit \d and whitespace \s) aren’t specific enough for the case that we want to match. For example, what if we need to match just lowercase letters, or only vowels? Where we need to match a more specific pattern, we can define our own sets using square brackets []. Include all of the characters that you want to match inside the square brackets. This acts as an OR: the regular expression will match any single character that appears inside the square brackets.

Example

Text to Search In

abcdefghijk

Regular Expression

[aeiou]

Output

abcdefghijk (matches: a, e, i)
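The same search can be tried out in code. The following is a minimal sketch using Python's built-in re module; the variable names are only illustrative.

```python
import re

text = "abcdefghijk"

# [aeiou] matches any single character that is one of the five vowels.
vowels = re.findall(r"[aeiou]", text)

print(vowels)  # ['a', 'e', 'i']
```

Each match is a single character, because a set on its own always stands for exactly one character in the text.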

Character Ranges

To make life easier, we can specify the set/class of characters to match on by using a range – indicated with a dash (-). We don’t have to match a ‘full’ range, such as the whole alphabet. We could write a regex to check for only the first half of the alphabet [a-m] or a specific range of numbers [3-8]. Common ranges used in regular expressions are shown below – note that some are equivalent to the standard character classes we have previously defined.

[a-z] Match All Lowercase Letters.

[A-Z] Match All Uppercase Letters.

[a-zA-Z] Match All Letters (uppercase and lowercase).

[0-9] Match All Numbers (equivalent to \d).

[a-zA-Z0-9] Match Letters or Numbers (close to \w, which also matches the underscore _).
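As a rough illustration of ranges, the sketch below runs a few of them against a made-up sample string using Python's re module. The sample text and variable names are assumptions chosen for the demonstration. It also shows one detail worth knowing: in Python, \w matches the underscore as well, so [a-zA-Z0-9] is slightly narrower.

```python
import re

text = "abcxyz 0123456789"

# A partial range: only the first half of the lowercase alphabet.
first_half = re.findall(r"[a-m]", text)
print(first_half)  # ['a', 'b', 'c']

# A partial range of digits.
mid_digits = re.findall(r"[3-8]", text)
print(mid_digits)  # ['3', '4', '5', '6', '7', '8']

# [a-zA-Z0-9] skips the underscore, whereas \w includes it.
custom_set = re.findall(r"[a-zA-Z0-9]", "a_1")
word_class = re.findall(r"\w", "a_1")
print(custom_set)  # ['a', '1']
print(word_class)  # ['a', '_', '1']
```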

Complex Sets

We can build up complex sets of characters to match on, depending on our needs. We can even include the predefined classes (such as \d) in a custom set. For example, we could have a set which matches any uppercase letter, any digit, the lowercase letters e and l, or the ampersand sign (&).

Example

Text to Search In

Hello & Welcome 123456789

Regular Expression

[A-Z\del&]

Output

Hello & Welcome 123456789 (matches: H, e, l, l, &, W, e, l, e and each of the digits 1 to 9)
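To see exactly which characters this set picks out, here is a small sketch using Python's re module. Joining the matches back together is just a convenience for display, not part of the regular expression itself.

```python
import re

text = "Hello & Welcome 123456789"

# A-Z covers uppercase letters, \d covers digits, and e, l and & are literals.
matches = re.findall(r"[A-Z\del&]", text)

# Join the single-character matches to see what was kept.
kept = "".join(matches)
print(kept)  # Hell&Wele123456789
```

The letters o, c and m and the spaces are skipped because none of them appear in the set.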

Not a Specific Set

Finally, we can choose to match on the inverse of a particular class of characters which we define. To do this, prefix the entire class with the caret (^) symbol. It is essential to place the caret at the beginning (just after the left-hand square bracket); if you put it anywhere else, it will just be treated as another character to match. Some common examples are shown below.

[^a-z] Match any character which isn’t a lowercase letter.

[^A-Z] Match any character which isn’t an uppercase letter.

[^0-9] Match any character which isn’t a digit (equivalent to \D).

[^a-zA-Z0-9] Match any character which isn’t a letter or number (close to \W, except that this set also matches the underscore _, which \W does not).
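The effect of the caret's position can be checked with a short sketch in Python's re module. The sample strings here are invented for the demonstration.

```python
import re

text = "Order #42!"

# Caret first: match every character that is NOT a digit.
non_digits = "".join(re.findall(r"[^0-9]", text))
print(non_digits)  # Order #!

# Caret anywhere else: it is an ordinary character in the set.
literal_caret = re.findall(r"[0-9^]", "2^10")
print(literal_caret)  # ['2', '^', '1', '0']
```

Note that a negated set still has to match a character, so the space in the sample text counts as a match for [^0-9].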

Putting it All Together

We can use all of the things that we have learnt in combination to create very specific regular expressions which will only match on the strict criteria that we specify.

Example

Find instances where there is a vowel after an ‘r’ or ‘R’.

Text to Search In

Roses are red, violets are blue.

Regular Expression

[rR][aeiou]

Output

Roses are red, violets are blue. (matches: Ro in "Roses", re in the first "are", re in "red" and re in the second "are")
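As a final check, the combined expression can be run with Python's re module. This sketch lists both the matched text and, as an optional extra, the position where each match starts.

```python
import re

text = "Roses are red, violets are blue."
pattern = r"[rR][aeiou]"

# Two sets side by side: an r or R, immediately followed by a vowel.
pairs = re.findall(pattern, text)
print(pairs)  # ['Ro', 're', 're', 're']

# finditer gives match objects, which also carry the start position.
starts = [m.start() for m in re.finditer(pattern, text)]
print(starts)  # [0, 7, 10, 24]
```

Each set consumes one character, so every match here is exactly two characters long.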
