Sets of Characters in Regex

Sometimes the predefined sets, or classes, of characters (alphanumeric \w, digit \d and whitespace \s) aren’t specific enough for the case that we want to match. For example, what if we want to match just lowercase letters, or only vowels? Where we need to match a more specific pattern, we can define our own sets using square brackets []. Simply include all of the characters that you want to match inside the square brackets. This acts like an OR: the regular expression will match any one of the characters inside the square brackets.

Example

Text to Search In

abcdefghijk

Regular Expression

[aeiou]

Output

abcdefghijk (matches: a, e, i)
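As a minimal sketch of the example above, here it is in Python using the standard re module (the choice of Python and the variable names are assumptions for illustration; the same pattern works in most regex engines).

```python
import re

text = "abcdefghijk"

# A set matches any single character listed inside the square brackets.
vowels = re.findall(r"[aeiou]", text)

print(vowels)  # ['a', 'e', 'i']
```

Each match is a single character, because a set on its own consumes exactly one character at a time.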

Character Ranges

To make life easier, we can specify the set/class of characters to match on by using a range, indicated with a dash (-). We don’t have to match a ‘full’ range such as the whole alphabet; we could write a regex to match only the first half of the alphabet [a-m] or a specific range of digits [3-8]. Common ranges used in regular expressions are shown below. Note that some are equivalent, or nearly equivalent, to the common character classes we have previously defined.

[a-z] Match All Lowercase Letters.

[A-Z] Match All Uppercase Letters.

[a-zA-Z] Match All Letters (uppercase and lowercase).

[0-9] Match All Digits (equivalent to \d).

[a-zA-Z0-9] Match Letters or Digits (close to \w, which also matches the underscore _).
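The ranges above can be tried out with a short Python sketch (the sample strings are made up for illustration). It also shows one detail worth knowing: \w matches the underscore as well as letters and digits, so it is slightly broader than [a-zA-Z0-9].

```python
import re

text = "Room 42B, Shelf m7"  # hypothetical sample text

# Partial ranges: only lowercase a to m, and only the digits 3 to 8.
first_half = re.findall(r"[a-m]", text)
digits_3_to_8 = re.findall(r"[3-8]", text)

print(first_half)     # ['m', 'h', 'e', 'l', 'f', 'm']
print(digits_3_to_8)  # ['4', '7']

# \w also matches the underscore, which [a-zA-Z0-9] does not.
explicit = re.findall(r"[a-zA-Z0-9]", "a_1")
shorthand = re.findall(r"\w", "a_1")

print(explicit)   # ['a', '1']
print(shorthand)  # ['a', '_', '1']
```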

Complex Sets

We can build up complex sets of characters to match on, depending on our needs. We can even include the predefined sets (such as \d) in a custom set. For example, we could have a class which matches any uppercase letter, any digit, the lowercase letters e and l, or the ampersand sign.

Example

Text to Search In

Hello & Welcome 123456789

Regular Expression

[A-Z\del&]

Output

Hello & Welcome 123456789 (matches: H, e, l, l, &, W, e, l, e and each of the digits 1 to 9)
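The same complex set can be checked with a short Python sketch (Python's re module is assumed here purely for illustration). Inside the brackets, \d is read as the digit class, and the e, l and & that follow are ordinary characters.

```python
import re

text = "Hello & Welcome 123456789"

# Uppercase letters, digits, lowercase e and l, or an ampersand.
matches = re.findall(r"[A-Z\del&]", text)

# Join the single-character matches to see what was kept.
print("".join(matches))  # Hell&Wele123456789
```

The lowercase o, c and m and the spaces are skipped because they are not listed in the set.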

Not a Specific Set

Finally, we can choose to match on the inverse of a class of characters which we define. To do this, prefix the entire class with the caret (^) symbol. It is important to place the caret at the beginning (just after the left-hand square bracket). If you put it anywhere else, it will just be treated as another character to match on. Some common examples are shown below.

[^a-z] Match any character which isn’t a lowercase letter.

[^A-Z] Match any character which isn’t an uppercase letter.

[^0-9] Match any character which isn’t a digit (equivalent to \D).

[^a-zA-Z0-9] Match any character which isn’t a letter or digit (close to \W, except that \W does not match the underscore _).
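A small Python sketch shows why the position of the caret matters (the sample strings are made up for illustration). At the start of the set the caret negates it; anywhere else it is just a literal ^ character.

```python
import re

text = "a1^B"  # hypothetical sample text

# Caret first: match anything that is NOT a lowercase letter.
negated = re.findall(r"[^a-z]", text)

# Caret elsewhere: match a lowercase letter OR a literal '^'.
literal_caret = re.findall(r"[a-z^]", text)

print(negated)        # ['1', '^', 'B']
print(literal_caret)  # ['a', '^']

# A common use of a negated set: strip everything except digits.
digits_only = re.sub(r"[^0-9]", "", "Tel: 555-0199")
print(digits_only)    # 5550199
```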

Putting it All Together

We can use all of the things that we have learnt in combination to create very specific regular expressions which will only match on the strict criteria that we specify.

Example

Find instances where there is a vowel after an ‘r’ or ‘R’.

Text to Search In

Roses are red, violets are blue.

Regular Expression

[rR][aeiou]

Output

Roses are red, violets are blue. (matches: Ro, re, re, re)
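As a rough sketch in Python, the combined pattern can be run with re.finditer to see both the matched text and where each match starts (the variable names are illustrative only).

```python
import re

text = "Roses are red, violets are blue."

# An 'r' or 'R' immediately followed by a lowercase vowel.
pattern = re.compile(r"[rR][aeiou]")

found = [(m.start(), m.group()) for m in pattern.finditer(text)]

for position, match in found:
    print(position, match)
# 0 Ro
# 7 re
# 10 re
# 24 re
```

Each match is two characters long, because the pattern is two sets placed side by side: one character from the first set, then one from the second.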

Other Related Skills

Use regular expressions to match on a certain number (or range) of repetitions of a certain character or class of characters.