Basic Regex Character Matching

A Simple Introduction

We can perform a basic search using just a string of letters and numbers, as shown in the following examples.

Example 1

Text to Search In

123 h3llo world

Regular Expression

123

Output

123 h3llo world (matched: '123')

Example 2

Text to Search In

123 h3llo world

Regular Expression

h3llo

Output

123 h3llo world (matched: 'h3llo')

We can also match on whitespace, on its own or combined with letters/numbers.

Example 3

Text to Search In

123 h3llo world

Regular Expression

lo wo

Output

123 h3llo world (matched: 'lo wo')
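The three literal searches above can be reproduced in most regex engines. Here is a minimal sketch using Python's built-in re module; the choice of Python and the variable names are assumptions for illustration, not part of the lesson.

```python
import re

text = "123 h3llo world"

# Each pattern is a plain string of letters, digits, and spaces.
patterns = ["123", "h3llo", "lo wo"]

# re.search returns the first match, or None if nothing matches.
matches = {p: re.search(p, text) for p in patterns}

for pattern, match in matches.items():
    print(f"{pattern!r} matched {match.group()!r} at position {match.start()}")
```

The space in 'lo wo' is matched literally, just like the letters around it.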

How Many to Match

Depending on what you are using to do your matching, the regular expression engine may hit on just the first match, or it might return them all. A tool like 'grep' might default to showing all matches whereas programming language implementations will often show the first match (or just return true) by default.

If your implementation is only showing a single match then there is likely a way to turn on global matching to get all matches – often denoted with a 'g' flag or similar. Here we will generally use global matching in the examples.
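How this looks depends on the implementation. In Python's re module, for instance, there is no 'g' flag; the function you call decides how many matches come back. A minimal sketch, assuming Python (the lesson itself does not name a language):

```python
import re

text = "123 h3llo world"

# re.search stops at the first match it finds.
first = re.search("l", text)
print(first.group(), first.start())  # l 6

# re.findall returns every non-overlapping match, much like a global flag.
all_matches = re.findall("l", text)
print(all_matches)  # ['l', 'l', 'l']
```

Other tools expose the same choice differently, so check the documentation for whatever engine you are using.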

Predefined Regex Classes

Often, we may want our regex to match on any alphanumeric character, any numerical digit or any whitespace character (including spaces, tabs or newlines). Regular expressions use a backslash as an 'escape character', which means that the character following it is treated specially.

\w Any alphanumeric character (letters and numbers) as well as underscores ( _ )

\d Any digit (0 to 9)

\s Any whitespace (space, tab, line break)

Example 4

Text to Search In

123 h3llo world

Regular Expression

\w

Output

123 h3llo world (each of the 13 letters and digits is matched on its own; the two spaces are not)

Example 5

We can combine all of the above to match on a combination of characters.

Text to Search In

123 h3llo world

Regular Expression

\d\sh

Output

123 h3llo world (matched: '3 h')
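The predefined classes can be tried out with a few lines of code. The sketch below assumes Python's re module and global-style matching via re.findall; the variable names are illustrative only.

```python
import re

text = "123 h3llo world"

# \w matches each letter, digit, or underscore individually.
word_chars = re.findall(r"\w", text)
print(len(word_chars))  # 13

# \d matches each digit.
digits = re.findall(r"\d", text)
print(digits)  # ['1', '2', '3', '3']

# \s matches each whitespace character.
spaces = re.findall(r"\s", text)
print(spaces)  # [' ', ' ']

# Combined: a digit, then whitespace, then a literal 'h'.
combined = re.findall(r"\d\sh", text)
print(combined)  # ['3 h']
```

The patterns are written as raw strings (r"...") so that Python passes the backslashes through to the regex engine unchanged.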

Not Alphanumeric, Not a Number or Not Whitespace

We can also ask to match based on not being a specific type of character. These should look familiar as the uppercase counterparts to the above expressions.

\W Not an alphanumeric character or underscore.

\D Not a digit.

\S Not whitespace.

Example 6

Text to Search In

123 h3llo world

Regular Expression

\D\s\D

Output

123 h3llo world (matched: 'o w')
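The negated classes can be checked the same way. This is a minimal sketch assuming Python's re module; the variable names are made up for the example.

```python
import re

text = "123 h3llo world"

# \W matches anything that is not a letter, digit, or underscore.
non_word = re.findall(r"\W", text)
print(non_word)  # [' ', ' ']

# \D matches anything that is not a digit, including spaces.
non_digit = re.findall(r"\D", text)
print(len(non_digit))  # 11

# \S matches anything that is not whitespace.
non_space = re.findall(r"\S", text)
print(len(non_space))  # 13

# Example 6: non-digit, whitespace, non-digit.
example6 = re.findall(r"\D\s\D", text)
print(example6)  # ['o w']
```

The first space in the text is not part of a match for \D\s\D because the character before it, '3', is a digit.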

Whitespace Metacharacters in Regular Expressions

There are various types of whitespace such as newline, carriage return, tab and more. Sometimes we want to match just a particular kind of whitespace instead of matching any whitespace using \s. There are several different metacharacters we can use in our regular expression:

Space (' ')

To match a space with a regular expression, just use a space ' ' with the spacebar!

Other Types of Whitespace

\n Newline, line-break or line-feed

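\b Backspace, but only inside a character set such as [\b]; elsewhere in a pattern, \b matches a word boundary rather than a character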

\r Carriage Return

\t Tab
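To see the difference between matching one kind of whitespace and matching any whitespace, here is a small sketch assuming Python's re module. The sample string is hypothetical and was chosen only because it contains a tab, a carriage return, a newline and a space.

```python
import re

# Hypothetical sample: tab, carriage return + newline, and a space.
sample = "name\tvalue\r\nnext line"

# Each metacharacter matches only its own kind of whitespace.
tabs = re.findall(r"\t", sample)
newlines = re.findall(r"\n", sample)
carriage_returns = re.findall(r"\r", sample)

# \s matches all of them, plus the ordinary space.
any_whitespace = re.findall(r"\s", sample)

print(len(tabs), len(newlines), len(carriage_returns))  # 1 1 1
print(any_whitespace)  # ['\t', '\r', '\n', ' ']
```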

Any Character

Sometimes we want to match any character. This is done using a dot (.).

. Match on any character (except a newline)
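A short sketch of the dot in action, assuming Python's re module with its default settings; the test strings are invented for illustration.

```python
import re

# The dot matches every character except the newline.
chars = re.findall(r".", "a1\nb2")
print(chars)  # ['a', '1', 'b', '2']

# 'h.llo' accepts any single character in the second position,
# but not a newline, so the last candidate is skipped.
words = re.findall(r"h.llo", "hello h3llo h\nllo")
print(words)  # ['hello', 'h3llo']
```

Many engines offer an option that makes the dot match newlines as well, so the default shown here is worth confirming for your own tool.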

Regex Escape Character

If the character we want to use has a special meaning, how can we search for it? For example, what if we want to match something with a full stop (.) in it? The answer is that we escape that character with a backslash. For example, to look for a dot/period/full stop, we use \. and to look for a backslash we just use another backslash \\ to escape the escape!

\ Escape the following character

Example 7

Imagine you wanted to find all files with a date in January 2020.

Text to Search In

20150105.txt

20170203.txt

20200104.txt

20200117.txt

20200321.txt

Regular Expression

202001..\.txt

Output

20150105.txt

20170203.txt

20200104.txt (matched)

20200117.txt (matched)

20200321.txt
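Example 7 can be sketched in code as a filter over the list of filenames. This assumes Python's re module; the variable names and the extra test string '20200104_txt' are hypothetical and only there to show why the dot before 'txt' needs escaping.

```python
import re

filenames = [
    "20150105.txt",
    "20170203.txt",
    "20200104.txt",
    "20200117.txt",
    "20200321.txt",
]

# '..' matches any two characters (the day); '\.' matches a literal dot.
pattern = re.compile(r"202001..\.txt")
january_2020 = [name for name in filenames if pattern.search(name)]
print(january_2020)  # ['20200104.txt', '20200117.txt']

# Without the backslash, the third dot matches any character, not just '.'.
loose = re.search(r"202001...txt", "20200104_txt")
strict = re.search(r"202001..\.txt", "20200104_txt")
print(loose is not None, strict is not None)  # True False
```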
