Regex Anchors
Lesson
Sometimes we want to be more specific about where in a word or string our regular expression should match. Anchors let us do this by tying the match to a word boundary or a string boundary.
Regex Word Boundaries
We can look for matches relative to a word boundary using \b. Regex interprets a word boundary as a position with a word character (a letter, digit, or underscore) on one side and a non-word character, or the start or end of the string, on the other. We can use this to locate a match relative to the start or end of a word, or to match only when our text is the entire word.
\b
Word boundary.
We can also specify a positional match which is explicitly not on a word boundary using \B.
\B
Not a word boundary.
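As a rough illustration of how \b and \B behave, the following Python sketch lists the positions at which 'cat' matches under different anchor combinations. The sample text and the helper name match_starts are made up for this illustration, and Python's re module is assumed as the regex engine.

```python
import re

# Hypothetical sample text, chosen so 'cat' appears in four different positions.
text = "cat catalog concat scatter"

def match_starts(pattern, s):
    """Return the start index of every match of pattern in s."""
    return [m.start() for m in re.finditer(pattern, s)]

whole_word = match_starts(r"\bcat\b", text)    # 'cat' as an entire word
word_start = match_starts(r"\bcat", text)      # 'cat' at the start of a word
word_end = match_starts(r"cat\b", text)        # 'cat' at the end of a word
inside_word = match_starts(r"\Bcat\B", text)   # 'cat' strictly inside a word

print(whole_word)   # [0]
print(word_start)   # [0, 4]
print(word_end)     # [0, 15]
print(inside_word)  # [20]
```

Only 'scatter' has word characters on both sides of 'cat', so it is the only place where \Bcat\B can match.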
Example
We want to match occurrences of the word 'red', but we don't want to match when that combination of 'r', 'e' and 'd' appears as part of another word (e.g. occurred).
Text to Search In
Alice wondered whether she should paint the room red. Then it occurred to her that she didn't have any red paint.
Regular Expression
\bred\b
Without including the word boundaries, we would also match 'red' at the end of 'wondered' and 'occurred'.
Output (matches shown in square brackets)
Alice wondered whether she should paint the room [red]. Then it occurred to her that she didn't have any [red] paint.
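The example above can be checked with a short Python sketch, assuming Python's re module as the engine. It counts matches with and without the word boundaries and lists the words that contain an unanchored 'red'.

```python
import re

text = ("Alice wondered whether she should paint the room red. "
        "Then it occurred to her that she didn't have any red paint.")

# Without anchors, 'red' is also found inside longer words.
unanchored = re.findall(r"red", text)

# With word boundaries, only the standalone word matches.
anchored = re.findall(r"\bred\b", text)

# Every word that contains 'red' anywhere, for comparison.
containing = re.findall(r"\w*red\w*", text)

print(len(unanchored))  # 4
print(len(anchored))    # 2
print(containing)       # ['wondered', 'red', 'occurred', 'red']
```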
Match at the Start or End of a String
As well as using word boundaries, we can look for a match relative to the start or end of the string we're searching. This constrains what we match, which is helpful in large datasets or where a character pattern is repeated many times but we only care about occurrences at the start or end of the text, for example when looking for just the first tag in an HTML document.
We use the caret (^) and dollar ($) symbols for matching at the start and end of a string respectively.
^
Start of the String
$
End of the String
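A minimal Python sketch of the two string anchors follows, assuming Python's re module. The HTML snippet is a hypothetical one-line sample, not taken from a real document.

```python
import re

# Hypothetical single-line HTML snippet used only for illustration.
html = "<html><body><p>Hi</p></body></html>"

first_tag = re.search(r"^<\w+>", html)   # anchored to the start of the string
last_tag = re.search(r"</\w+>$", html)   # anchored to the end of the string
p_at_start = re.search(r"^<p>", html)    # <p> exists, but not at the start

print(first_tag.group())  # <html>
print(last_tag.group())   # </html>
print(p_at_start)         # None
```

One engine-specific detail: in Python, $ also matches just before a single trailing newline at the end of the string. Other engines may differ, so it is worth checking the documentation for the one in use.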
Multiline Mode
Many regex engines allow us to use multiline mode – often activated with an 'm' flag. Multiline mode will treat separate lines as separate strings for searching. In this case, string boundaries will occur at the beginning and end of each line.
Example
We want to look for log files ending in '.log' which start with the year 2010. We have enabled multiline mode.
Text to Search In
20180402.log
20050301.log
20101211.log
20101211.txt
19920101.log
20101211.log.old
Regular Expression
^2010\w+\.log$
Breakdown
^2010
Look for 2010 at the beginning of the string (line in multiline mode).
\w+
Then at least one word character (letter, digit, or underscore).
\.log$
Expect a literal '.log' at the end of the string (line in multiline mode). The dot is escaped because an unescaped . matches any character.
Output (matches shown in square brackets)
20180402.log
20050301.log
[20101211.log]
20101211.txt
19920101.log
20101211.log.old
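The log file example can be sketched in Python as follows, assuming the re module, where multiline mode is enabled with the re.MULTILINE flag. The sketch escapes the dot (\.) so that it matches only a literal full stop; the file name 20101211xlog is a made-up input that shows what an unescaped dot would let through.

```python
import re

# The file names from the example, one per line.
listing = "\n".join([
    "20180402.log",
    "20050301.log",
    "20101211.log",
    "20101211.txt",
    "19920101.log",
    "20101211.log.old",
])

pattern = r"^2010\w+\.log$"

# Without multiline mode, ^ and $ anchor to the whole string only.
without_flag = re.findall(pattern, listing)

# With multiline mode, ^ and $ anchor to each line.
with_flag = re.findall(pattern, listing, flags=re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['20101211.log']

# An unescaped dot matches any character, so it accepts this made-up name.
loose = re.findall(r"^2010\w+.log$", "20101211xlog")
strict = re.findall(pattern, "20101211xlog")

print(loose)   # ['20101211xlog']
print(strict)  # []
```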