Thursday, September 19, 2024

Fine Tuning your RegEx Patterns

There is nothing more frustrating than looking at a Regular Expression and trying to figure out why the heck it won’t work. I’ve spent hours scratching my head wondering where I went so horribly wrong. Thanks to the wonders of the Internet, I found a way to debug them quickly and effectively. It’s an online tool at regex101.com. In today’s article, I’ll show you how to use it to make writing RegExes painless and, dare I say it, kinda fun!

Regex101.com Overview

There are lots of other online RegEx testers out there, but regex101 is head and shoulders above the rest IMHO. That’s not to say that there aren’t better traditional software products available – for instance, RegexBuddy. But, if you’re looking for a tool with a clean and intuitive UI, then regex101 might just be the thing!

The main page itself is laid out very much like a traditional application with a lot of grey and panels. I haven’t familiarized myself with every feature but I have gotten a lot of use out of the RegEx tester and RegEx library.

Regex Tester

Let’s begin with the main focus of the app, the RegEx tester.

regex101.jpg

On the left, you’ll notice that there are three flavors of RegExes available: pcre (php), javascript, and python. Created in 1997, PCRE stands for Perl Compatible Regular Expressions and is a C library that attempts to closely mimic Perl’s regular expression functionality.

In the next panel, there is a REGULAR EXPRESSION textbox and flags. Note that the REGEX textbox may contain newlines. The textbox will expand vertically to accommodate more lines of text.

Under it is a TEXTAREA in which to place your search string, i.e., the one to match against. Below it, there is a SUBSTITUTION textarea in which to enter the replacement text. Like the RegEx replace() function, it accepts references to captured groups:

substritution_example.jpg
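To make the idea of group references in a substitution concrete, here is a minimal sketch in Python, one of the flavors the tester supports. The pattern and the sample name are my own illustration, not the ones in the screenshot, and note that Python writes group references as \1 and \2.

```python
import re

# Hypothetical example: two captured words, swapped in the replacement.
pattern = re.compile(r"(\w+)\s+(\w+)")
text = "Rob Gravelle"

# \1 and \2 refer back to the first and second captured groups.
swapped = pattern.sub(r"\2, \1", text)
print(swapped)
```

The replacement string is just a template; anything outside the group references (here the comma and space) is inserted literally.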

Moving on to the next panel, we have two of my favorite parts: the EXPLANATION and MATCH INFORMATION panes. The former provides detailed information about every part of your RegEx, including capturing groups, matching characters, quantifiers, and more. The latter shows the character indexes and value of each capturing group. For example, “1. [0-3] `rob`” would tell us that the first capturing group matched from index zero up to, but not including, index three (indexes are zero-based, like arrays), and that the matched string’s value is `rob`.
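The same match information can be read programmatically. The following is a rough Python equivalent of the “1. [0-3] `rob`” entry; the test string is an assumption made for the sake of the example.

```python
import re

text = "rob gravelle"  # assumed test string
match = re.search(r"(rob)", text)

# span(1) gives the start index and the exclusive end index of group 1.
start, end = match.span(1)
value = match.group(1)
print(f"1. [{start}-{end}] `{value}`")
```

Because the end index is exclusive, slicing the test string with text[start:end] returns exactly the captured value.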

The bottom panel in the same row contains a QUICK REFERENCE guide.

Regex Library

The next button beside the RegEx tester is the RegEx library. It maintains a collection of user-submitted RegExes for every conceivable purpose. People can vote them up and down; entries with the most votes show up at the top of the list. There is also a filter (on the left) and search box (above the list) to help you narrow down the choices. Regarding the filter, each flavor acts as a toggle. The color bar to the left of the label shows that the filter is activated for that flavor. The same color bar appears at the left of the selected item in the list. Clicking an item displays the RegEx in the textbox to the right of the list. Also notice that the RegEx flavor is displayed at the top right. Below the RegEx you’ll find a DESCRIPTION of what it does. To use the RegEx, just copy it from the textbox and paste it into your code. The title above the RegEx is also a link that brings up the RegEx in the RegEx tester when clicked.

regex_library.jpg

The Regex Tester In Action

I needed to call on regex101’s assistance in designing an Outlook VBA macro to create calendar items for article due dates. Whenever I sent an email to my editor, my macro would scan the contents of the email for article topics and their associated due dates. Due to a combination of inflexible matching and loose email structure, the macro was just as likely to miss the relevant content as catch it. So, I recently revamped the RegExes using regex101.

Here is what a typical topic proposal email looks like:

Dear editor,

I would like to propose the following topic ideas for September:

1.      Choosing a Good Wireless System, gear section, due date: Wed, Sep 9
2.      Weight Training 101, fitness section, due date: Wed, Sep 16
3.      Men's Sportcoats of Fall 2015, style section, due date: Wed, Sep 23

Let me know if any of those work for you.

Best regards,

Rob Gravelle

I am interested in the article titles and due dates; everything else is gravy.

Capturing the Article Title

It didn’t take long to realize that my first attempt was far too vague.

title_matcher_1st_attempt.jpg

The instant feedback that the regex101 tool provided made it apparent that I would need to capture from the start of the line using the “^” anchor, which matches at the beginning of every line when the multiline flag is enabled:

title_matcher_2nd_attempt.jpg
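The effect of anchoring can be sketched in Python. The patterns below are assumptions, since the screenshots are not reproduced here: the loose one stands in for a too-vague first attempt, and the anchored one is simply the opening portion of the full RegEx.

```python
import re

email = (
    "Dear editor,\n"
    "\n"
    "1.      Choosing a Good Wireless System, gear section, due date: Wed, Sep 9\n"
    "2.      Weight Training 101, fitness section, due date: Wed, Sep 16\n"
)

# Hypothetical vague pattern: anything up to a comma, anywhere in the text.
loose = re.findall(r"(.+?),", email)

# Anchored pattern: a digit and a period at the start of a line, then the title.
anchored = re.findall(r"^\d\.\s+(.+?),", email, flags=re.MULTILINE)

# Without the multiline flag, "^" only matches at the very start of the text.
no_flag = re.findall(r"^\d\.\s+(.+?),", email)

print(loose)
print(anchored)
print(no_flag)
```

The loose pattern picks up the greeting and every comma-delimited fragment, while the anchored version returns only the two titles. Dropping the multiline flag yields nothing, because the text starts with “Dear editor” rather than a numbered line.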

Capturing the Article Due Date

I didn’t want to rely on the end of string anchor for the due date because its position may vary. Instead, I looked for the “due date” label. That resulted in the following full RegEx, where “\b[a-z]{3}\s+\d{1,2}” identifies the three-letter month abbreviation and the day (the case-insensitive flag must be on for “[a-z]” to match the capital “S” in “Sep”):

^\d\.\s+(.+?),.+?\bdue\s+date\b.+?(\b[a-z]{3}\s+\d{1,2})

Success!

full_regex.jpg
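For anyone who wants to try the full RegEx outside of the tester, here is a minimal Python sketch that runs it against the sample email. The multiline and case-insensitive flags are my assumption about the settings in use; the variable names are illustrative only, and this is not the VBA macro itself.

```python
import re

EMAIL = """Dear editor,

I would like to propose the following topic ideas for September:

1.      Choosing a Good Wireless System, gear section, due date: Wed, Sep 9
2.      Weight Training 101, fitness section, due date: Wed, Sep 16
3.      Men's Sportcoats of Fall 2015, style section, due date: Wed, Sep 23

Let me know if any of those work for you.
"""

# Group 1 is the article title, group 2 is the month and day.
TOPIC_RE = re.compile(
    r"^\d\.\s+(.+?),.+?\bdue\s+date\b.+?(\b[a-z]{3}\s+\d{1,2})",
    re.MULTILINE | re.IGNORECASE,
)

# findall returns one (title, due date) tuple per matching line.
topics = TOPIC_RE.findall(EMAIL)
for title, due in topics:
    print(f"{title} -> {due}")
```

Note that “\d” at the start matches a single digit only, so a list with ten or more items would need “\d+” instead.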

Conclusion

I don’t know how I used to manage without the regex101 online tool. Receiving instant feedback and explanations for every character of my RegExes has proven to be helpful beyond belief. Count me in as a fan of regex101!

Rob Gravelle
Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.
