Friday, March 29, 2024

Date Parsing using JavaScript and Regular Expressions


I recently wrote a script to parse the emails in my online account for dates so that it could create reminders for article due dates. The experience made me realize how many difficulties are inherent in converting a date string into a proper Date object. JavaScript’s Date(string) constructor can be terribly unforgiving. In fact, a little thing like using hyphens instead of forward slashes as your date part separator can make the difference between success and failure! For that reason, today we’re going to talk about what JavaScript’s Date constructor and Date.parse() method look for in a date string, so that your scripts will have the very best chance of succeeding.

Parsing Date Strings using the Date Object

Both the Date(string) constructor and the static Date.parse() method accept exactly the same date formats. The difference is that the constructor creates a Date object, while Date.parse() returns a number: the number of milliseconds since midnight, January 1, 1970 UTC:

var d1 = new Date("March 1, 2013");
console.log(d1);        //Fri Mar 1 00:00:00 EST 2013 
console.log(typeof d1); //object

var d2 = Date.parse("March 1, 2013");
console.log(d2);        //1362114000000 when run in EST (UTC-5); the exact value depends on your time zone
console.log(typeof d2); //number
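Because both use the same parser, Date.parse(s) and new Date(s).getTime() should agree for any string. Both also signal failure the same way: Date.parse() returns NaN, and the constructor returns an Invalid Date whose getTime() is NaN. Here is a minimal sketch of a validity check built on that behavior. The isValidDateString name is hypothetical and is used only for illustration:

```javascript
// Hypothetical helper: true if the engine's built-in parser accepts the string
function isValidDateString(str) {
  // An Invalid Date object stores NaN as its time value
  return !Number.isNaN(new Date(str).getTime());
}

var s = "March 1, 2013";
console.log(Date.parse(s) === new Date(s).getTime()); // true: same parser, same timestamp
console.log(isValidDateString(s));                     // true
console.log(isValidDateString("not a date"));          // false
console.log(Date.parse("not a date"));                 // NaN
```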

Either of the above will also work for numeric date formats, assuming that you’re dealing with a supported format such as yyyy/MM/dd, yyyy/M/d, yyyy/MM/dd hh:mm, or yyyy/MM/dd hh:mm:ss. Aside from that short list, most other date formats will produce unpredictable results at best. The notable exception is long date formats like Mon, January 1, 2000, which make excellent candidates for string parsing. Oddly, according to Wikipedia, the ISO 8601 calendar date representation allows both the YYYY-MM-DD and YYYYMMDD formats, as well as the year-month-only YYYY-MM format. However, hyphen-separated dates fail in some browsers, particularly ones that are not zero-padded like the following:

console.log(Date.parse('2009-7-12')); //results in NaN in IE
console.log(new Date('2009-7-12'));   //results in 'Invalid Date' in Firefox

The culprit is the hyphens: browsers generally accept the same dates with forward slashes (/), as in 2009/7/12.
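Building on that observation, here is a minimal sketch that normalizes hyphenated year-month-day strings to slashes before parsing. It assumes the input is a bare date such as "2009-7-12". It also relies on the slash format, which is widely supported but not defined by the ECMAScript standard. The parseYmd name is hypothetical.

Converting every such string matters for consistency. A zero-padded ISO date like 2009-07-12 is standard, but modern engines read date-only ISO strings as UTC, whereas 2009/07/12 is read as local time.

```javascript
// Hypothetical helper: parse a year-month-day string with hyphens or slashes.
// Assumes input like "2009-7-12" or "2009/07/12"; returns null if unparseable.
function parseYmd(str) {
  // Swap hyphens for slashes so every input goes through the same
  // (non-standard but widely supported) slash format, read as local time
  var normalized = /^\d{4}-\d{1,2}-\d{1,2}$/.test(str) ? str.replace(/-/g, "/") : str;
  var d = new Date(normalized);
  return Number.isNaN(d.getTime()) ? null : d;
}

var d = parseYmd("2009-7-12");
console.log(d.getFullYear(), d.getMonth() + 1, d.getDate()); // 2009 7 12 (local time)
```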


Regular Expressions to the Rescue

If I may be so bold as to posit that JavaScript’s native date parsing capabilities are somewhat lacking, I’d like to suggest a general approach for dealing with date strings.

The most reliable way to create a date is to use the new Date(year, month, day, hours, minutes, seconds, milliseconds) constructor. From there you can always call getTime() to get the number of milliseconds since midnight, January 1, 1970 UTC. Note that getMilliseconds() and getUTCMilliseconds() only return the millisecond component (0 to 999) of the time. “All” we have to do is parse our date string into each of its constituent parts for the constructor. It bears mentioning that the better you know what date format(s) you’ll be working with, the easier it’ll be to parse them.
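To illustrate, here is a small sketch with arbitrary values. It shows the component constructor and contrasts getMilliseconds(), which returns only the millisecond part, with getTime(), which returns the full timestamp. Month 2 means March, because months are zero-based.

```javascript
// Year, month (0-based, so 2 = March), day, hours, minutes, seconds, milliseconds,
// all interpreted in the local time zone
var d = new Date(2012, 2, 13, 16, 0, 0, 250);

console.log(d.getMilliseconds()); // 250: just the millisecond part of the time
console.log(d.getMonth());        // 2: March
console.log(d.getTime());         // milliseconds since the epoch (depends on your time zone)

// The UTC counterpart yields a time-zone-independent timestamp
var utc = new Date(Date.UTC(2012, 2, 13, 16, 0, 0, 250));
console.log(utc.getTime());       // 1331654400250
```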

Let’s start with a simple example where a sentence contains a date in a JS-supported format. In that case, we can use the string constructor, so long as we can extract the date portion from the surrounding text.

var stringToParse = "You have a doctor's appointment on 2012/03/13 16:00.  Please show up on time.";
var dateString    = stringToParse.match(/\d{4}\/\d{2}\/\d{2}\s+\d{2}:\d{2}/)[0]; //element 0 is the matched text
var dt            = new Date(dateString);
console.log(dt); //prints something like "Tue March 13 16:00:00 EDT 2012" when run in US Eastern time
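String.prototype.match() returns an array of matches, or null when nothing matches. It is therefore worth checking the result before handing element 0 to the constructor. Here is a minimal sketch of that guard, using the same pattern as above. The extractDate name is hypothetical.

```javascript
// Hypothetical helper: find a yyyy/MM/dd hh:mm date in free text.
// Returns a Date (local time) or null when the text contains no such date.
function extractDate(text) {
  var match = text.match(/\d{4}\/\d{2}\/\d{2}\s+\d{2}:\d{2}/);
  if (match === null) {
    return null; // no date-like substring found
  }
  return new Date(match[0]); // element 0 is the whole matched substring
}

var appt = extractDate("You have a doctor's appointment on 2012/03/13 16:00.  Please show up on time.");
console.log(appt.getHours());                 // 16
console.log(extractDate("No date in here.")); // null
```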

Using the Full Date Constructor

For greater flexibility and control over your dates, use capturing groups to separate the date parts and pass them to the full date constructor. Alternatively, as below, pass them to Date.UTC(), which takes the same arguments. Here is the same example as above using the new approach. Keep in mind that months are zero-based, so you must subtract one from the parsed month. Also, the first element of the match array contains the full matched string. That element can be discarded, as we only want the parts. You can either pass each part individually or use Function.prototype.apply(), which accepts the arguments as an array.

var stringToParse = "You have a doctor's appointment on 2012/03/13 16:00.  Please show up on time.";
var dateParts     = stringToParse.match(/(\d{4})\/(\d{2})\/(\d{2})\s+(\d{2}):(\d{2})/);
dateParts[2] -= 1; //months are zero-based
//get rid of the whole string match (element 0) and pass the rest of the array to Date.UTC()
var UtcDate = new Date(Date.UTC.apply(this, dateParts.slice(1)));

The Date.UTC() method interprets the parts as Coordinated Universal Time rather than local time. As a result, the timestamp does not depend on the time zone of the machine running the script. It’s especially useful when working with global data.
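Putting those pieces together, here is a minimal sketch of a reusable function. The parseUtcDateTime name is hypothetical. Instead of Function.prototype.apply(), it uses ES2015 destructuring to pass the same values to Date.UTC() more readably, and it converts the captured strings to numbers explicitly.

```javascript
// Hypothetical helper: parse the first "yyyy/MM/dd hh:mm" date in a string as UTC.
// Returns a Date, or null if no such date is found.
function parseUtcDateTime(text) {
  const parts = text.match(/(\d{4})\/(\d{2})\/(\d{2})\s+(\d{2}):(\d{2})/);
  if (parts === null) {
    return null;
  }
  // Skip element 0 (the full match) and convert each captured string to a number
  const [year, month, day, hours, minutes] = parts.slice(1).map(Number);
  // Months are zero-based, hence month - 1
  return new Date(Date.UTC(year, month - 1, day, hours, minutes));
}

const due = parseUtcDateTime("You have a doctor's appointment on 2012/03/13 16:00.");
console.log(due.toISOString()); // 2012-03-13T16:00:00.000Z
```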

Working With More Complex Dates

Perhaps the most difficult date formats to match are those that contain weekday and/or month names. Both have many permutations that you have to watch out for. These include uppercase first letters, abbreviations, spaces, commas and other separating characters, and shared name endings like “day” and “ember”, to name just a few. Here the onus is on you to write a RegEx that is precise enough to suit your needs. I came up with a few that worked for me; yours may be more or less complex.

The first RegEx looks for a weekday to signify a date. The remainder of the pattern confirms that a date follows by looking for a word (made up of letters only and three to nine characters in length), followed by the numeric day and year. Note the use of the \b word-boundary assertion around the weekday names and the (?: ) non-capturing parentheses:

var stringToParse = "assignment #1 due date: Tue, Mar 13, 2012.";
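//the hyphen goes last in [:,-] so it is a literal character rather than a range
var dueDate       = stringToParse.match(/\b(?:(?:Mon)|(?:Tues?)|(?:Wed(?:nes)?)|(?:Thur?s?)|(?:Fri)|(?:Sat(?:ur)?)|(?:Sun))(?:day)?\b[:,-]?\s*[a-zA-Z]{3,9}\s+\d{1,2}\s*,?\s*\d{4}/);
var d = new Date(dueDate[0]); //element 0 is the matched text: "Tue, Mar 13, 2012"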
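To see what the pattern does and does not accept, here is a quick sketch that runs it against a few made-up sample strings. Because there is no i flag, the weekday must start with an uppercase letter. A date with no weekday is not matched at all.

```javascript
// The weekday-anchored pattern from above (hyphen placed last in [:,-] so it is literal)
const weekdayDate = /\b(?:(?:Mon)|(?:Tues?)|(?:Wed(?:nes)?)|(?:Thur?s?)|(?:Fri)|(?:Sat(?:ur)?)|(?:Sun))(?:day)?\b[:,-]?\s*[a-zA-Z]{3,9}\s+\d{1,2}\s*,?\s*\d{4}/;

const samples = [
  "assignment #1 due date: Tue, Mar 13, 2012.", // abbreviated weekday and month
  "Wednesday: March 14 2012",                    // full names, colon, no comma
  "tue, mar 13, 2012",                           // lowercase weekday
  "due Mar 13, 2012",                            // no weekday at all
];

for (const s of samples) {
  const m = s.match(weekdayDate);
  console.log(JSON.stringify(s), "->", m ? m[0] : "no match");
}
// "assignment #1 due date: Tue, Mar 13, 2012." -> Tue, Mar 13, 2012
// "Wednesday: March 14 2012" -> Wednesday: March 14 2012
// "tue, mar 13, 2012" -> no match
// "due Mar 13, 2012" -> no match
```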

Not too complicated, but effective!

Our next sample RegEx takes things a step further and carefully matches the month names as well. At the same time, it’s a little more forgiving than the first RegEx in that it is case-insensitive, thanks to the i flag:

/\b(?:(?:Mon)|(?:Tues?)|(?:Wed(?:nes)?)|(?:Thur?s?)|(?:Fri)|(?:Sat(?:ur)?)|(?:Sun))(?:day)?\b[:,-]?\s*(?:(?:jan|feb)r?(?:uary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|aug(?:ust)?|oct(?:ober)?|(?:sept?|nov|dec)(?:ember)?)\s+\d{1,2}\s*,?\s*\d{4}/i
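Once a date like this is found, you can avoid the string parser entirely. Capture the month name, day, and year, then build the date with the full constructor. Here is a minimal sketch that assumes the month-name pattern above with capturing groups added. The parseNamedDate function and MONTHS table are hypothetical. The captured month name may be abbreviated or spelled out in any case, so its first three letters are enough to look up the zero-based month index.

```javascript
// Three-letter prefixes in calendar order; the index is the zero-based month number
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"];

// The case-insensitive pattern from above, with groups around month, day and year
const namedDate = /\b(?:(?:Mon)|(?:Tues?)|(?:Wed(?:nes)?)|(?:Thur?s?)|(?:Fri)|(?:Sat(?:ur)?)|(?:Sun))(?:day)?\b[:,-]?\s*((?:jan|feb)r?(?:uary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|aug(?:ust)?|oct(?:ober)?|(?:sept?|nov|dec)(?:ember)?)\s+(\d{1,2})\s*,?\s*(\d{4})/i;

// Hypothetical helper: returns a local-time Date, or null if no valid date is found
function parseNamedDate(text) {
  const m = text.match(namedDate);
  if (m === null) {
    return null;
  }
  const month = MONTHS.indexOf(m[1].slice(0, 3).toLowerCase());
  const day = Number(m[2]);
  const year = Number(m[3]);
  const d = new Date(year, month, day);
  // The constructor silently rolls "Feb 31" over into March; reject such dates
  return d.getDate() === day ? d : null;
}

console.log(parseNamedDate("assignment #1 due date: Tue, Mar 13, 2012.").toDateString()); // Tue Mar 13 2012
console.log(parseNamedDate("WEDNESDAY, december 5, 2012").toDateString());                // Wed Dec 05 2012
```

Checking getDate() against the captured day is a cheap way to catch impossible dates, since the constructor would otherwise accept them without complaint.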

Conclusion

Unsurprisingly, many others have trodden this same path before you. The result has been some highly flexible libraries for parsing date strings. In an upcoming article, we’ll examine some of the best ones. If one of these libraries fulfills your requirements, you’ll save yourself the hassle of writing your own parsing functions!

Rob Gravelle
Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.
