Write a Statement Parser in JavaScript

By Rob Gravelle



In the last few weeks, I've come to realize why a person might want to access an object's private attributes and methods. I needed to run some unit tests on a script to make sure that it was error free. The trouble was that all the functionality that needed testing was private to a larger object. I sure had no intention of making those members public just for testing's sake, so I had to find a way to get at them. In my experience with Java, I had a decent amount of exposure to class reflection. That would certainly do the trick here; the only problem was that JavaScript doesn't natively support reflection. Not to be deterred, I set out to write my own reflection mechanism.

It was a lot harder to do than I thought (always is), but I did manage to put together a nifty string iterator for finding characters that are not within strings. Building on that, my class can now handle different kinds of enclosing characters besides quotes, including matched and non-matched pairs. In this article, I'd like to explain how it works. In a later instalment, we'll see how that can be used to extract private members from an object.

Newlines as Statement Terminators

My parsing code does not (as of yet) treat newlines as statement terminators because, as a general rule, I use semi-colons to terminate each statement. It's just ingrained in me after years of Java coding and I like the clarity and simplicity of it. Having said that, you may have perfectly valid reasons for omitting semi-colons at the end of your statements. The ECMAScript Language Specification lists all the rules for Automatic Semicolon Insertion (ASI), which the JS parser uses to work out where a statement ends when the semi-colon has been left out. Here's the class/object that I used to cut my parser's teeth on:

var Person = function() {
    //defaults
    var _age  =  0,
        _name = 'John Doe';
    var socialSecurity = '444 555 666';
    //this is a global variable
    hatSize            = 'medium';
    this.initialize = function(name, age) {
      _name = _name || name;
      _age  = _age  || age;
    };
    if (arguments.length) this.initialize();
    //public properties. no accessors required
    this.phoneNumber = '555-224-5555';
    this.address     = '22 Acacia ave. London, England';
    //getters and setters
    this.getName     = function()      { return _name; };
    this.setName     = function (name) { _name = name; };
    //public methods
    this.addBirthday = function()      { _age++; };
    this.toString    = function()      { return 'My name is "+_name+" and I am "_age+" years old.'; };
};

Introducing the CodeSplitter

To parse code, I had to add some characters to the String.Splitter class's enclosingChars string besides single and double quotes: forward slashes (/) for RegExp literals, and curly braces ({}) for object literals and function bodies:

String.CodeSplitter = function(stringToSplit) {
  //init( stringToSplit, splitChar (statementTerminator), enclosingChars (ignore chars between these) )
  this.init(stringToSplit,';', '//\'\'""{}');
};
String.CodeSplitter.prototype = new String.Splitter();
String.CodeSplitter.prototype.constructor = String.CodeSplitter;
String.CodeSplitter.constructor = String.Splitter.prototype.constructor;
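
String.Splitter itself is not reproduced in this article, so the following is only a sketch of the inheritance pattern above. It uses a hypothetical stand-in base class, BaseSplitter, whose init() is assumed to do nothing more than store its three arguments; the real init() may well do more. The name CodeSplitterSketch is likewise made up for illustration.

```javascript
// Hypothetical stand-in for String.Splitter; the real class is not shown here.
function BaseSplitter() {}

// Assumed behaviour: init() simply records its three arguments.
BaseSplitter.prototype.init = function (stringToSplit, splitChar, enclosingChars) {
  // String() turns a function into its source text, so a function can be passed too.
  this.stringToSplit  = String(stringToSplit);
  this.splitChar      = splitChar;
  this.enclosingChars = enclosingChars;
};

// Subclass that hard-codes the terminator and the enclosing characters for code.
function CodeSplitterSketch(code) {
  this.init(code, ';', '//\'\'""{}');
}
CodeSplitterSketch.prototype = new BaseSplitter();             // inherit init()
CodeSplitterSketch.prototype.constructor = CodeSplitterSketch; // repair the constructor reference

var sketch = new CodeSplitterSketch(function () { var a = 1; });
console.log(sketch instanceof BaseSplitter); // true
console.log(sketch.splitChar);               // ;
console.log(sketch.enclosingChars.length);   // 8
```

Because the prototype is an instance of the base class, every CodeSplitterSketch object finds init() through the prototype chain, which is why the constructor can call this.init() without defining it.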

Matched versus Unmatched Enclosing Characters

Matched characters, where the same character opens and closes the enclosure (quotes and forward slashes), have to be handled differently from unmatched ones, where the opening and closing characters differ (curly braces), because the latter can be nested. A matched pair cannot nest, so a boolean variable that records whether or not the iterator is currently between such a pair works just fine. Unmatched characters, meanwhile, need a counter that is incremented on each opening character and decremented on each closing one, so as to ascertain the nesting level. Only when it is zero can we consider a semi-colon to mark the end of a statement:

if (strMatchedChars.indexOf(currentChar) > -1 ) {
  if ( bIncludeStrings
    && /['"]/.test(currentChar)
    //make sure that quote isn't escaped!
    && (charIndex == 0 || stringToSplit.charAt(charIndex-1) != '\\')) {
    if (currentChar == "'" ) {
      withinSingleQuotedString = !withinSingleQuotedString;
    }
    else if (currentChar == '"' ) {
      withinDoubleQuotedString = !withinDoubleQuotedString;
    }
  }
  else {
    withinMatchedChars = !withinMatchedChars;
  }
} else {
  matchedCharIndex = strUnmatchedOpeningChars.indexOf(currentChar);
  if ( matchedCharIndex > -1 ) {
    //found an unmatched opening char
  }
  //test for match
  else if ...
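
The snippet relies on strMatchedChars and strUnmatchedOpeningChars, which are set up elsewhere in the class. As a minimal sketch of how they might be derived, assuming the enclosingChars string always lists its characters in opening/closing pairs, the string can be read two characters at a time. The helper names classifyEnclosingChars and nestingLevels are hypothetical, as is the unmatchedClosing property; none of them come from the original class.

```javascript
// Hypothetical helper: reads enclosingChars two characters at a time.
// A pair of identical characters ('' or //) is "matched";
// a pair of different characters ({}) is "unmatched" and can nest.
function classifyEnclosingChars(enclosingChars) {
  var result = { matched: '', unmatchedOpening: '', unmatchedClosing: '' };
  for (var i = 0; i + 1 < enclosingChars.length; i += 2) {
    var open  = enclosingChars.charAt(i);
    var close = enclosingChars.charAt(i + 1);
    if (open === close) {
      result.matched += open;
    } else {
      result.unmatchedOpening += open;
      result.unmatchedClosing += close;
    }
  }
  return result;
}

// Hypothetical helper: the nesting level after each character, for one unmatched pair.
function nestingLevels(text, open, close) {
  var level = 0, levels = [];
  for (var i = 0; i < text.length; i++) {
    var c = text.charAt(i);
    if (c === open) level++;
    else if (c === close) level--;
    levels.push(level);
  }
  return levels;
}

var chars = classifyEnclosingChars('//\'\'""{}');
console.log(chars.matched);                               // /'"
console.log(chars.unmatchedOpening);                      // {
console.log(chars.unmatchedClosing);                      // }
console.log(nestingLevels('{a{b}c}', '{', '}').join('')); // 1122110
```

The last line shows why a boolean is not enough for curly braces: the level climbs to 2 inside the inner pair and only returns to 0 after the outermost closing brace.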

Testing For the End of Statement

Having checked for both matched and unmatched enclosing characters, the last test is of course for the splitChar statement terminator (semi-colon). It's the most complicated test of the lot because we not only have to confirm that the current character is the splitChar, but we also have to make sure that we are not within a single or double quoted string, or other matching characters. Last but not least, we have to check the nesting level of each and every kind of non-matched character. Since anything other than zero is a deal breaker, I chose to join the array of levels into a series of digits. Once converted into a string, it can be tested using a regular expression so that only zeros are accepted:

  //test for match
  else if (  currentChar == splitChar
          && !withinSingleQuotedString
          && !withinDoubleQuotedString
          && !withinMatchedChars
          && /^0+$/.test(aEnclosingLevelsCount.join(''))) {
    nextMatch = stringToSplit.substring(indexOfLastMatch+1, charIndex);
    indexOfLastMatch = charIndex;
  }
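
To see the three tests working together, here is a minimal, self-contained sketch of the same idea. It is not the String.Splitter code, which is not reproduced in full in this article: the function name splitStatements is hypothetical, it hard-codes quotes and curly braces, and it leaves out RegExp literals and comments entirely. It also differs in two details: it skips the character after a backslash instead of looking back one character, and it ignores a quote of one kind while inside a string of the other kind.

```javascript
// Hypothetical helper, not the article's String.Splitter: splits code on
// semicolons that sit outside quoted strings and outside curly braces.
function splitStatements(stringToSplit) {
  var splitChar = ';';
  var withinSingleQuotedString = false;
  var withinDoubleQuotedString = false;
  var aEnclosingLevelsCount = [0]; // one counter per unmatched pair; only {} here
  var indexOfLastMatch = -1;
  var statements = [];

  for (var charIndex = 0; charIndex < stringToSplit.length; charIndex++) {
    var currentChar  = stringToSplit.charAt(charIndex);
    var withinString = withinSingleQuotedString || withinDoubleQuotedString;

    if (currentChar === '\\' && withinString) {
      charIndex++; // skip the escaped character
    } else if (currentChar === "'" && !withinDoubleQuotedString) {
      withinSingleQuotedString = !withinSingleQuotedString;
    } else if (currentChar === '"' && !withinSingleQuotedString) {
      withinDoubleQuotedString = !withinDoubleQuotedString;
    } else if (withinString) {
      // ordinary character inside a string: nothing to track
    } else if (currentChar === '{') {
      aEnclosingLevelsCount[0]++;
    } else if (currentChar === '}') {
      aEnclosingLevelsCount[0]--;
    } else if (currentChar === splitChar
            && /^0+$/.test(aEnclosingLevelsCount.join(''))) {
      // a semicolon at nesting level zero ends the statement
      statements.push(stringToSplit.substring(indexOfLastMatch + 1, charIndex).trim());
      indexOfLastMatch = charIndex;
    }
  }
  return statements;
}

var code = "var a = 'x;y'; var f = function() { return 1; }; var b = \"}\";";
var statements = splitStatements(code);
console.log(statements.length); // 3
console.log(statements[1]);     // var f = function() { return 1; }
```

The semicolon inside 'x;y' and the one inside the function body are both ignored, and the closing brace inside the double-quoted string does not disturb the nesting count.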

Running the Parser

To use the parser, we have to instantiate it with the code that we want to parse. It can be in the form of an object, function, or string. Then we can parse the code by calling the CodeSplitter's split() method:

var codeParser = new String.CodeSplitter(Person);
var statements = codeParser.split();
for (var i=0; i<statements.length; i++) {
  document.writeln('<p>Statement '+(i+1)+': '+statements[i]+'</p>');
}

Statement 1: //defaults var _age = 0, _name = 'John Doe'

Statement 2: var socialSecurity = '444 555 666'

Statement 3: //this is a global variable hatSize = 'medium'

Statement 4: this.initialize = function(name, age) { _name = _name || name; _age = _age || age; }

Statement 5: if (arguments.length) this.initialize()

Statement 6: //public properties. no accessors required this.phoneNumber = '555-224-5555'

Statement 7: this.address = '22 Acacia ave. London, England'

Statement 8: //getters and setters this.getName = function() { return _name; }

Statement 9: this.setName = function (name) { _name = name; }

Statement 10: //public methods this.addBirthday = function() { _age++; }

Statement 11: this.toString = function() { return 'My name is "+_name+" and I am "_age+" years old.'; }

It's working quite well, except that comments are not parsed out. We'll be getting to that next time, as well as how to deal with code that does not end with a semicolon, such as nested (private) functions.

Here is a working demo of the CodeSplitter.


Rob Gravelle resides in Ottawa, Canada, and is the founder of GravelleConsulting.com. Rob has built systems for Intelligence-related organizations such as Canada Border Services, CSIS as well as for numerous commercial businesses. Email Rob to receive a free estimate on your software project. Should you hire Rob and his firm, you'll receive 15% off for mentioning that you heard about it here!

In his spare time, Rob has become an accomplished guitar player, and has released several CDs. His former band, Ivory Knight, was rated as one of Canada's top hard rock and metal groups by Brave Words magazine (issue #92).
