Friday, March 29, 2024

Write a Statement Parser in JavaScript

In the last few weeks, I’ve come to realize why a person might want to access an object’s private attributes and methods. I needed to run some unit tests on a script to make sure that it was error free. The trouble was that all the functionality that needed testing was private to a larger object. I had no intention of making those members public just for testing’s sake, so I had to find a way to get at them. In my experience with Java, I had a decent amount of exposure to class reflection. That would certainly do the trick here; the only problem was that JavaScript has no native way to reflect on members that are private to an object. Not to be deterred, I set out to write my own reflection mechanism.

It was a lot harder to do than I thought (always is), but I did manage to put together a nifty string iterator for finding characters that are not within strings. Building on that, my class can now handle different kinds of enclosing characters besides quotes, including matched and non-matched pairs. In this article, I’d like to explain how it works. In a later instalment, we’ll see how that can be used to extract private members from an object.

Newlines as Statement Terminators

My parsing code does not (as of yet) treat newlines as statement terminators because, as a general rule, I use semi-colons to terminate each statement. It’s just ingrained in me after years of Java coding and I like the clarity and simplicity of it. Having said that, you may have perfectly valid reasons for omitting semi-colons at the end of your statements. The ECMAScript Language Specification sets out the rules for Automatic Semicolon Insertion (ASI), which JavaScript parsers follow to decide where a statement ends when the semi-colon is left out. Here’s the class/object that I used to cut my parser’s teeth on:

var Person = function() {
    //defaults
    var _age  =  0,
        _name = 'John Doe';
   
    var socialSecurity = '444 555 666';
   
    //this is a global variable
    hatSize            = 'medium';
   
    this.initialize = function(name, age) {
      _name = _name || name;
      _age  = _age  || age;
    };
   
    if (arguments.length) this.initialize();
   
    //public properties. no accessors required
    this.phoneNumber = '555-224-5555';
    this.address     = '22 Acacia ave. London, England';
   
    //getters and setters
    this.getName     = function()      { return _name; };
    this.setName     = function (name) { _name = name; };
   
    //public methods
    this.addBirthday = function()      { _age++; };
    this.toString    = function()      { return 'My name is "+_name+" and I am "_age+" years old.'; };
};
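
To make the semi-colon caveat above concrete, here is a small sketch of what a splitter that only recognizes semi-colons would do with a statement that ends in a bare newline. The source string and variable names are made up for illustration, and the plain split(';') call stands in for the real parser:

```javascript
// Hypothetical snippet: the first statement ends with a newline only,
// which is legal JavaScript thanks to Automatic Semicolon Insertion.
var source = 'var a = 1\nvar b = 2;\nvar c = 3;';

// Splitting on ';' alone yields two pieces, not three statements.
var pieces = source.split(';').filter(function (piece) {
  return piece.trim().length > 0; // drop the empty tail after the last ';'
});

console.log(pieces.length);             // 2
console.log(JSON.stringify(pieces[0])); // "var a = 1\nvar b = 2"
```

The first two declarations come back glued together, which is why code written without semi-colons needs separate handling.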

Introducing the CodeSplitter

In order to parse code, I had to add some characters to the String.Splitter class’s enclosingChars string besides single and double quotes: forward slashes (/) for RegExp literals, and curly braces ({}) for object literals and function bodies:

String.CodeSplitter = function(stringToSplit) {
  //init(stringToSplit, splitChar (statementTerminator), enclosingChars (ignore chars between these))
  //the single quotes inside the enclosingChars literal must be escaped
  this.init(stringToSplit, ';', '//\'\'""{}');
};
String.CodeSplitter.prototype = new String.Splitter();
String.CodeSplitter.prototype.constructor = String.CodeSplitter;
String.CodeSplitter.constructor = String.Splitter.prototype.constructor;

Matched versus Unmatched Enclosing Characters

Matched characters (quotes and forward slashes) have to be handled differently than unmatched ones (curly braces). A matched pair uses the same character to open and close, so it cannot be nested, and a boolean variable that records whether or not the iterator is currently between a matched pair works just fine. Unmatched pairs have distinct opening and closing characters and can be nested, so a counter has to be incremented on each opening character and decremented on each closing one in order to ascertain the nesting level. Only when that level is zero can we consider a semi-colon to mark the end of a statement:

if (strMatchedChars.indexOf(currentChar) > -1 ) {
  if ( bIncludeStrings
    && /['"]/.test(currentChar)
    //make sure that quote isn't escaped!
    && (charIndex == 0 || stringToSplit.charAt(charIndex-1) != '\\')) {
    if (currentChar == "'" ) {
      withinSingleQuotedString = !withinSingleQuotedString;
    }
    else if (currentChar == '"' ) {
      withinDoubleQuotedString = !withinDoubleQuotedString;
    }
  }
  else {
    withinMatchedChars = !withinMatchedChars;
  }
} else {
  matchedCharIndex = strUnmatchedOpeningChars.indexOf(currentChar);
  if ( matchedCharIndex > -1 ) {
    //found an unmatched opening char
    aEnclosingLevelsCount[matchedCharIndex]++;
  }
  //test for match
  else if ...
   
}
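
The excerpt above relies on strMatchedChars and strUnmatchedOpeningChars already being populated. How String.Splitter derives them is not shown here, so the following is only a sketch under one assumption: that the enclosingChars string lists its delimiters two at a time, opener first and closer second. The function name classifyEnclosingChars and the shape of its return value are hypothetical:

```javascript
// Hypothetical helper, not part of the article's String.Splitter class.
// Assumes enclosingChars lists delimiters in pairs: opener, then closer.
function classifyEnclosingChars(enclosingChars) {
  var result = { matched: '', opening: '', closing: '' };
  for (var i = 0; i + 1 < enclosingChars.length; i += 2) {
    var open  = enclosingChars.charAt(i),
        close = enclosingChars.charAt(i + 1);
    if (open === close) {
      // same character opens and closes: cannot nest, a flag is enough
      result.matched += open;
    } else {
      // distinct characters: can nest, so each pair needs its own counter
      result.opening += open;
      result.closing += close;
    }
  }
  return result;
}

var kinds = classifyEnclosingChars('//\'\'""{}');
// kinds.matched holds /'"  while kinds.opening holds {  and kinds.closing holds }
```

Keeping the opening and closing characters at the same index in two parallel strings means an indexOf() on either one gives the slot of the matching counter.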

Testing For the End of Statement

Having checked for both matched and unmatched enclosing characters, the last test is of course for the splitChar statement terminator (semi-colon). It’s the most complicated test of the lot because we not only have to confirm that the current character is the splitChar, but we also have to make sure that we are not within a single or double quoted string, or between other matching characters. Last but not least, we have to check the nesting level of every non-matched character. Since anything other than zero is a deal breaker, I chose to join the array of counters into a series of digits. Once converted into a string, it can be tested using a regular expression so that only zeros are accepted:

  //test for match
  else if (  currentChar == splitChar
          && !withinSingleQuotedString
          && !withinDoubleQuotedString
          && !withinMatchedChars
          && /^0+$/.test(aEnclosingLevelsCount.join(''))) {
    nextMatch = stringToSplit.substring(indexOfLastMatch+1, charIndex);
    indexOfLastMatch = charIndex;
    break;
  }
}
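
Putting the pieces together, the whole scan can be condensed into one stand-alone function. This is a simplified sketch rather than the String.Splitter code itself: the name splitStatements is made up, it tracks a single open quote character instead of two booleans, it counts only curly braces, and it does not attempt RegExp literals or comments:

```javascript
// Minimal sketch, not the article's String.Splitter: splits source on ';'
// while ignoring semi-colons inside quoted strings or curly braces.
function splitStatements(source) {
  var statements = [],
      quoteChar  = null, // the quote that opened the current string, if any
      braceDepth = 0,    // nesting level of curly braces
      start      = 0;    // index where the current statement begins

  for (var i = 0; i < source.length; i++) {
    var ch = source.charAt(i);

    if (quoteChar) {
      if (ch === '\\') {
        i++;              // skip the escaped character
      } else if (ch === quoteChar) {
        quoteChar = null; // closing quote found
      }
    } else if (ch === "'" || ch === '"') {
      quoteChar = ch;
    } else if (ch === '{') {
      braceDepth++;
    } else if (ch === '}') {
      braceDepth--;
    } else if (ch === ';' && braceDepth === 0) {
      statements.push(source.substring(start, i).trim());
      start = i + 1;
    }
  }
  return statements;
}

var sample = "var a = 'x;y'; this.f = function() { b++; }; var c = 2;";
var parts  = splitStatements(sample);
// parts holds three statements:
//   var a = 'x;y'
//   this.f = function() { b++; }
//   var c = 2
```

Remembering which quote opened the string means a double quote inside a single-quoted string is ignored without any extra bookkeeping. Text after the last semi-colon is discarded in this sketch.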

Running the Parser

To use the parser, we have to instantiate it with the code that we want to parse. It can be in the form of an object, function, or string. Then we can parse the code by calling the CodeSplitter’s split() method:

var codeParser = new String.CodeSplitter(Person);
var statements = codeParser.split();
for (var i=0; i<statements.length; i++) {
  document.writeln('<p>Statement '+(i+1)+': '+statements[i]+'</p>');
}

Statement 1: //defaults var _age = 0, _name = 'John Doe'

Statement 2: var socialSecurity = '444 555 666'

Statement 3: //this is a global variable hatSize = 'medium'

Statement 4: this.initialize = function(name, age) { _name = _name || name; _age = _age || age; }

Statement 5: if (arguments.length) this.initialize()

Statement 6: //public properties. no accessors required this.phoneNumber = '555-224-5555'

Statement 7: this.address = '22 Acacia ave. London, England'

Statement 8: //getters and setters this.getName = function() { return _name; }

Statement 9: this.setName = function (name) { _name = name; }

Statement 10: //public methods this.addBirthday = function() { _age++; }

Statement 11: this.toString = function() { return 'My name is "+_name+" and I am "_age+" years old.'; }

It’s working quite well, except that comments are not parsed out. We’ll be getting to that next time, as well as how to deal with code that does not end with a semicolon, such as nested (private) functions.
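
On the earlier point that the splitter accepts a function as well as a string: the conversion step is not shown in this article, so here is one way it might look. The helper name toSourceText is hypothetical, and the sketch assumes that the first opening brace in the function’s text starts its body, which would not hold for a function whose parameter list itself contains a brace. Objects are not covered:

```javascript
// Hypothetical helper: turns a function or a string into source text.
function toSourceText(code) {
  if (typeof code === 'function') {
    // Function.prototype.toString() returns the function's source text
    var text = code.toString();
    // keep only what sits between the outermost curly braces
    return text.substring(text.indexOf('{') + 1, text.lastIndexOf('}'));
  }
  return String(code);
}

var Counter = function() { var n = 0; this.inc = function() { n++; }; };
var body = toSourceText(Counter);
// body now holds the text of the constructor's body, ready to be split
```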

Here is a working demo of the CodeSplitter.

Rob Gravelle
Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.
