Wednesday, May 22, 2024

Filter DOM Nodes Using a TreeWalker

One of the most important DOM operations is tree traversal. It’s one that is made more complicated by the wide range of possible node types — there are Text, Element, Comment and other special nodes, such as ProcessingInstruction or DocumentType. Most of them won’t have any childNodes, and then there are some which carry only a single piece of information. For instance, a Comment node only carries the specified comment string. That’s where the TreeWalker API comes in. Its job is to filter out and iterate through the nodes we want from a DOM tree. In fact, there are two such APIs: NodeIterator and TreeWalker. They’re quite similar in many ways, but with some notable differences. In today’s tutorial, we’ll learn how to use the TreeWalker, while the NodeIterator will be the subject of the next article.

Instantiating a TreeWalker

A TreeWalker object is created by calling the document.createTreeWalker() factory method. The method accepts up to four parameters and can replace the hand-written recursive traversal code that such tasks would otherwise require. The syntax of document.createTreeWalker() is as follows:

document.createTreeWalker(root, nodesToShow, filter, entityExpandBol)

Here’s a brief description of each of the four parameters:

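  • root: The node at which the traversal starts. The walker is confined to this node and its descendants.
  • nodesToShow: The types of nodes that should be visited by the TreeWalker, given as one or more NodeFilter constants combined with a bitwise OR. For example:
      * NodeFilter.SHOW_ELEMENT returns all element nodes
      * NodeFilter.SHOW_TEXT returns all text nodes
      * NodeFilter.SHOW_COMMENT returns all comment nodes
      * NodeFilter.SHOW_ALL returns nodes of every type
  • filter (or null): Reference to a custom function (or an object with an acceptNode() method) that further filters the nodes returned. It should return NodeFilter.FILTER_ACCEPT, NodeFilter.FILTER_REJECT or NodeFilter.FILTER_SKIP for each node. Enter null for none.
  • entityExpandBol: Boolean parameter that once specified whether entity references should be expanded. It is obsolete and ignored by current browsers, so it can be omitted; some older browsers expected all four arguments.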
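
Because the nodesToShow values are bit flags, several node types can be requested at once by OR-ing constants together. The following is only a sketch: the NF fallback object and the showsNodeType() helper are hypothetical names introduced here so that the snippet also runs outside a browser; the numeric values are the ones the DOM standard assigns to the NodeFilter constants.

```javascript
// Use the browser's NodeFilter when available; otherwise fall back to the
// numeric values from the DOM standard (assumption: non-browser run).
const NF = typeof NodeFilter !== "undefined"
  ? NodeFilter
  : { SHOW_ALL: 0xFFFFFFFF, SHOW_ELEMENT: 0x1, SHOW_TEXT: 0x4, SHOW_COMMENT: 0x80 };

// Ask for elements and text nodes with a single mask.
const elementsAndText = NF.SHOW_ELEMENT | NF.SHOW_TEXT;

// Hypothetical helper: would a walker created with `mask` visit nodes of
// this nodeType? Bit (nodeType - 1) of the mask must be set.
function showsNodeType(mask, nodeType) {
  return ((mask >>> (nodeType - 1)) & 1) === 1;
}

console.log(showsNodeType(elementsAndText, 1)); // element node (nodeType 1)
console.log(showsNodeType(elementsAndText, 3)); // text node (nodeType 3)
console.log(showsNodeType(elementsAndText, 8)); // comment node (nodeType 8)
```

In a browser, the combined mask would simply be passed as the second argument to document.createTreeWalker().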

Using the above NodeFilter constants together with a filter function, we can select all nodes in the document that are of a certain element type and carry a particular attribute.
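
As an illustration of that idea, here is a minimal sketch of a filter that accepts only anchor elements carrying a target attribute. The linksWithTarget name, the choice of tag and attribute, and the FILTER fallback object are all assumptions made for this example; the fallback merely lets the snippet run where NodeFilter is not defined.

```javascript
// Filter result codes; fall back to the DOM standard's numeric values when
// NodeFilter is not defined (assumption: snippet may run outside a browser).
const FILTER = typeof NodeFilter !== "undefined"
  ? NodeFilter
  : { FILTER_ACCEPT: 1, FILTER_REJECT: 2, FILTER_SKIP: 3 };

// Hypothetical filter: accept only <a> elements that carry a target attribute.
// FILTER_SKIP passes over a node but still lets the walker visit its children;
// FILTER_REJECT would drop the node together with its whole subtree.
function linksWithTarget(node) {
  const isAnchor = node.nodeName.toLowerCase() === "a";
  return isAnchor && node.hasAttribute("target")
    ? FILTER.FILTER_ACCEPT
    : FILTER.FILTER_SKIP;
}

// In a browser it would be passed as the third argument:
// const walker = document.createTreeWalker(
//   document.body, NodeFilter.SHOW_ELEMENT, linksWithTarget);
```

FILTER_SKIP is used rather than FILTER_REJECT so that links nested inside non-matching elements are still reached.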

A Couple of Instantiation Examples

Our first example will simply iterate over all elements within a DIV element:

<div id="main">
<p>This is a <span>paragraph</span></p>
<b>Bold text</b>
</div>
<script type="text/javascript">
var mainDiv = document.getElementById("main");
var walker  = document.createTreeWalker(mainDiv, NodeFilter.SHOW_ELEMENT, null, false);
</script>

Our second example is more complex, and only fetches non-empty textNodes:

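var treeWalker = document.createTreeWalker(
  document.body,         //root (assumed here; any container node works)
  NodeFilter.SHOW_TEXT,  //visit text nodes only
  function(node) {
    return (node.nodeValue.trim() !== "")
         ? NodeFilter.FILTER_ACCEPT
         : NodeFilter.FILTER_REJECT;
  },
  false
);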
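
The same non-empty-text test can also be written as a filter object with an acceptNode() method, which is the other form the filter parameter accepts. This is a minimal sketch: the nonEmptyText name and the RESULT fallback object are introduced here for illustration, and the fallback only exists so the snippet can run where NodeFilter is not defined.

```javascript
// Filter result codes, with the DOM standard's numeric values as a fallback
// (assumption: snippet may run outside a browser).
const RESULT = typeof NodeFilter !== "undefined"
  ? NodeFilter
  : { FILTER_ACCEPT: 1, FILTER_REJECT: 2 };

// Hypothetical filter object: accept text nodes that contain something other
// than whitespace, reject the rest.
const nonEmptyText = {
  acceptNode(node) {
    return node.nodeValue.trim() !== ""
      ? RESULT.FILTER_ACCEPT
      : RESULT.FILTER_REJECT;
  }
};

// In a browser:
// const treeWalker = document.createTreeWalker(
//   document.body, NodeFilter.SHOW_TEXT, nonEmptyText);
```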

Traversing the DOM Nodes

Having created a TreeWalker using document.createTreeWalker(), you can then step through the nodes that pass its filters using the TreeWalker’s traversal methods:

  • firstChild(): Travels to and returns the first child of the current node.
  • lastChild(): Travels to and returns the last child of the current node.
  • nextNode(): Travels to and returns the next node, in document order, within the filtered collection of nodes.
  • nextSibling(): Travels to and returns the next sibling of the current node.
  • parentNode(): Travels to and returns the current node’s parent node.
  • previousNode(): Travels to and returns the previous node, in document order, within the filtered collection of nodes.
  • previousSibling(): Travels to and returns the previous sibling of the current node.

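These are not to be confused with the similarly named DOM node properties, such as firstChild and parentNode. The methods above belong to the TreeWalker object and navigate only through its filtered nodes. When there is no matching node, each method returns null and the TreeWalker stays where it was.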

Using the same DIV as above, let’s see how to use the traversal methods to walk through the returned nodes:

//Alert the starting node Tree Walker currently points to (root node)
alert(walker.currentNode.tagName); //displays DIV (with id=main)
//Step through and alert all child nodes
while (walker.nextNode()) {
  alert(walker.currentNode.tagName); //displays P, SPAN, and B.
}

//Go back to the first child node of the collection and display it
//to do that, we must reset TreeWalker pointer to point to main DIV
walker.currentNode = mainDiv;
alert(walker.firstChild().tagName); //displays P

As we step through the nodes using the traversal methods, the TreeWalker, true to its name, not only returns each node but also travels to it. That’s why, after stepping through the nodes with walker.nextNode(), we must reset the TreeWalker’s position back to its root node before trying to retrieve the firstChild of the filtered collection:

//reset TreeWalker pointer to point to main DIV
walker.currentNode = mainDiv; 

This is necessary because, after running through the while loop, the TreeWalker’s pointer is directed at the very last node (B element) of the collection. Not only does the B element have no element children for firstChild() to return, but even if it did, the result would be the B element’s first child rather than the first child of the entire filtered collection.
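
The rewind-then-descend step can be wrapped in a small helper. This is only a sketch: firstChildOfRoot() is a hypothetical name, and it relies on the TreeWalker’s read-only root property, which holds the node the walker was created with.

```javascript
// Hypothetical helper: return the first filtered child of the walker's root,
// wherever the walker currently points.
function firstChildOfRoot(walker) {
  walker.currentNode = walker.root; // rewind to the root node
  return walker.firstChild();       // moves to, and returns, the first child
}

// Browser usage with a walker that has already been stepped to its end:
// while (walker.nextNode()) { /* ... */ }
// var first = firstChildOfRoot(walker); // null if the root has no filtered children
```

Using walker.root avoids keeping a separate reference to the starting element just to reset the pointer.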


Iterating through the DOM tree is often necessary for DOM manipulation and node retrieval. The TreeWalker API offers one way to do that. If there is a downside to the TreeWalker API, it’s that tree structures are not as simple as 1-dimensional arrays. A tree can be mapped to a 1-dimensional array, but doing so requires iterating over its whole structure, which is somewhat redundant.
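
For cases where a flat list really is wanted, one possible approach is to wrap the walker in a generator so its nodes can be spread into an array. This is a sketch, not part of the TreeWalker API; walk() is a hypothetical helper that depends only on nextNode().

```javascript
// Hypothetical generator: expose a walker's nodes as an iterable so they can
// be spread into an array or consumed with for...of.
function* walk(walker) {
  let node;
  while ((node = walker.nextNode()) !== null) {
    yield node;
  }
}

// Browser usage (assumes a `walker` that is freshly created or has been
// rewound to its root):
// const tagNames = [...walk(walker)].map((el) => el.tagName);
```

The generator still visits every node once, so it does not remove the extra pass; it only hides the loop.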

Rob Gravelle
Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.
