Friday, January 24, 2025

Parsing XML Feeds with jQuery

The jQuery JavaScript library’s DOM parsing, traversing, and manipulating abilities no doubt play a large role in its enduring popularity. These same abilities can be applied to XML documents as well. Combined with jQuery’s easy XML loading using Ajax, jQuery’s DOM prowess makes it an excellent choice for building dynamic, XML-based UI applications. In today’s article, we’ll put jQuery’s DOM processing power to work parsing address fields from an Atom feed so that they can be converted into a different format.

Input and Output Formats

The example featured here is based on a real project that I recently participated in. The input data was retrieved from an Atom feed that contained information about restaurant locations. If you look closely at the following code, you’ll spot three address fields named Address1, Address2, and Address3. These were to be mapped to “Building name/number”, “Street”, and “City” fields. This was not a trivial exercise because Address1 typically contained both the building name/number and the street. Address2 was often blank, or contained a combination of street, neighborhood, and/or city data. Finally, Address3 usually contained the city, but might also include other information, such as postal code, neighborhood, and other superfluous odds and ends. In the end, PHP would be handling the mapping duties, but jQuery gave me a way to try out various strategies while receiving instant feedback about their efficacy.

<?xml version="1.0" encoding="utf-8"?>
<feed xml_lang="en-us" xml_base="http://feeds.acme.com/ShopFeed.aspx?FeedName=ShopDetails" >
  <title type="text">GoodFoodTalks_OpenShopDetails</title>
  <id>uuid:3de9bab0-3fb8-4176-bcb9-59914c9b3362</id>
  <updated>2015-06-25T15:33:42Z</updated>
  <author>
    <name>acme co</name>
    <uri>http://www.acme.com</uri>
    <email>acme@acme.com</email>
  </author>
  <entry>
    <id>http://feeds.acme.com/ShopFeed.aspx?FeedName=ShopDetails&amp;ItemID=2</id>
    <title>High Holborn, 29</title>
    <updated>2015-06-25T15:33:42Z</updated>
    <content type="application/xml">
      <ShopNumber><![CDATA[33]]></ShopNumber>
      <Name><![CDATA[High Holborn, 29]]></Name>
      <Lat><![CDATA[55.12345678]]></Lat>
      <Lon><![CDATA[-0.55500111]]></Lon>
      <PostCode><![CDATA[WH1V 7CU]]></PostCode>
      <Telephone><![CDATA[020 7932 5202]]></Telephone>
      <Address1><![CDATA[29 High Holborn]]></Address1>
      <Address2><![CDATA[]]></Address2>
      <Address3><![CDATA[London]]></Address3>
      <ShopOpeningDate><![CDATA[01/04/1990]]></ShopOpeningDate>
    </content>
  </entry>
  <entry>
  ...
  </entry>
</feed>

Running the Script

Placing my script in an HTML page seemed like a logical choice, until I received the dreaded “Access-Control-Allow-Origin” error message, which the browser raises when a page requests data from a different origin without that server’s permission. There are ways around this limitation, but for me, the simplest workaround was to place the script within a bookmarklet, which runs in the context of the feed page itself. As I mentioned in a recent article, I discovered these nifty on-demand script links not long ago and have been finding a lot of practical uses for them ever since.

All that the bookmarklet does is load the real script, which is hosted on my personal website server:

javascript: (function () { 
    var jsCode = document.createElement('script');     
    jsCode.setAttribute('src', 'http://robgravelle.com/js_files/acme_addresses_filtering_bookmarklet.js'); 
    document.body.appendChild(jsCode);  
}());

Inside the acme_addresses_filtering_bookmarklet.js script, jQuery is loaded in much the same way. The jQuery script element’s onload handler is where I placed the mapping code:

var script = document.createElement("script");
script.type = "text/javascript";
script.onload = function() {
    console.log('jquery loaded');
    //...
};
script.src = "https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js";
document.getElementsByTagName("head")[0].appendChild(script);
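Since both the bookmarklet and the hosted script inject a script element, the pattern can be folded into one small helper. The following is only a sketch: loadScript is a hypothetical name of my own, not a jQuery function, and the document object is passed in as a parameter so that the helper can be tried outside of a browser.

```javascript
// Hypothetical helper (not part of jQuery or the original bookmarklet):
// injects a <script> element and calls onLoad once it has loaded.
function loadScript(doc, src, onLoad) {
    var script = doc.createElement('script');
    script.onload = onLoad;   // attach the handler before setting src
    script.src = src;
    // Prefer document.head; fall back to the first <head> element
    var head = doc.head || doc.getElementsByTagName('head')[0];
    head.appendChild(script);
    return script;
}

// Only attempt the real load when running inside a browser page
if (typeof document !== 'undefined') {
    loadScript(
        document,
        'https://ajax.googleapis.com/ajax/libs/jquery/2.1.3/jquery.min.js',
        function () {
            console.log('jquery loaded');
            //...
        }
    );
}
```

Attaching the handler before assigning src mirrors the order used in the script above.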

Loading Insecure Scripts from a Secure Page

Unfortunately, the cross-origin issue was not the last hurdle to overcome. There was also a problem loading scripts from a secure page, that is, one that is fetched over the HTTPS protocol. Instead of loading my script, the browser presented the following error:

Mixed Content: The page at 'https://feeds.acme.com/ShopFeed.aspx?FeedName=ShopDetails' was loaded over HTTPS, but requested an insecure script 'http://robgravelle.com/js_files/acme_addresses_filtering_bookmarklet.js'. This content should also be served over HTTPS.

Of course, security settings are configurable. In Chrome, I was able to circumvent this restriction by clicking on the shield beside the bookmark star at the far right of the address box:

[Image: load_unsafe_scripts_command.jpg]
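The rule behind the error can be expressed in a few lines. This is a simplified sketch, not the browser’s actual algorithm (which has additional exceptions), and isMixedContent is a hypothetical helper name. It relies only on the standard URL constructor.

```javascript
// Hypothetical helper: a simplified version of the mixed-content rule.
// Returns true when an HTTPS page asks for a script over plain HTTP.
function isMixedContent(pageUrl, scriptUrl) {
    var page = new URL(pageUrl);
    // Resolve scriptUrl against the page in case it is relative
    var script = new URL(scriptUrl, pageUrl);
    return page.protocol === 'https:' && script.protocol === 'http:';
}

var feedPage = 'https://feeds.acme.com/ShopFeed.aspx?FeedName=ShopDetails';
var hosted   = 'http://robgravelle.com/js_files/acme_addresses_filtering_bookmarklet.js';

console.log(isMixedContent(feedPage, hosted));   // true
```

Assuming the hosting server can also deliver the file over HTTPS, pointing the bookmarklet at an https:// URL would avoid the error without changing any browser settings.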

Processing Addresses

The jQuery.parseXML() function converts a string of markup into an XML Document object. It’s important to use this function because XML documents are not the same as HTML. For instance, XML includes special node types such as CDATA sections (<![CDATA[]]>). Taking a look at the document structure in the browser’s Elements inspector reveals that the XML content is contained within <pre> tags:

<html>
  <head>
  </head>
  <body>
    <pre style="word-wrap: break-word; white-space: pre-wrap;">
      <?xml version="1.0" encoding="utf-8"?>
      <feed xml_lang="en-us" xml_base="http://feeds.acme.com/ShopFeed.aspx?FeedName=ShopDetails" >
      <title type="text">GoodFoodTalks_OpenShopDetails</title>
      ...
    </pre>
  </body>
</html>

Hence, we can select it using $('body > pre'). We then have to invoke the resulting object’s text() method to obtain the raw markup to pass to parseXML(). Don’t use html() because it returns the markup with its special characters escaped as entities, such as “&lt;feed&gt;”.
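Because jQuery.parseXML() throws an error when it is handed malformed markup, it may be worth wrapping the call while experimenting. Here is a minimal sketch; readFeed is a hypothetical helper of my own, and jQuery is passed in as a parameter rather than assumed to be a global.

```javascript
// Hypothetical helper: reads the raw feed markup from the page's <pre>
// element and parses it, returning null if the markup is not valid XML.
function readFeed($) {
    var markup = $('body > pre').text();   // text() yields unescaped markup
    try {
        return $.parseXML(markup);         // throws on malformed XML
    } catch (e) {
        console.log('Feed could not be parsed: ' + e.message);
        return null;
    }
}

// Only run when jQuery has actually been loaded into the page
if (typeof jQuery !== 'undefined') {
    var feedDocument = readFeed(jQuery);
}
```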

The returned XML document must then be wrapped in a jQuery object using $() so that we can invoke the find() method to locate the <entry> elements. Note that XML element names are case-sensitive. The resulting jQuery collection supports the each() function so that we can iterate over each entry.

The each() function accepts a handler function that is called once for each entry. Inside the handler, the current element is available via the this keyword. Again, we can wrap it using $() to fetch the address fields.

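// Parse the raw feed markup into an XML document
var xml = $.parseXML( $('body > pre').text() );

$(xml).find("entry").each(function() {
    var $this = $(this),   // the current <entry> element
        item  = {
            Address1: $this.find("Address1").text(),
            Address2: $this.find("Address2").text(),
            Address3: $this.find("Address3").text()
        },
        // Leading building number such as "101 ", "77/78 ", or "19-20 "
        reNumeric   = /^\s*(\d+)((-|\/)\s*(\d+))?\s+/,
        // Match array, or null when Address1 does not start with a number
        houseNumber = item.Address1.match(reNumeric);
});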

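The reNumeric pattern identifies addresses that begin with a building number followed by whitespace, including forms such as 101, 77/78, and 19-20. The houseNumber variable holds the resulting match array, or null when Address1 does not start with a number.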
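To see what the pattern captures, here is one way the match could be used to separate the building number from the street. It is only a sketch of one possible strategy: splitAddress1 and the building and street property names are hypothetical, and every sample address other than “29 High Holborn” is made up for illustration.

```javascript
var reNumeric = /^\s*(\d+)((-|\/)\s*(\d+))?\s+/;

// Hypothetical helper: splits Address1 into a building number and a street.
function splitAddress1(address1) {
    var houseNumber = address1.match(reNumeric);
    if (houseNumber === null) {
        // No leading number: treat the whole value as the street
        return { building: '', street: address1.trim() };
    }
    return {
        // houseNumber[0] is the full match, including trailing whitespace
        building: houseNumber[0].trim(),
        street:   address1.slice(houseNumber[0].length).trim()
    };
}

console.log(splitAddress1('29 High Holborn'));
// { building: '29', street: 'High Holborn' }
console.log(splitAddress1('77/78 Example Street'));
// { building: '77/78', street: 'Example Street' }
console.log(splitAddress1('Example House'));
// { building: '', street: 'Example House' }
```

Note that the trailing \s+ means a value consisting of a number alone, with nothing after it, does not match.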

Conclusion

In today’s article, we learned what makes jQuery such a great choice for building dynamic, XML-based UI applications by utilizing jQuery’s DOM processing power to parse address fields from an Atom feed. In the next installment, we’ll convert these into a different format and display a side-by-side comparison in the browser.

Rob Gravelle
Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.
