Friday, March 29, 2024

Mapping XML Feed Fields with jQuery

The jQuery library’s parseXML() function applies its well-known DOM parsing, traversing, and manipulating abilities to XML documents – and, by extension, to XML feeds. In the Parsing XML Feeds with jQuery article, we were introduced to a script that loaded an Atom feed from the current location and extracted each entry’s address fields. In today’s follow-up, we’ll map the address fields into a new format and display a side-by-side comparison of the original and transformed fields.

Categorizing Address Types

As described in the last article, the input data was retrieved from an Atom feed that contained information about restaurant locations. Address fields named Address1, Address2, and Address3 were to be mapped to “Building name/number”, “Street”, and “City” fields. This is not a trivial exercise due to the variety of address formats:

Address1 Address2 Address3
Numeric Type
104 – 111 Market Street St Andrews
155 Old Broad Street London
156 Bethnal Green Road London
70-77 Deansgate Manchester
1/8 Queensgate Shopping Centre Peterborough (empty)
Unit Type
Unit B8.S.23, 7 West Ferry West Ferry Circuit Canary Wharf
Unit 49 Broadway Shopping Centre Hammersmith, London
Other Types
Fleetplate House 10-19 Holborn Viaduct London
Heathrow Airport Terminal 3 Airside Airside Lounge South West Node Hounslow

Some of the issues are that…

  • Address1 often contains both the Building name/number and Street.
  • House numbers and building names may include dashes, slashes, dots, spaces, and other characters.
  • Address2 is often blank, or contains a combination of Street, neighborhood, and/or city information.
  • Address3 usually contains the City, but might also include other information, such as postal code, neighborhood, and other superfluous info!

I found that grouping addresses into those that begin with a number and those that begin with the word “Unit” covered most of the data. Moreover, the remaining records could be mapped more-or-less directly into the target fields.

Each address type may be represented by an object that includes a description and an array of address items. Source address items contain the Address1, Address2, and Address3 fields, while mapped addresses contain the BuildingNameNumber, Street, and TownCityState fields.

The following code snippet contains the addressTypes and mappedAddresses object definitions as well as the code that stores each address based on its type:

var addressTypes = {
  addressType1: {
      descr: 'Number and Street',
      items: []
  },
  addressTypeUnit: {
      descr: 'Unit Type',
      items: []
  },
  addressTypeOther: {
      descr: 'Other address types',
      items: []
  }
};
var mappedAddresses = {
  addressType1: {
      items: []
  },
  addressTypeUnit: {
      items: []
  }
};

var xml = $.parseXML( $('body > pre').text() );

$(xml).find("entry").each(function() {
    var $this = $(this), 
        item  = {
            Address1: $this.find("Address1").text(),
            Address2: $this.find("Address2").text(),
            Address3: $this.find("Address3").text()
        },
        reNumeric   = /^\s*(\d+)((-|\/)\s*(\d+))?\s+/,
        houseNumber = item.Address1.match(reNumeric),
        reUnit      = /^\s*Unit\s+/,
        unitNumber  = item.Address1.match(reUnit);  
    
    if ( houseNumber ) {
        addressTypes.addressType1.items.push(item);
    }
    else if ( unitNumber ) {
        addressTypes.addressTypeUnit.items.push(item);
    }
    else {
        addressTypes.addressTypeOther.items.push(item);
    }
});
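To see how the two patterns sort addresses without loading a feed, the classification step can be pulled out into a small function. This is only a sketch: classifyAddress is a hypothetical helper that does not appear in the original script, and the sample strings are loosely based on the table above (how each row splits across the three fields is an assumption).

```javascript
// Hypothetical helper (not part of the original script): returns the key of
// the addressTypes bucket that an Address1 value would be stored under.
function classifyAddress(address1) {
    var reNumeric = /^\s*(\d+)((-|\/)\s*(\d+))?\s+/,
        reUnit    = /^\s*Unit\s+/;

    if (reNumeric.test(address1)) {
        return 'addressType1';
    }
    if (reUnit.test(address1)) {
        return 'addressTypeUnit';
    }
    return 'addressTypeOther';
}

console.log(classifyAddress('155 Old Broad Street')); // addressType1
console.log(classifyAddress('70-77 Deansgate'));      // addressType1
console.log(classifyAddress('Unit 49'));              // addressTypeUnit
console.log(classifyAddress('Fleetplate House'));     // addressTypeOther
```

Because the numeric test runs first, an address has to start with digits to land in the first bucket; anything that starts with neither digits nor the word “Unit” falls through to the “other” bucket.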

Mapping Addresses that Begin with a House Number

Address1 fields that contain a house number are best broken down into individual words, beginning with the house number. These are stored in the parts variable below. Its declaration may seem complicated, but the code is merely declaring an array literal that contains a single element. The reNumeric RegEx pattern not only identifies addresses that begin with a number but also captures the individual address components: the first number, a dash or forward slash character, and the following number. The presence of element 2 in the match array tells us that the house number has three parts, e.g. 10-1 or 99/1, so that the separator and second number may be appended to the first.

Sometimes the last word of Address1 duplicates the neighbourhood and/or country information held in the Address2 or Address3 field, so there is a check that removes it from the parts array if need be.
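It may help to look at what the match array actually holds. The sketch below reuses the reNumeric pattern as is; houseNumberFrom is a hypothetical helper introduced only for illustration, and the sample addresses are made up to exercise each branch.

```javascript
var reNumeric = /^\s*(\d+)((-|\/)\s*(\d+))?\s+/;

// Hypothetical helper: rebuilds the house number from the captured groups,
// or returns null when the address does not start with a number.
function houseNumberFrom(address1) {
    var m = address1.match(reNumeric);
    if (!m) {
        return null;
    }
    // m[1] = first number, m[2] = separator plus second number (if any),
    // m[3] = the separator alone, m[4] = the second number alone
    return m[1] + (m[2] ? m[3] + m[4] : '');
}

var m = '70-77 Deansgate'.match(reNumeric);
console.log(m[0]); // "70-77 " (the whole matched prefix, trailing space included)
console.log(m[1]); // "70"
console.log(m[3]); // "-"
console.log(m[4]); // "77"

console.log(houseNumberFrom('70-77 Deansgate'));      // 70-77
console.log(houseNumberFrom('99/ 1 High Street'));    // 99/1
console.log(houseNumberFrom('155 Old Broad Street')); // 155
```

Note that the length of m[0] is what lets the script skip past the house number when it splits the rest of Address1 into words.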

if ( houseNumber ) {
    addressTypes.addressType1.items.push(item);
    
    var parts = [ houseNumber[1] + (houseNumber[2] ? houseNumber[3] + houseNumber[4] : '') ];
              
    //parse the rest of Address1 and concatenate it to the parts array
    parts = parts.concat( item.Address1.slice( houseNumber[0].length ).split(/\s+/) );
    
    //remove redundant neighbourhood/country info from Address1 
    if ( [item.Address2, item.Address3].indexOf(parts[parts.length - 1]) > -1 ) {
      parts.pop();
    }
    
    //add the mapped addresses to the array
    mappedAddresses.addressType1.items.push({
       BuildingNameNumber: parts[0],
       Street:             parts.slice(1).join(' '),
       TownCityState:      (item.Address2.length > 0 && item.Address2 != item.Address3 
                            ? item.Address2 + (item.Address3.length > 0 
                                               ? ', ' 
                                               : '') 
                            : '') + item.Address3        
    });
}
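The nested ternary that builds TownCityState is compact but hard to read at a glance. A more explicit version might look like the following sketch; joinTownCity is a hypothetical name, and the function is intended to return the same string as the expression above for any pair of Address2 and Address3 values.

```javascript
// Hypothetical helper: combines Address2 and Address3 into one field,
// skipping empty values and avoiding a repeated value when both match.
function joinTownCity(address2, address3) {
    var pieces = [];

    if (address2.length > 0 && address2 !== address3) {
        pieces.push(address2);
    }
    if (address3.length > 0) {
        pieces.push(address3);
    }
    return pieces.join(', ');
}

console.log(joinTownCity('', 'London'));             // London
console.log(joinTownCity('Hammersmith', 'London'));  // Hammersmith, London
console.log(joinTownCity('London', 'London'));       // London
console.log(joinTownCity('Peterborough', ''));       // Peterborough
```

Collecting the non-empty pieces first and joining them afterwards means the comma only ever appears between two values.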

Displaying the Source and Target Address Fields in Side-by-side Tables

The end result of this exercise is, of course, to produce side-by-side tables containing the original and mapped address fields. These are inserted with the jQuery prepend() function so that they appear at the top of the page. A for…in loop iterates over each address type in the addressTypes object, while a simple for loop iterates over the items array to produce the table rows. A mappedAddrTable is generated for every address type except “addressTypeOther”, which is not mapped. Styles of “float: left” and “float: right” place the two tables on the left and right-hand sides of the screen respectively. The generated tables are appended to the infoDiv along with a clearing DIV and an empty paragraph in order to add some vertical spacing.

var infoDiv = $('<div id="info"></div>');
$('body').prepend(infoDiv);
for (var addressType in addressTypes) {
    if (addressTypes.hasOwnProperty(addressType)) {
       var itemsLen = addressTypes[addressType].items.length;
       if (itemsLen) {
         //the caption must be the first child of the table
         var origAddrTable = '<table style="float: left;" border="1" width="750">'
                           + '<caption>' + addressTypes[addressType].descr + '</caption>'
                           + '<thead><tr><th>Address1</th><th>Address2</th><th>Address3</th></tr></thead><tbody>';
         for (var i = 0; i < itemsLen; i++) {
             origAddrTable += '<tr>' 
                   + '<td>' + addressTypes[addressType].items[i].Address1.replace(/\n/, '') + '</td>' 
                   + '<td>' + addressTypes[addressType].items[i].Address2 + '</td>'
                   + '<td>' + addressTypes[addressType].items[i].Address3 + '</td>'
                   + '</tr>';
         }
         origAddrTable += '</tbody></table>';
  
         var mappedAddrTable = '';
         //addressTypeOther is not mapped       
         if (addressType != 'addressTypeOther') {
           mappedAddrTable = '<table style="float: right;" border="1" width="700">'
                           + '<caption>' + addressType + ' Mapped</caption><thead><tr>'
                           + '<th>Building Name/Number</th><th>Street</th><th>Town/City/State</th></tr></thead><tbody>';
           itemsLen = mappedAddresses[addressType].items.length;
           for (var i = 0; i < itemsLen; i++) {
               mappedAddrTable += '<tr>' 
                     + '<td>' + mappedAddresses[addressType].items[i].BuildingNameNumber + '</td>' 
                     + '<td>' + mappedAddresses[addressType].items[i].Street + '</td>'
                     + '<td>' + mappedAddresses[addressType].items[i].TownCityState + '</td>'
                     + '</tr>';
           }
           mappedAddrTable += '</tbody></table>';
         }
         $(infoDiv).append(origAddrTable + mappedAddrTable + '<div style="clear: both;"></div><p></p>');
       }
    }
}
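Both loops above build their rows the same way, and both insert feed text straight into the markup. One possible refinement, shown here only as a sketch, is a shared row builder that also escapes HTML-significant characters. The escapeHtml and buildRow helpers are hypothetical and are not part of the original script, and the sample item is made up.

```javascript
// Hypothetical helper: escapes the characters that are significant in HTML.
function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Hypothetical helper: builds one table row from an item and a list of
// field names, in the order the columns should appear.
function buildRow(item, fields) {
    var cells = '';
    for (var i = 0; i < fields.length; i++) {
        cells += '<td>' + escapeHtml(item[fields[i]]) + '</td>';
    }
    return '<tr>' + cells + '</tr>';
}

// Made-up sample item, used only to show the escaping.
var row = buildRow(
    { Address1: 'Marks & Co', Address2: '', Address3: 'London' },
    ['Address1', 'Address2', 'Address3']
);
console.log(row);
// <tr><td>Marks &amp; Co</td><td></td><td>London</td></tr>
```

The same function could serve the mapped table by passing ['BuildingNameNumber', 'Street', 'TownCityState'] as the field list.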

Here is a screen capture of the finished tables in the browser:

[Screen capture: the original and mapped address tables]

Conclusion

As we saw over the past two articles, combining jQuery’s parseXML() function with its well-known DOM manipulation capabilities provides an all-in-one solution for transforming XML data into a format that can be interpreted by various audiences. In this particular example, the audience was the developer, but the same information could just as easily be formatted for managers, marketers, and casual visitors alike.

Rob Gravelle
Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.
