Parsing Atom Feeds using XPath

By Rob Gravelle

Atom has become a very popular way to share information over the World Wide Web. The Atom Syndication Format is built on the XML standard so that it may be displayed directly in modern browsers or parsed by scripts and software to fetch pertinent information in real time. Once a feed has been converted into an XML document object, a script or program can either iterate over its node elements or query the document for specific data. In this tutorial, we'll learn how to apply the XPath query language to Atom feeds using PHP in order to target specific content.

Atom Document Structure

Take a look at the sample feed below and you'll notice several things:

  • There is a feed tag that carries the base URL (xml:base) and an attribute called xmlns that points to the Atom namespace. xmlns is short for "XML namespace". Keep that in mind because we'll need it a little later on...
  • The feed contains some meta info such as the title, subtitle, and author details.
  • A feed may contain one or more entries, which are each identified by an id and title tag.
  • Entries in turn may contain content.

<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xml:base="https://afictionalsite.com/Feed.aspx?FeedName=ProductPrices" xmlns="http://www.w3.org/2005/Atom">
  <title type="text">GoodFoodTalks_WebProductPriceByShop</title>
  <subtitle>GoodFoodTalks_WebProductPriceByShop description</subtitle>
  <author><name>A Fictional Site</name></author>
  <entry> <!-- id, title, and the remaining fields are omitted from this excerpt -->
    <content type="application/xml">
      <Updated><![CDATA[2014-11-11 11:29:28]]></Updated>
    </content>
  </entry>
  <entry>
    <content type="application/xml">
      <Updated><![CDATA[2014-11-11 11:29:28]]></Updated>
    </content>
  </entry>
  <entry>
    <content type="application/xml">
      <Updated><![CDATA[2014-11-11 11:29:28]]></Updated>
    </content>
  </entry>
</feed>

XPath and PHP

Before you can query a feed, you have to fetch it using a utility like curl. The response can then be loaded into a SimpleXMLElement object via its constructor. Be sure to check the result for FALSE: with the CURLOPT_RETURNTRANSFER option set, curl_exec() returns the response body on success or FALSE on failure. As explained in the Display Secure Atom Feeds in WordPress article, from which the sample code originates, passing the LIBXML_NOCDATA constant to the SimpleXMLElement constructor "tells the parser to extract values from <![CDATA[]]> tags."

Each entry is stored in the <SimpleXMLElement instance>->entry array.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://afictionalsite.com/Feed.aspx?FeedName=TheGoodNewsFirst');
// return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Execute, grab errors
$result = curl_exec($ch);
if ($result !== FALSE) {
  $pricesfeedXml = new SimpleXMLElement($result, LIBXML_NOCDATA);
  //fetch the products for each category
  if (isset($pricesfeedXml->entry)) {
    foreach ($pricesfeedXml->entry as $entry) {
      //process each item
    }
  }
} else {
  echo 'cURL error: ' . curl_error($ch) . "\n";
}

// close cURL resource, and free up system resources
curl_close($ch);
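
To try the loading step without a network call, the sketch below parses an inline feed string instead of a curl response. The feed contents (ids, titles, shop_no values) and the variable names $sampleFeed and $feedXml are invented for illustration; only the SimpleXMLElement constructor and the LIBXML_NOCDATA flag come from the text above.

```php
<?php
// Hypothetical sample feed; every value in it is invented for this sketch.
$sampleFeed = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text">Sample prices</title>
  <entry>
    <id>urn:example:entry-1</id>
    <title>Shop 10</title>
    <content type="application/xml">
      <shop_no>10</shop_no>
      <Updated><![CDATA[2014-11-11 11:29:28]]></Updated>
    </content>
  </entry>
  <entry>
    <id>urn:example:entry-2</id>
    <title>Shop 20</title>
    <content type="application/xml">
      <shop_no>20</shop_no>
      <Updated><![CDATA[2014-11-12 09:00:00]]></Updated>
    </content>
  </entry>
</feed>
XML;

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$feedXml = new SimpleXMLElement($sampleFeed, LIBXML_NOCDATA);

$titles = [];
foreach ($feedXml->entry as $entry) {
    // Cast to string to get the element's text rather than an object.
    $titles[] = (string) $entry->title;
}
$firstUpdated = (string) $feedXml->entry[0]->content->Updated;

echo implode(', ', $titles), "\n";
echo $firstUpdated, "\n";
```

Plain property access such as $feedXml->entry works here without any namespace handling; the namespace only becomes an issue once you switch to XPath queries.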

Fetching a Content Node by Attribute

For our first XPath query, we'll fetch a content node that contains a child element with a specific value.

Before you can execute any queries against the XML document, you have to register the Atom namespace - that's the xmlns attribute in the opening feed tag. If you don't, you won't get any results. To do that, invoke the XML document instance's registerXPathNamespace() method. It accepts two arguments: the prefix that you want to use for the namespace and the value of the xmlns attribute, which is the Atom namespace URI.

Once you've done that, you have to include your namespace prefix at every level of your query paths. Without namespaces, the XPath query to fetch a content node with a <shop_no> value of 10 would look like this:

//entry/content[shop_no[text()='10']]

Our "atom" namespace must be prefixed to every document element, giving us this string:

//atom:entry/atom:content[atom:shop_no[text()='10']]

Note that the text() function doesn't require a prefix because it's a function and not a document element.
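
The effect of the namespace is easy to see by running an unprefixed and a prefixed version of the same query side by side. This is a minimal sketch against a made-up one-entry feed; the feed values and the variable names are assumptions, and the "atom" prefix is an arbitrary choice.

```php
<?php
// Hypothetical one-entry feed, invented for this sketch.
$sampleFeed = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>urn:example:entry-1</id>
    <title>Shop 10</title>
    <content type="application/xml">
      <shop_no>10</shop_no>
    </content>
  </entry>
</feed>
XML;

$feedXml = new SimpleXMLElement($sampleFeed);

// Unprefixed names refer to elements in no namespace, so nothing matches.
$unprefixed = $feedXml->xpath("//entry/content[shop_no[text()='10']]");

// Bind the prefix "atom" to the feed's namespace URI, then prefix every element.
$feedXml->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');
$prefixed = $feedXml->xpath("//atom:entry/atom:content[atom:shop_no[text()='10']]");

echo 'Unprefixed matches: ', count($unprefixed), "\n";
echo 'Prefixed matches: ', count($prefixed), "\n";
```

The prefix only has to match between the registration call and the query; it does not need to appear anywhere in the feed itself.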

With that in mind, here is the code that executes the query by invoking the XML document instance's xpath() method. The return value is an array of SimpleXMLElement objects on success (an empty array if nothing matches) or a boolean value of FALSE on failure:

//the atom namespace MUST be registered before using xpath
$pricesfeedXml->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');

$shop_no = get_post_meta( $restaurant_id, 'shop_number', true );
$xpath = "//atom:entry/atom:content[atom:shop_no[text()='" . $shop_no . "']]";
$prices = $pricesfeedXml->xpath($xpath);
//xpath() returns an empty array when nothing matches, so test for that as well as FALSE
if ( empty($prices) ) {
  echo 'No prices found for shop no ' . $shop_no . "\n";
} else {
  //do something with the price
}
Fetching an Entry Node by Attribute

If you require one or more of the <entry> node's child elements, you have a couple of options available. First, you can perform the same lookup as above and append a parent::* step, which moves from the matched content node up to its enclosing entry:

$xpath = "//atom:entry/atom:content[atom:shop_no[text()='" . $shop_no . "']]/parent::*";
$entry_node = $pricesfeedXml->xpath($xpath);

Another way to accomplish the same thing would be to simply move the content level into the square brackets, so that the predicate filters the entry nodes themselves:

$xpath = "//atom:entry[atom:content[atom:shop_no[text()='" . $shop_no . "']]]";
$entry_node = $pricesfeedXml->xpath($xpath);
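
The two approaches can be compared on a small feed to confirm that they select the same entry. This is a sketch under assumptions: the two-entry feed, its ids, and the variable names are all made up.

```php
<?php
// Hypothetical two-entry feed, invented for this sketch.
$sampleFeed = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>urn:example:entry-1</id>
    <title>Shop 10</title>
    <content type="application/xml">
      <shop_no>10</shop_no>
    </content>
  </entry>
  <entry>
    <id>urn:example:entry-2</id>
    <title>Shop 20</title>
    <content type="application/xml">
      <shop_no>20</shop_no>
    </content>
  </entry>
</feed>
XML;

$feedXml = new SimpleXMLElement($sampleFeed);
$feedXml->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');

// Option 1: select the content node, then step up to its parent entry.
$viaParent = $feedXml->xpath(
    "//atom:entry/atom:content[atom:shop_no[text()='20']]/parent::*"
);

// Option 2: select the entry directly and filter it with a nested predicate.
$viaPredicate = $feedXml->xpath(
    "//atom:entry[atom:content[atom:shop_no[text()='20']]]"
);

$idViaParent    = (string) $viaParent[0]->id;
$idViaPredicate = (string) $viaPredicate[0]->id;

echo $idViaParent, "\n";
echo $idViaPredicate, "\n";
```

The second form tends to be easier to read, because the path names the node you want and the brackets hold only the condition.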

Homing in On a Specific Value

Going the other way, you can fetch a single property based on a sibling using something like the following query:

$xpath = "//atom:entry/atom:content/atom:TakeAwayPrice[../atom:shop_no[text()='" . $shop_no . "']]";
$take_away_price_node = $pricesfeedXml->xpath($xpath);

Since the xpath() method returns an array on success, you have to take the first element and cast it to a string to read its value:

$take_away_price = (string) $take_away_price_node[0];
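
Here is a minimal, self-contained sketch of that last step, with a guard for the case where nothing matches. The feed, the price value, and the variable names are invented for illustration.

```php
<?php
// Hypothetical sample feed, invented for this sketch.
$sampleFeed = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>urn:example:entry-1</id>
    <title>Shop 10</title>
    <content type="application/xml">
      <shop_no>10</shop_no>
      <TakeAwayPrice>4.50</TakeAwayPrice>
    </content>
  </entry>
</feed>
XML;

$feedXml = new SimpleXMLElement($sampleFeed);
$feedXml->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');

$priceNodes = $feedXml->xpath(
    "//atom:entry/atom:content/atom:TakeAwayPrice[../atom:shop_no[text()='10']]"
);

// empty() covers both a failed query (false) and an empty result array.
$takeAwayPrice = empty($priceNodes) ? null : (string) $priceNodes[0];

echo $takeAwayPrice === null ? 'No price found' : $takeAwayPrice, "\n";
```

The cast matters because each array element is a SimpleXMLElement object, not a plain string.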


Using xpath to query XML documents for specific data sure beats iterating over every node element. Depending on the language that you are working in, the exact mechanism and syntax for using xpath may differ, but the general principles outlined here should hold up well.

Rob Gravelle

Rob Gravelle resides in Ottawa, Canada, and is the founder of GravelleWebDesign.com. Rob has built systems for intelligence-related organizations such as Canada Border Services and CSIS, as well as for numerous commercial businesses.

In his spare time, Rob has become an accomplished guitar player, and has released several CDs. His band, Ivory Knight, was rated as one of Canada's top hard rock and metal groups by Brave Words magazine (issue #92) and reached the #1 spot in the National Heavy Metal charts on Reverb Nation.
