Handling Page Not Found Errors When Using Offline Caching

By Rob Gravelle

In my article The AppCache in Action, I outlined the role of the Manifest file in designating resources for offline viewing. One section, called "FALLBACK", assigns alternate resources to display if the requested ones are unavailable. Experience with the Manifest file bears out that it works very well; in fact, a little too well. This becomes especially problematic when a generic resource pattern such as a slash (/) is used. Since that matches any resource on the site, even typing in bogus URL paths will cause the fallback resource to appear instead of the 404/500 HTTP errors that should appear when online. In today's article, I'd like to present one way of dealing with this issue so that the fallback resource does not come up when the user is in fact online.

The Problem Defined

The source of many web developers' frustration is the ability of the Manifest file's FALLBACK section to act as a catch-all for a resource type, directory, or even an entire site. Consider this first example that defines fallback.html as the offline resource for all HTML files:

FALLBACK:
/*html /fallback.html

In this second example, any resources in the sales directory that were not available online would default to the sales_offline.html page:

FALLBACK:
/sales/ /sales_offline.html

The worst offender is the site-wide catch-all, defined by the forward slash (/). It causes any unavailable resource to fall back to the assigned alternative:

FALLBACK:
/ /offline_viewer.html

So far everything sounds peachy, so why all the fuss? The problem is that even requests for resources that don't exist are diverted to the fallback. That means that typos in the URL also cause the browser to behave as if it were offline. A lot of the time, that isn't what you want. Rather, a 404 page or online default page should appear. Unfortunately, the Manifest can't - or won't - distinguish between 4xx or 5xx status codes and a genuine offline condition. Believe it or not, that is by design. According to the whatwg.org site:

If the fetching of the resource results in a redirect to a resource with another origin (indicative of a captive portal), or a 4xx or 5xx status code or equivalent, or if there were network errors (but not if the user canceled the download), then instead get, from the cache, the resource of the fallback entry corresponding to the fallback namespace.
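
To make the quoted rule concrete, here is a minimal sketch of the decision it describes. It is a simplified model for illustration only, not the browser's actual implementation; the function name and the shape of the result object are hypothetical.

```javascript
// Simplified model of the rule quoted above. The function name and the
// fields of `result` are hypothetical, chosen for illustration.
function shouldUseFallback(result) {
  if (result.canceledByUser) return false;         // user canceled: no fallback
  if (result.networkError) return true;            // unreachable or truly offline
  if (result.redirectedToOtherOrigin) return true; // e.g. a captive portal
  return result.status >= 400 && result.status <= 599; // 4xx or 5xx
}

console.log(shouldUseFallback({ status: 404 }));        // mistyped URL while online
console.log(shouldUseFallback({ networkError: true })); // real offline condition
console.log(shouldUseFallback({ status: 200 }));        // normal response
```

The first two calls both print true, which is exactly the ambiguity at issue: a mistyped URL and a dropped connection look the same to the fallback mechanism.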

Does that mean that we're stuck with this behavior? Not necessarily…

Using the Manifest to Our Advantage

A lot of people have put forth workarounds to this issue; some of them are simple, others, quite complex. A nifty solution that I came across is not only easy, but not too high on the "hackiness" scale. Best of all, it makes use of the Manifest to determine whether or not we are online. You have to get some satisfaction from that!

I've seen a lot of solutions that use Ajax to ping the server for a known resource. A far easier way is to include a couple of JavaScript files in your Manifest file, under the FALLBACK section:

FALLBACK:
/scripts/online.js /scripts/offline.js
/ /offline_viewer.html

The online.js file contains the following one line of code, while offline.js can either be blank or set the global online variable to false:

window.online = navigator.onLine;

We can test the value of this global variable in the offline_viewer.html fallback page to confirm whether or not we are really offline. This inline function accesses the value defined in the online.js file in order to determine whether or not to redirect to our 404.html page. Remember that the online.js file will be accessible ONLY if we are in fact connected to the Internet:

(function redirect404Errors() {
	if (window.online) {
		window.location.replace("404.html?url="+encodeURIComponent(document.location.href));	
	}	
})(); 
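
The same decision can be sketched as a small pure function, which makes it easier to check outside the browser. This is an illustrative variant, not the code used above: the helper name get404Redirect and the example.com URL are hypothetical, and onlineFlag stands in for window.online.

```javascript
// Hypothetical helper: returns the URL to redirect to, or null to stay
// on the offline page. `onlineFlag` stands in for window.online.
function get404Redirect(onlineFlag, requestedHref) {
  // Only an explicit `true` counts as online; if neither script set the
  // flag, it is undefined and we stay on the fallback page.
  if (onlineFlag !== true) return null;
  return '404.html?url=' + encodeURIComponent(requestedHref);
}

// In the fallback page this might be wired up as:
//   var target = get404Redirect(window.online, document.location.href);
//   if (target) { window.location.replace(target); }

console.log(get404Redirect(true, 'http://example.com/no such page'));
// 404.html?url=http%3A%2F%2Fexample.com%2Fno%20such%20page
console.log(get404Redirect(false, 'http://example.com/no such page'));
// null
```

Keep in mind that a relative target such as 404.html resolves against the directory of the page being viewed, so for bogus URLs several directories deep a root-relative path may be the safer choice.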

Since we only bring up the 404.html page if we are indeed connected, we should include it under the NETWORK section of the manifest file so that the browser knows to fetch it from the server and never cache it:

# Resources that require the user to be online.
NETWORK:
404.html
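
Putting the pieces together, the complete manifest might look something like the sketch below. The CACHE MANIFEST header must be the first line of the file; the version comment is a hypothetical addition, and the paths assume the file names used in this article.

```
CACHE MANIFEST
# v1 (hypothetical version comment; editing it prompts browsers to refresh the cache)

# Resources that require the user to be online.
NETWORK:
404.html

FALLBACK:
/scripts/online.js /scripts/offline.js
/ /offline_viewer.html
```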

There are a lot of options as far as displaying the 404 page goes. I chose to simulate my Jetty server's. Most sites have their own 404 page that matches the look & feel of their site:

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 404 NOT_FOUND</title>
</head>
<body>
<h2>HTTP ERROR: 404</h2>

<p>Problem accessing <span id="resource"></span>. Reason:
<pre>    NOT_FOUND</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
                             
</body>
</html>

It's usually a good idea to show the user what resource they were trying to bring up. Oftentimes, seeing the URL in a page is enough to make them realize that they mistyped it. I passed the resource path in the query string, but a POST transmission would be even better. On the server you could retrieve the URL from the POST data using your favorite server language and pass it along to the 404.html page. I chose to stick to JavaScript for the sake of simplicity.

Our script populates the placeholder on the page with the URL. The trickiest thing about this operation is decoding the query string, although even that isn't especially difficult, thanks to the native JS decodeURIComponent() method:

window.onload = function() {
  var url = getUrlVars()['url'] || 'requested resource';

  // textContent inserts the URL as plain text, so any markup in it is not parsed as HTML
  document.getElementById('resource').textContent = url;
};

function getUrlVars()
{
    var vars = {},
        keyValuePair,
        keyValuePairs = window.location.search.substr(1).split('&');

    // Split first, then decode each part, so an encoded '&' or '=' stays inside its value
    for(var i=0; i<keyValuePairs.length; i++) {
        keyValuePair = keyValuePairs[i].split('=');
        vars[decodeURIComponent(keyValuePair[0])] = decodeURIComponent(keyValuePair[1] || '');
    }

    return vars;
}
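
Where the URLSearchParams API is available (it is missing from older browsers), the parsing can be reduced to a single call. The sketch below is a hypothetical alternative to getUrlVars(), not part of the original page; the helper name getQueryParam and the example.com URL are made up for illustration.

```javascript
// Hypothetical alternative to getUrlVars(), assuming URLSearchParams is
// available (it is not in older browsers).
function getQueryParam(search, name) {
  // URLSearchParams splits on '&' and '=' first and decodes each value
  // afterwards, so an encoded '&' or '=' inside a value survives intact.
  return new URLSearchParams(search).get(name);
}

var search = '?url=' + encodeURIComponent('http://example.com/a?b=1&c=2');
console.log(getQueryParam(search, 'url'));     // http://example.com/a?b=1&c=2
console.log(getQueryParam(search, 'missing')); // null
```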

Conclusion

Catch-all FALLBACK expressions are a sure-fire way to ensure that all site resources are diverted to the offline alternative should connectivity break down for whatever reason, which matters because you can never know which pages a visitor will bookmark. Unfortunately, relying on such a general trigger causes legitimate "Page Not Found" errors to be interpreted as an offline situation as well, at least without a suitable workaround. Although connectivity is never a black-or-white condition, this technique can provide a higher level of confidence in differentiating between misspelled URLs and connectivity issues.


If you enjoyed this article, please contribute to Rob's rock star aspirations by purchasing one of Rob's cover or original songs from iTunes.com for only 0.99 cents each.

Rob Gravelle resides in Ottawa, Canada, and is the founder of GravelleWebDesign.com. Rob has built systems for Intelligence-related organizations such as Canada Border Services, CSIS as well as for numerous commercial businesses. Email Rob to receive a free estimate on your software project.

In his spare time, Rob has become an accomplished guitar player, and has released several CDs. His band, Ivory Knight, was rated as one of Canada's top hard rock and metal groups by Brave Words magazine (issue #92).



