
HTML or XHTML? Fact From Fiction

04/03/2007

 

On an almost daily basis I see the same question asked: Should I be using HTML or XHTML? On an almost equally frequent basis I also see people imply that one (usually XHTML) should always be used in favour of the other. Such suggestions, much like most of the reasons given for them, are complete nonsense.

There’s absolutely no reason why we can’t use one or the other, as we see fit, on a case-by-case basis. So to all those out there that are frantically trying to convert their entire life’s work from HTML to XHTML because HTML has been replaced by XHTML, and to those that are refusing to use XHTML because browsers don’t properly support it yet, stop, put the IDE down and step away from the myths.

It seems as though most of the pro-XHTML myths come from developers that are keen to be seen to be keeping pace with (what one could be forgiven for mistaking as the rapid) progression of web development. A perfectly sensible thing to do; if you stop learning you’ll just get left behind. It’s just that few people seem to be willing to admit that they’re experimenting and trying new stuff out, and so they just seem to parrot or make up weird and wild reasons for using XHTML. On the other hand most of the pro-HTML rhetoric seems to come largely from fuddy-duddy stick-in-the-mud types, or people that have taken the time to learn about XML and XHTML, but somehow think they might be keeping some kind of competitive or intellectual edge by discouraging others from doing the same (yes there are people that petty).

Given the prevalence of pro-XHTML mythology in today’s world of web development we’ll start out by taking a look at some of the pro-XHTML myths. Some might sound quite feasible, but the majority are just so absurd that I can’t help but wonder if some of the people saying these things might be a few pennies short of a full quid. See if any of them sound familiar.

It’s cleaner and more precise.
This one’s a bit of a weird one. I’m guessing it’s probably just a misinterpretation of the well-formedness constraints inherited from XML, but essentially it’s just nonsense.
It’s more standards-compliant.
An absolutely bonkers assertion. XHTML follows the XHTML Recommendations, and HTML follows the HTML Recommendations. They each follow their own rock-solid standards.
It’s more accessible.
Certainly not true. Try to use XHTML for anything other than just pointlessly serving it as broken HTML and you could quite easily hit some pretty serious accessibility issues.
It has replaced HTML.
HTML is going to be around for a long time to come yet. The fact that the W3C has reopened the W3C HTML Working Group, it could be argued, reflects both the failure of XHTML to be adopted as anything more than a buzz-word with syntax errors and the fact that we are far from ready to break away from HTML. At the rate most web browsers are progressing, HTML and XHTML served as broken HTML are probably going to continue to be the norm right up until the earth is destroyed to make way for a new hyperspace bypass.
HTML is not well-formed.
Indeed HTML doesn’t meet the well-formed constraints as defined in the XML Recommendation, but then it doesn’t need to. HTML is not an application of XML; it’s an application of SGML. HTML is perfectly capable of defining an unambiguous document structure (a key purpose of the well-formedness constraints expressed in the XML Recommendation). HTML allows you to omit more, but it doesn’t require that you omit as much markup as possible. My personal preference is to explicitly write in as much of the optional markup as my feeble little memory will allow, but in the end, it all comes down to you — the author.
It’s better.
A true classic. No reasoning, no justification, just — It’s better. You might as well be asking a three-year-old why Pikachu is better than Bulbasaur.

The list really does go on and on. The internet is awash with nonsense about why we shouldn’t use HTML any more.

The reasons I’ve heard for preferring HTML, on the other hand, are far fewer in number, but they do tend to carry a little more plausibility. Though that’s not to say we should blindly accept them any more readily than we should accept the pro-XHTML nonsense. The key to making the right choice is understanding a few basic concepts and using this understanding to weigh the beneficial and problematic effects your choice might have on whoever or whatever is accessing your document.

The first point to recognize is that XHTML and XML have some very useful features that can be used to great effect, but if you’re not using those features then there’s probably not much point in using XHTML. There’s also the consideration that if you do serve up XHTML as HTML (Content-Type: text/html), which is the case in 99% of the XHTML I’ve seen on the web, then that’s exactly how the browser will treat it: as just plain old HTML with bucketloads of syntax errors. So in such cases, it may well have been better to just make the document in plain old HTML without all the perceived syntax errors.

I know of a few developers that like to use other markup languages defined in XML and combine them with XHTML to create documents that simply would not be possible with HTML alone. Some combine XHTML with markup languages they’ve made themselves; others use the pre-packaged DOCTYPEs from the W3C that bundle together XHTML, SVG and MathML so that you can jump straight in and get busy trying to make your latest and greatest. The ability to combine XHTML with new markup languages made from the same parent language is indeed a great step forward. With any luck we’ll see big improvements in support for XML in more browsers so that this sort of thing can really take off and allow greater innovation in web development.
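
To make the idea of mixing vocabularies a little more concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The sample document and the function name elements_by_namespace are made up for illustration; the only real identifiers are the XHTML and SVG namespace URIs. It shows how an XML processor tells the two languages apart by namespace, nothing more.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# A hypothetical compound document: XHTML with an inline SVG circle.
COMPOUND_DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Compound document</title></head>
  <body>
    <p>A circle drawn with SVG:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
      <circle cx="50" cy="50" r="40"/>
    </svg>
  </body>
</html>"""


def elements_by_namespace(xml_text):
    """Count the elements in each namespace of a well-formed XML document."""
    counts = {}
    for element in ET.fromstring(xml_text).iter():
        # ElementTree writes qualified names as "{namespace-uri}local-name".
        if element.tag.startswith("{"):
            namespace = element.tag[1:].split("}", 1)[0]
        else:
            namespace = ""  # element is in no namespace
        counts[namespace] = counts.get(namespace, 0) + 1
    return counts


if __name__ == "__main__":
    for namespace, count in elements_by_namespace(COMPOUND_DOC).items():
        print(namespace, count)
```

Whether a given browser will actually render such a document is a separate question, and depends on it being served and parsed as XML rather than as text/html.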

It’s rather obvious, but XML processors are only required to process XML. An XML processor is no more obliged to accept text/html as valid input than it is tinky/winky. Some implementations may well accept text/html, but I certainly wouldn’t recommend relying on such behaviour. So if you’re planning for a document to be usable by both people and fully- or semi-automated XML processors (which can include a wide range of applications, from JavaScript implementations retrieving documents and including the content into the existing structure of an already visually rendered document, or Ajax as some people insist on calling it, to clients hoovering up all the data they can find to chuck in a database), then I’d strongly consider using XHTML and serving it as such (application/xhtml+xml is recommended by the W3C, but most cases may well require the use of application/xml or text/xml). It all depends on what you deem to be the most important modes of retrieval and use of the document you’re making.
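
As a rough illustration of why an XML processor can't simply be handed HTML, the sketch below feeds two fragments to Python's standard xml.etree.ElementTree parser. Both fragments and the helper name is_well_formed are invented for this example. The first fragment uses the tag omission that HTML permits; the second is the same content written to XML's well-formedness rules.

```python
import xml.etree.ElementTree as ET

# HTML-style fragment: the </p> end tags are omitted and <br> is left unclosed,
# both of which HTML allows.
HTML_STYLE = "<div><p>First paragraph<br><p>Second paragraph</div>"

# The same content written so that it is well-formed XML.
XHTML_STYLE = "<div><p>First paragraph<br/></p><p>Second paragraph</p></div>"


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("HTML-style fragment accepted:", is_well_formed(HTML_STYLE))
    print("XHTML-style fragment accepted:", is_well_formed(XHTML_STYLE))
```

A non-validating check like this only covers well-formedness. It says nothing about whether the document is valid against any particular DTD.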

Accessibility, I think, is actually XHTML’s biggest problem. As I’ve just said, if you do decide to use XHTML for something other than serving it as broken HTML then you’re probably going to have to carefully think about what, who and possibly why you want to be able to access the document. Aside from (what I hope is) the very obvious side of accessibility that I’d like to think we all consider by default, access for the disabled and disadvantaged, XHTML can also throw up major accessibility issues for many more users in general. Here are three of the most prevalent problems for you to consider when attempting to use XHTML.

  • It might seem like a strange thing to list as a problem, but the MIME Content-Types associated with XHTML can make for some significant accessibility hurdles. They shouldn’t, but they do. A prime example is following the recommendations set out in section 3 of the W3C note XHTML Media types, and serving XHTML documents as application/xhtml+xml. The note says you should serve XHTML as application/xhtml+xml, but if you do, you’re going to make your XHTML document totally inaccessible to anyone using Microsoft Internet Explorer (and that’s a lot of people!). If you’re trying to provide content to a less common remote XML processor and you don’t have much information about it, then you might also have to either play a bit of a guessing game in trying to figure out which content-type to serve with, or have to go into content negotiation based on the HTTP Accept header (remembering that plenty of user agents state */* in the header when they shouldn’t). So when you’re going for XHTML, remember that you can quite rightly serve it as any one of three types, but there are applications out there that might recognize only one of them. Finding a commonly accepted content-type among all user agents you intend to receive the document can save you having to go into content negotiation, though.
  • There are quite a few browsers that choose to ignore the XML 1.0 Recommendation and don’t bother recognizing and replacing external entity references (at least not in any useful way), and so deliver a rather crippling blow to the X in XHTML. So if you’re planning on mixing XHTML with another markup language that’s not already built into the browser, you’re probably going to have to come up with some rather impractical work-arounds or just give up on the idea of releasing it for general web access. Mozilla Firefox is probably the web’s most popular culprit of this selective ignorance of external entities, but it’s not the only one.
  • Using XHTML 1.1 is another great way to make your documents inoperable in a large number of validating XML processors. Trying to find a validating processor that is willing to resolve external entities and find the actual XHTML 1.1 DTD (rather than a modified internal version) to be valid seems to be an impossible task. As far as I can see, the XHTML 1.1 DTD seems perfectly valid; it just feels like every validating processor on the planet just happens to have one problem or another that makes it unable to retrieve the DTD in its entirety and make use of it. So whether you’re using XHTML 1.1 on its own or mixing it with other markup languages, make sure that your intended audience can use it and that the receiving XML processor will not be brought to a screeching halt by whatever parts of the XHTML 1.1 DTD you’ll be using.
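
The content negotiation mentioned in the first point could be sketched along the following lines. This is a simplified illustration in Python rather than a complete implementation of HTTP negotiation: the function name choose_content_type is hypothetical, wildcards such as */* are deliberately ignored because so many user agents send them without meaning it, and falling back to text/html is an assumption of this sketch that carries all the caveats about broken HTML discussed earlier.

```python
def choose_content_type(accept_header):
    """Pick a Content-Type for an XHTML document from an HTTP Accept header.

    Only explicit mentions of application/xhtml+xml and text/html count;
    wildcard entries such as */* are ignored on purpose.
    """
    xhtml_q = 0.0
    html_q = 0.0
    for item in accept_header.split(","):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0  # a media range with no q parameter defaults to 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unreadable q value as "not acceptable"
        if media_type == "application/xhtml+xml":
            xhtml_q = quality
        elif media_type == "text/html":
            html_q = quality
    # Serve as XML only when the client asks for it by name and likes it
    # at least as much as text/html.
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"


if __name__ == "__main__":
    print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(choose_content_type("*/*"))
```

If every user agent you care about accepts one common type, you can skip this step entirely and just serve that type.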

Making XHTML that wouldn’t be better off being HTML in the first place and works for all intended recipients is not usually an easy task, but hopefully you’ll now be a little more aware of the potential pitfalls of using XHTML and wary of the massive amounts of misinformation about which to use. So to sum up, unless you’ve got a real reason to be using XHTML, there’s probably not much point. Just make sure that if you are using it, you’re careful about what you’re doing and that you do plenty of testing. I would strongly suggest using XHTML to learn about XML (a perfectly valid reason for using XHTML in my opinion). XHTML provides many pre-made working examples of various aspects of XML for you to learn from, tweak and experiment with, but don’t feel as though you have to go rushing into anything. HTML is going to be around for a very long time to come.

Stephen Philbin is a freelance web developer and writer that would live at http://www.stephenphilbin.com/ if only he could find the time to build a site for himself.
