HTML or XHTML? Fact From Fiction
By Stephen Philbin
April 3, 2007
On an almost daily basis I see the same question asked: Should I be using HTML or XHTML? On an almost equally
frequent basis I also see people imply that one (usually XHTML) should always be used in favour of the other.
Such suggestions, much like most of the reasons given for them, are complete nonsense. There's absolutely no
reason why we can't use one or the other, as we see fit, on a case-by-case basis. So to all those out there who
are frantically trying to convert their entire life's work from HTML to XHTML because "HTML has been replaced by
XHTML", and to those who are refusing to use XHTML because "browsers don't properly support it yet": stop, put
the IDE down and step away from the myths.
It seems as though most of the pro-XHTML myths come from developers who are keen to be seen to be keeping pace
with what one could be forgiven for mistaking as the rapid progression of web development. A perfectly
sensible thing to do; if you stop learning you'll just get left behind. It's just that few people seem to be
willing to admit that they're experimenting and trying new stuff out, and so they just seem to parrot or make up
weird and wild reasons for using XHTML. On the other hand, most of the pro-HTML rhetoric seems to come largely from
fuddy-duddy stick-in-the-mud types, or people who have taken the time to learn about XML and XHTML but somehow
think they might be keeping some kind of competitive or intellectual edge by discouraging others from doing the
same (yes, there are people that petty).
Given the prevalence of pro-XHTML mythology in today's world of web development we'll start out by taking a look
at some of the pro-XHTML myths. Some might sound quite feasible, but the majority are just so absurd that I
can't help but wonder if some of the people saying these things might be a few pennies short of a full quid.
See if any of them sound familiar.
It's cleaner and more precise.
- This one's a bit of a weird one. I'm guessing it's probably just a misinterpretation of the "well-formedness"
constraints inherited from XML, but essentially it's just nonsense.
It's more standards-compliant.
- An absolutely bonkers assertion. XHTML follows the XHTML Recommendations, and HTML follows the HTML
Recommendations. They each follow their own rock-solid standards.
It's more accessible.
- Certainly not true. Try to use XHTML for anything other than just pointlessly serving it as broken
HTML and you could quite easily hit some pretty serious accessibility issues.
It has replaced HTML.
- HTML is going to be around for a long time to come yet. The fact that the W3C has reopened the W3C HTML
Working Group, it could be argued, reflects the failure of XHTML to be adopted as anything more than a buzz-word
with syntax errors, and shows that we are far from ready to break away from HTML. At the rate most web browsers
are progressing, HTML and XHTML served as broken HTML are probably going to continue to be the norm right up
until the earth is destroyed to make way for a new hyperspace bypass.
HTML is not well-formed.
- Indeed HTML doesn't meet the "well-formed" constraints as defined in the XML Recommendation, but then it
doesn't need to. HTML is not an application of XML; it's an application of SGML. HTML is perfectly capable of
defining an unambiguous document structure (a key purpose of the "well-formedness" constraints expressed in
the XML Recommendation). HTML allows you to omit more, but it doesn't require that you omit as much
markup as possible. My personal preference is to explicitly write in as much of the optional markup as my feeble
little memory will allow, but in the end, it all comes down to you, the author.
It's better.
- A true classic. No reasoning, no justification, just "It's better." You might as well be asking
a three-year-old why Pikachu is better than Bulbasaur.
The list really does go on and on. The internet is awash with nonsense about why we shouldn't use HTML any
more.
The reasons I've heard for preferring HTML, on the other hand, are far fewer in number, but they do tend to
carry a little more plausibility. Though that's not to say we should blindly accept them any more readily than we
should accept the pro-XHTML nonsense. The key to making the right choice is understanding a few basic concepts
and using this understanding to weigh the beneficial and problematic effects your choice might have on whoever
or whatever is accessing your document.
The first point to recognize is that XHTML and XML have some very useful features and that they can be used to
great effect, but if you're not using these features then there's probably not much point in using XHTML.
There's also the consideration that if you do serve up XHTML as HTML (Content-Type: text/html), which is the
case in 99% of the XHTML I've seen on the web, then that's exactly how the browser will treat it: as just plain
old HTML with bucketloads of syntax errors. So in such cases, it may well have been better to just make
the document in plain old HTML without all the perceived syntax errors.
I know of a few developers who like to use other markup languages defined in XML and combine them with
XHTML to create documents that simply would not be possible with HTML alone, whether by combining XHTML with
markup languages they've made themselves or by using the pre-packaged DOCTYPEs from the W3C that bundle
together XHTML, SVG and MathML so that you can jump straight in and get busy trying to make your latest and
greatest. The ability to combine XHTML with new markup languages made from the same parent language is indeed a
great step forward. With any luck we'll see big improvements in support for XML in more browsers so that this
sort of thing can really take off and allow greater innovation in web development.
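To give a rough idea of what mixing markup languages looks like to an XML processor, here is a minimal sketch in
Python. It assumes the standard library's xml.etree.ElementTree as a stand-in for "an XML processor", and the
little document and the prefixes "h" and "svg" are made up for illustration. The point is only that XML namespaces
let one parser tell the XHTML and SVG elements apart within a single document.

```python
import xml.etree.ElementTree as ET

# The real namespace names for XHTML and SVG.
XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# A made-up compound document: XHTML with an inline SVG drawing.
document = f"""<html xmlns="{XHTML_NS}">
  <head><title>Compound document</title></head>
  <body>
    <p>A circle drawn with SVG:</p>
    <svg xmlns="{SVG_NS}" width="100" height="100">
      <circle cx="50" cy="50" r="40"/>
    </svg>
  </body>
</html>"""

root = ET.fromstring(document)

# Prefixes are local to this script; only the namespace names matter.
ns = {"h": XHTML_NS, "svg": SVG_NS}
paragraphs = root.findall(".//h:p", ns)
circles = root.findall(".//svg:circle", ns)

print(len(paragraphs), len(circles))  # 1 1
print(circles[0].tag)                 # {http://www.w3.org/2000/svg}circle
```

Whether a given browser does anything useful with such a document is a separate question, and it depends on the
document being served and parsed as XML in the first place.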
It's rather obvious, but XML processors are only required to process XML. An XML processor is no more
obliged to accept text/html as valid input than it is tinky/winky. Some implementations may well accept
text/html, but I certainly wouldn't recommend relying on such behaviour. So if you're planning for a document to
be usable by both people and fully- or semi-automated XML processors -- which can include a wide range of
applications, from JavaScript implementations retrieving documents and including the content into the existing
structure of an already visually rendered document (or "Ajax", as some people insist on calling it) to clients
hoovering up all the data they can find to chuck in a database -- then I'd strongly consider using XHTML
and serving it as such (application/xhtml+xml is recommended by the W3C, but most cases may well
require the use of application/xml or text/xml). It all depends on what you deem to be the most important modes of
retrieval and use of the document you're making.
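As a small illustration of why an XML processor can't simply be handed HTML, here is a sketch in Python. It
assumes the standard library's xml.etree.ElementTree (a non-validating parser) as the XML processor, and the two
fragments and the helper name is_well_formed are hypothetical. The first fragment omits its optional end tags,
which HTML permits, while the second is the XHTML equivalent.

```python
import xml.etree.ElementTree as ET

# Fine as HTML: the </li> end tags are optional there.
# Not well-formed XML, because the elements are never closed.
html_fragment = "<ul><li>First<li>Second</ul>"

# The same list written as XHTML, with every element closed.
xhtml_fragment = (
    '<ul xmlns="http://www.w3.org/1999/xhtml">'
    "<li>First</li><li>Second</li></ul>"
)


def is_well_formed(markup):
    """Return True if the XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(html_fragment))   # False
print(is_well_formed(xhtml_fragment))  # True
```

An XML parser has to stop at the first well-formedness error, so the HTML fragment never gets as far as yielding
any data at all.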
Accessibility, I think, is actually XHTML's biggest problem. As I've just said, if you do decide to use XHTML
for something other than serving it as broken HTML then you're probably going to have to think carefully about
what and who you want to be able to access the document, and possibly why. Aside from (what I hope is) the very
obvious side of accessibility that I'd like to think we all consider by default, access for the disabled and
disadvantaged, XHTML can also throw up major accessibility issues for many more users in general. Here are three
of the most prevalent problems for you to consider when attempting to use XHTML.
- It might seem like a strange thing to list as a problem, but the MIME Content-Types associated with XHTML can
make for some significant accessibility hurdles. They shouldn't, but they do. A prime example is following the
recommendations set out in section 3 of the W3C note "XHTML Media Types" and serving XHTML documents as
application/xhtml+xml. The note says you should serve XHTML as application/xhtml+xml, but if you
do, you're going to make your XHTML document totally inaccessible to anyone using Microsoft Internet
Explorer (and that's a lot of people!). If you're trying to provide content to a less common remote XML processor
and you don't have much information about it, then you might also have to either play a bit of a guessing game in
trying to figure out which Content-Type to serve with, or go into content negotiation based on the HTTP
Accept header (remembering that plenty of user agents state */* in the header when they shouldn't). So
when you're going for XHTML, remember that you can quite rightly serve it as any one of three types, but there
are applications out there that might recognize only one of them. Finding a commonly accepted Content-Type
among all user agents you intend to receive the document can save you having to go into content negotiation,
though.
- There are quite a few browsers that choose to ignore the XML 1.0 Recommendation and don't bother recognizing
and replacing external entity references (at least not in any useful way), and so deliver a rather crippling blow
to the X in XHTML. So if you're planning on mixing XHTML with another markup language that's not already
built into the browser, you're probably going to have to come up with some rather impractical work-arounds or
just give up on the idea of releasing it for general web access. Mozilla Firefox is probably the web's most
popular culprit of this selective ignorance of external entities, but it's not the only one.
- Using XHTML 1.1 is another great way to make your documents inoperable in a large number of validating XML
processors. Trying to find a validating processor that is willing to resolve external entities and find the
actual XHTML 1.1 DTD (rather than a modified internal version) to be valid seems to be an impossible task.
As far as I can see, the XHTML 1.1 DTD seems perfectly valid; it just feels like
every validating processor on the planet just happens to have one problem or another that makes it unable to
retrieve the DTD in its entirety and make use of it. So whether you're using XHTML 1.1 on its own or mixing it
with other markup languages, make sure that your intended audience can use it and that the receiving XML
processor will not be brought to a screeching halt by whatever parts of the XHTML 1.1 DTD you'll be using.
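The content negotiation mentioned in the first of the three problems above could be sketched roughly as follows.
This is a minimal Python example under stated assumptions, not a complete implementation of HTTP content
negotiation. The function names parse_accept and choose_content_type are hypothetical, the fallback of
text/html assumes you have an HTML version of the document to serve, and the choice to ignore */* is simply one
way of acting on the warning that plenty of user agents send it when they shouldn't.

```python
def parse_accept(header):
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    prefs = {}
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue  # skip empty entries, e.g. from an empty header
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q-value as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_content_type(accept_header):
    """Pick application/xhtml+xml only when the client names it explicitly."""
    prefs = parse_accept(accept_header)
    # Wildcards such as */* are deliberately ignored: they don't prove
    # that the user agent can actually process XHTML.
    if prefs.get("application/xhtml+xml", 0.0) > 0.0:
        return "application/xhtml+xml"
    return "text/html"


# A header that lists the XHTML type explicitly.
print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
# prints: application/xhtml+xml

# A header with only wildcards and image types.
print(choose_content_type("image/gif, image/jpeg, */*"))
# prints: text/html
```

If you do vary the response like this, it's worth also sending a "Vary: Accept" response header so that caches
keep the two variants apart.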
Making XHTML that wouldn't be better off being HTML in the first place and works for all intended
recipients is not usually an easy task, but hopefully you'll now be a little more aware of the potential pitfalls
of using XHTML and wary of the massive amounts of misinformation about which to use. So to sum up, unless you've
got a real reason to be using XHTML, there's probably not much point. Just make sure that if you are using it,
you're careful about what you're doing and that you do plenty of testing. I would strongly suggest using XHTML
to learn about XML (a perfectly valid reason for using XHTML in my opinion). XHTML provides many pre-made working
examples of various aspects of XML for you to learn from, tweak and experiment with, but don't feel as though you
have to go rushing into anything. HTML is going to be around for a very long time to come.
Stephen Philbin is a freelance web developer and writer who would live at http://www.stephenphilbin.com/ if only he could find the time to build a site for himself.