HTML Goodies: XHTML
Not too long ago, HTML version 4.0 was recommended by the World Wide Web Consortium (W3C). I put up a tutorial on all of the new HTML 4.0 commands, then set about creating in-depth tutorials for each command.
It was only a matter of time before someone wrote to me and asked when HTML 5.0 would be coming out. I have your answers:
- It never will.
- It already has.
Now, depending on which articles you read (I've read waaay too many at this point), either XHTML is HTML 5.0, or HTML versions breathed their last with 4.0 and there will never be a 5.0 because XHTML is the direction markup languages are taking now.
Confused? Let's beat through it.
Right now there are two languages vying to be number one on the Web. The first is good old HTML and the second is Extensible Markup Language (XML). Which is better really depends on whom you talk to and what they want to do with the pages they create.
HTML is well within the grasp of the Weekend Silicon Warrior and creates decent text and image pages. It is, by far, the most-used language on the Web.
XML is much more dynamic and allows for much more specific database interaction than was ever possible before. An example would be searching for "dog" in Yahoo!. You get everything that has "dog", as well as all related, larger words such as "dogma". Well, XML can change all that. Your searches and requests can be specific. Results will be specific.
Another big plus on the XML side is the ability for you to create custom XML tags. If you want a tag named "zork" that allows you to turn text green and change the font size to 24 point, you can create it. Follow the links above to Goodies tutorials explaining how.
What Is This XHTML?
Once again, some say it's HTML with XML qualities. Others, like me, say it's XML with HTML written into the Document Type Definition (DTD).
Here's the scoop as I understand it. XML has become the chosen language for the Web's future. At least, that's the feeling I get from reading the pages on the W3C Web site. Obviously, you cannot simply eliminate HTML, so they did what, I think, was a pretty smart thing. They combined them. I just don't know that I'm overly thrilled with the way they combined them.
Document Type Definition: DTD
Inside your browser, there's a DTD. It differs from browser to browser, and from version to version. Internet Explorer 4.0 understands some HTML 4.0 level commands that Internet Explorer 3.0 doesn't because those commands were written into the 4.0 browser's DTD.
[All modern browsers in common use fully support XHTML and HTML 4.01.]
The new XHTML 1.0 DTD (which looks like this, in case you're interested) is basically the XML DTD with the HTML 4.0 DTD put inside it. Users must follow the majority of XML rules because HTML is under XML's umbrella rather than being the other way around.
The W3C suggests that HTML should be "an application of XML". The purpose is to tighten HTML's programming standards to make them compliant with XML.
You may not like that, but there's some sense to it. XML is very specific. One thing means one thing. Period. HTML isn't so specific. For example:
- Tags can be in caps or not.
- TEXTAREA boxes require end tags, yet text boxes do not.
- Tags can end in any order regardless of how they were placed.
I'm sure you can come up with some more examples, but these are the three that I point out to students.
At the moment, the best one can hope to do is to write XHTML documents that are compatible with current browsers. I'll run down a few of the rules for writing in XHTML. If you've already read my XML tutorial, many will be familiar to you.
- You will use the XML & XHTML declaration statements to start every XHTML page:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
The commands will alert the browser displaying the page that XHTML is the language to render.
- The head and body tags are now mandatory.
- Every tag must be closed.
In HTML, you could get away with simply putting a <P> between paragraphs and the browser would render it just fine. If you only had one table on a page, you didn't need the end TD and end TR tags. Under the XHTML DTD, that's no longer true. All tags that require end tags get end tags.
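As a rough sketch of that rule, here is the same small fragment written both ways. The paragraph text and table contents are made up for illustration.

```html
<!-- Old-style HTML: the browser guesses where each paragraph, row, and cell ends -->
<P>First paragraph.
<P>Second paragraph.
<TABLE>
<TR><TD>Cell one<TD>Cell two
</TABLE>

<!-- XHTML: every tag that opens also closes -->
<p>First paragraph.</p>
<p>Second paragraph.</p>
<table>
<tr><td>Cell one</td><td>Cell two</td></tr>
</table>
```

The second version is longer, but nothing is left for the browser to guess.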
- Empty tags get a terminating slash.
An empty tag is a tag that doesn't require an end tag. Examples include <BR> and <HR>.
Under the XHTML DTD, empty tags will now carry a space following the tag text and then a terminating slash, like so:
- <BR> is now <br />.
- <HR> is now <hr />.
- <IMG SRC="--"> is now <img src="--" />.
You may have noticed above that I wrote head, body, br, hr, and img in lower case in the XHTML examples. That's because:
- All tags must be lower case.
This applies to attribute names as well, but not to attribute values. For example, both of these formats are acceptable under the XHTML DTD:
- <font color="#ffffcc">
- <font color="#FFFFCC">
You may have noticed that I have quotes around all of the attributes. That's because:
- Attribute quotes are now mandatory.
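Here is a quick before-and-after of the quoting rule, as a sketch only. The file name pic.gif and the alt text are hypothetical.

```html
<!-- HTML let simple values go without quotes -->
<IMG SRC=pic.gif ALT=Logo WIDTH=100 HEIGHT=50>

<!-- XHTML: every value sits inside quotes, numbers included -->
<img src="pic.gif" alt="Logo" width="100" height="50" />
```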
- Tags may not overlap.
In HTML, this is an acceptable format. It will render:
- <B><I>Bold and italic text</B></I>
No more. Now the tags must follow a logical begin and end pattern. They must end at the same level as they are started. This is the proper XHTML method of writing the code above:
- <b><i>Bold and italic text</i></b>
Once again, note the lower case tags.
- Attribute values must be denoted.
Most attributes are already written with a value. For example, FONT FACE="arial". Notice that "arial" follows the attribute "FACE=".
The attribute and equal signs, in some cases, have been eliminated in HTML. For example:
- <INPUT TYPE="radio" checked>
The word "checked" is a minimized attribute. Under XHTML, no more. You must denote every attribute. Here's the correct method of writing what is above under the XHTML DTD:
- <input type="radio" checked="checked" />
These don't come up too often. Here are a few examples in HTML format:
- <INPUT TYPE="radio" checked>
- <INPUT TYPE="checkbox" checked>
- <OPTION selected>
- <DL compact>
- <UL compact>
In each case, you'll need to set the minimized attribute to one that is denoted. The easy way to remember it is that it always denotes itself: checked="checked" and selected="selected".
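To see several of these together, here is a small form fragment with every minimized attribute spelled out. Treat it as a sketch: the action value order.cgi and the field names are made up for the example.

```html
<!-- Hypothetical form: order.cgi and the field names are illustrative only -->
<form action="order.cgi" method="post">
<p>
<input type="radio" name="size" value="large" checked="checked" /> Large
<input type="checkbox" name="gift" value="yes" checked="checked" /> Gift wrap
<select name="color">
<option value="red">Red</option>
<option value="green" selected="selected">Green</option>
</select>
</p>
</form>
```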
- The <pre> tag cannot contain: img, object, big, small, sub, or sup.
- You may not have any forms inside of other forms.
- If your code contains a &, it must be written as &amp;.
- Any use of CSS should use all lower case lettering.
I think you'll agree, my statement above, although not totally true, will save you multiple headaches.
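Of the rules in that list, the ampersand one seems to bite most often, usually inside links that carry more than one value. A minimal sketch, assuming a made-up search.cgi address:

```html
<!-- HTML authors usually typed the raw ampersand -->
<A HREF="search.cgi?term=dog&page=2">Next page</A>

<!-- XHTML: the ampersand is written out, even inside an attribute value -->
<a href="search.cgi?term=dog&amp;page=2">Next page</a>
<p>Salt &amp; pepper</p>
```

The browser still sends a plain & to the server. Only the way it is written in the source changes.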
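- <!--Comments no longer hide scripts and style sheets.-->
Ordinary comments still work in XHTML, but an XML parser is allowed to throw away whatever sits inside one. That means the old trick of wrapping a script or style block in comment tags is out. The XML way to protect that content is a CDATA section:
- <![CDATA[script or style content goes in here]]>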
That will throw big errors in some browsers.
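One common way around those errors is to tuck the CDATA markers behind script comments, so a browser that doesn't know CDATA simply skips over them. This is a sketch with made-up variable names, not the only way to do it.

```html
<script type="text/javascript">
//<![CDATA[
// Hypothetical script: the < and && below are what need protecting
var count = 3;
var ready = true;
if (count < 10 && ready) {
  count = count + 1;
}
//]]>
</script>
```

An XML parser sees a CDATA section and leaves the < and && alone. An older HTML browser sees two lines that start with // and ignores them.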
XHTML: Good or Bad?
I guess, once again, that depends on whom you talk to. The W3C says the two main selling points are "extensibility and portability". I'll add "standardization" to that.
- Extensibility: XHTML is extendable. You can create your own tags and add onto it.
- Portability: Those new tags are done in such a way that all can understand them. (See the tutorial).
- Standardization: Now we have a true template for what is and is not acceptable coding. Everyone must follow that template.
On the other side of the coin, I see a couple of problems.
- XHTML is not as easy to just play around with on the weekend as HTML. HTML is a sort of computer tinker toy that everyone can use. People might lose interest being held to such a rigid set of rules. But you can still write your HTML document just as you always did. The XHTML DTD contains HTML. You just need to use an HTML declaration statement at the top of the document.
- XHTML, and XML for that matter, go directly against the rules the W3C laid down for web content and authoring tools accessible to disabled users.
I guess it's that first concern that worries me the most. I just don't want HTML to become a ghost hiding behind a language that the average Joe can't pick up. I wouldn't want those who run the coding show to "take back" programming on the Web by making it too difficult. I just hate the thought of that.
I took this example straight from the W3C's XHTML 1.0 page:
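<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Virtual Library</title></head>
<body>
<p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
</body>
</html>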
As you can see, the coding is very strict and maybe a little complicated for the HTML enthusiast. The language is available right now if you want to try it out. I would suggest using an Internet Explorer 5.0 level browser. Follow the rules laid out in this tutorial and the XML custom tags tutorial.
Write slowly and deliberately. HTML was forgiving. XHTML is not. You may want to read a bit more and look at some XHTML source codes by visiting some Yahoo! XHTML Pages. Many of the pages offering help are actually written in XHTML.
Once you're finished writing, use the W3C Validator Service to check your work.
I don't see HTML ever going the way of the dinosaur, but I do expect to see it becoming more and more rigid under the XML umbrella. It's only a matter of time before all major search engines and server systems choose XML for database programming. HTML will need to start working under their choices.
Thanks to XHTML, you'll be able to continue writing in the HTML you've come to know and love. You just may need to clean it up a bit.
My guess is that XHTML 2.0 will specifically clean up HTML tags and their usage.