XHTML - An Introduction
Perhaps you've heard of XHTML and wondered what it's all about.
You probably know what a markup language is: you have a bunch of text, and you need a way to say that this piece should be bold, this piece should be italic, this one should be a heading, and so on. A markup language is a computer language (a set of codes) that lets you specify how each piece should be treated.
Then along came HTML and the birth of what has become the World Wide Web. HTML is a standardized markup language that allows documents all over the world to be defined using the same set of rules, or codes, so that they can all be read with the same kind of tool: a "Web browser" such as Internet Explorer, Netscape or Opera.
Everything on the World Wide Web was good. The Internet and its Web had been born, had flourished, and had grown into by far the largest repository of knowledge and information in the history of the world.
The only trouble was that folks realized there might be a better way (thank goodness for creativity!).
In the late nineties, a specification was created for a new markup language known as XHTML, which reformulates HTML in XML (the eXtensible Markup Language). The specification is maintained by the World Wide Web Consortium (W3C, www.w3.org), and to put it in their words:
"The Extensible HyperText Markup Language (XHTML) is a family of current and future document types and modules that reproduce, subset, and extend HTML, reformulated in XML. XHTML family document types are all XML-based, and ultimately are designed to work in conjunction with XML-based user agents. XHTML is the successor of HTML, and a series of specifications has been developed for XHTML."
Now that must have cleared it up for you! Just in case it didn't, let me expound.
(By the way, I've read the following paragraph a few times. It's horribly complicated, but it becomes clearer with each reading. I just wish there were a way to say exactly the same thing without sounding so technical. Trust me, though: if you read it a few times, it will become clear! The information it contains is vital.)
XML (the eXtensible Markup Language) is a powerful and rigorous set of specifications for a meta-language: a language for defining markup languages. To explain: HTML is a markup language (the HyperText Markup Language) specified in SGML. SGML (the Standard Generalized Markup Language) is the international standard meta-language for markup languages. SGML is a huge and complicated set of rules (specifications) for defining the elements of a Document Type Definition (DTD). By defining the "elements" of a DTD, we create a language for marking up documents of the type the DTD describes. Put another way, by creating a DTD for HTML documents, we define the language "HTML". The "elements" we define are all the familiar tags. For example, something, somewhere, has to define <b> and </b> as marking the beginning and end of text we wish to have displayed (or printed) in bold. The DTD declares these elements (which tags exist and what they may contain), and the HTML specification that accompanies it spells out what each one means.
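To make "defining elements" a little more concrete, here is a sketch of a tiny, made-up vocabulary: a hypothetical `memo` element that may contain text and `b` elements, declared in a DTD's internal subset. It uses Python's standard `xml.parsers.expat` module, which reports each element declaration it reads. Expat is a non-validating parser, so it lists the declarations but does not check the document against them; the real HTML and XHTML DTDs are, of course, far larger.

```python
import xml.parsers.expat

# A hypothetical two-element markup language, defined in an internal DTD subset.
DOC = """<?xml version="1.0"?>
<!DOCTYPE memo [
  <!ELEMENT memo (#PCDATA | b)*>
  <!ELEMENT b (#PCDATA)>
]>
<memo>Meeting moved to <b>Friday</b>.</memo>
"""

def declared_elements(xml_text):
    """Return the element names declared in the document's DTD, in order."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # Called once per <!ELEMENT ...> declaration; `model` describes the allowed content.
    parser.ElementDeclHandler = lambda name, model: names.append(name)
    parser.Parse(xml_text, True)
    return names

print(declared_elements(DOC))  # ['memo', 'b']
```

The two `<!ELEMENT ...>` lines are the whole "language" here: they say that a `memo` may mix text with `b` elements, and that a `b` holds only text.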
The DTD for HTML is based on SGML. XML is a simplified form, or subset, of SGML. XHTML is based on XML.
That doesn't mean that XHTML is just a simplified form of HTML! In fact, though it is no more complicated than HTML, XHTML has more hard-and-fast syntax rules (for example, every element must be closed, tag and attribute names must be lowercase, and attribute values must be quoted), yet at the same time it allows for a lot more flexibility. The key is in the X: eXtensible.
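A quick way to see those stricter rules at work is to hand both styles of markup to an XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the two snippets are made up for illustration. It checks only well-formedness (XML's basic syntax rules); full XHTML validity would also require checking the document against the XHTML DTD, which this parser does not do.

```python
import xml.etree.ElementTree as ET

# Acceptable to a classic HTML browser: <P> and <BR> are never closed.
HTML_STYLE = "<HTML><BODY><P>First paragraph<P>Second<BR></BODY></HTML>"

# The same content written to XHTML's rules: lowercase names, every
# element closed, and the empty <br /> element closed in place.
XHTML_STYLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>First paragraph</p><p>Second<br /></p>"
    "</body></html>"
)

def is_well_formed(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(HTML_STYLE))   # False
print(is_well_formed(XHTML_STYLE))  # True
```

The HTML-style snippet is rejected because the parser reaches `</BODY>` while `<P>` and `<BR>` are still open. XML itself only requires that start and end tags match exactly; the all-lowercase convention is an additional XHTML rule.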
You'll notice in the fancy W3C wording above that they talk about "a family of current and future document types and modules". "Future types and modules" points to one of the far-reaching goals of XHTML: creating web pages that are "understood" by computers as well as by people.
"Huh? They're already understood by computers," you're probably thinking. Please allow me to distinguish between "understood" and "interpreted".
A computer can interpret a web page inasmuch as it can read the markup and display the page accordingly. It does not, however, understand the page the way you do. The better search engines try to work out the actual meaning of web pages, but imagine what could be done if the real meaning of those pages were contained in their code. Imagine being able to tell your computer to visit every car dealership within fifty miles (or eighty kilometers; we have such a long way to go!) of your home and find the lowest price for a particular model of car with a particular set of options. This sort of thing is one of the long-term goals of XHTML.
Another problem is that when a web page is created, its author really has no idea what type of device it will be displayed on. Until recently, we could rest comfortably in the assumption that a standard web browser on a PC with a resolution of 800x600 or better would be the rule. In today's world, cell phones, cars, even refrigerators have browsers built right in. Add to that Braille displays and speech synthesizers designed for people who can't rely on a visual display, and you have no real way of knowing how your web page is going to be interpreted. Wouldn't it be nice if you could still know how it would be understood?
If you design your web pages using the XHTML standard, you have the best chance of ensuring that your pages will be understood by any device or any available interpretive (or understanding) "browser program".
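Because XHTML is XML, any program with an XML parser can read the same page without having to guess at sloppy markup. As a rough illustration (not how any particular device actually works), the sketch below plays the part of a hypothetical non-visual browser: it parses a made-up XHTML page with Python's standard `xml.etree.ElementTree` and reduces it to an indented outline of headings, the kind of summary a speech device might read aloud.

```python
import xml.etree.ElementTree as ET

# A made-up XHTML page; the xmlns attribute puts every element in the XHTML namespace.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Used Cars</title></head>
  <body>
    <h1>Our Inventory</h1>
    <p>Updated weekly.</p>
    <h2>Sedans</h2>
    <p>Four models in stock.</p>
    <h2>Trucks</h2>
  </body>
</html>"""

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

def outline(xhtml):
    """Return the page's headings, indented two spaces per level below h1."""
    root = ET.fromstring(xhtml)
    lines = []
    for elem in root.iter():  # walks every element in document order
        # ElementTree writes namespaced tags as "{uri}name"; keep only "name".
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag in HEADINGS:
            level = int(tag[1])
            text = "".join(elem.itertext()).strip()
            lines.append("  " * (level - 1) + text)
    return lines

for line in outline(PAGE):
    print(line)
```

A visual browser, a phone, and this toy outliner all start from the same parsed tree; only what each one does with it differs.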
By the way, most WYSIWYG page-generation programs produce HTML code that ranges from mediocre to acceptable. If you want to create good XHTML, you're going to have to learn how. Watch this site!