What is XML?

By Joe Burns

Use these to jump around or read it all...

[Okay, What Is It?]
[DTD (Document Type Definition)]
[Making Your Own Blocks]
[So, What's Wrong With HTML?]
[Two Kinds of XML Pages]
[Creating the DTD]
[Is There More?]
[So, Now What?]

     The buzz-word "XML" is beginning to pop up all over the Net, and in the Goodies e-mail box. People are wondering what this new language is and how it's going to affect the way people write. To be honest, I was wondering the same thing until I started looking into it.
     Avid readers of HTML Goodies already know that XML is mentioned in two other tutorials (as of 5/11/98): HTML 4.0 and the Active Channel Tutorial. The language is starting to make a few in-roads into the Web and that makes a few people nervous. As one Goodies reader put it, "I just got pretty good with HTML, and now they're bringing out this thing." I feel your pain. Believe me, I do. It means that I will need to learn it all first so that I can teach it to you.
     So, here we go. This first tutorial is an introduction to what the heck this XML thing really is.


Okay, What Is It?

     In five words or less you mean? That can't be done. XML stands for EXtensible Markup Language. Seems strange that the first word is "Extensible," yet they use the "X" to denote the word. My guess is that "XML" looked a lot cooler than "EML."

Lineage

     You may not know this, but HTML and XML are brother and sister. Their mother is SGML. SGML (Standard Generalized Markup Language) is the overriding language that produced both XML and HTML.
     SGML is not a language per se, but rather a series of commands that are all understood by another program. A similar example would be JavaScript. By itself, there's not much to it, but if you use the JavaScript commands in a particular order and then allow a Web browser to read it, you get some neat effects.


DTD (Document Type Definition)

     SGML is what is known as a "meta-language;" it allows a programmer to write a DTD (Document Type Definition) that numerous pages can follow. For instance, all of you use a word processor (I would assume). You type the letter "j." Something inside of that word processor must understand what you did and display that letter. That's the DTD. You then alter the letter's size, font, and color. Again, the commands the word processor used were all understood and acted upon by using the DTD as a guide.
     HTML uses DTDs. Ever seen one of these at the very top of a page's source code:

<!doctype html public "-//w3c//DTD html 4.0//en">

     That's a document type declaration. It states that the DTD to be used is HTML 4.0 in English. See that above? So, where is the DTD actually located? In the browser. Yup, Netscape, Explorer, Opera, and Mosaic are more than programs to display pretty little pictures. They are actually carrying a DTD so that when you type in a command like <B>, or <UL>, or <CENTER> the browser knows how to handle it. The browser sees the command <B>. It goes to its DTD to check what that command is supposed to do. It sees that the command makes things bold. The effect is then generated.
     The one drawback with a word processor, and HTML, is that you cannot set up your own DTDs. HTML is a very stable markup language. The commands mean the same thing everywhere. The language is easy to learn because it is like playing blocks, to some extent. The tools you use never change. JavaScript is more difficult because you are actually creating the blocks before you play with them.

     Actually, that's the purpose of XML; it allows you to create your own blocks to play with.
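
     To give a feel for what "your own blocks" might look like, here is a minimal sketch. The tag names (RECIPE, TITLE, INGREDIENT, STEP) and the AMOUNT attribute are made up for illustration; nothing in XML defines them, which is the point. You pick names that describe your content.

```xml
<?xml version="1.0"?>
<!-- Hypothetical tag names, invented for this sketch only -->
<RECIPE>
  <TITLE>Grilled Cheese</TITLE>
  <INGREDIENT AMOUNT="2 slices">bread</INGREDIENT>
  <INGREDIENT AMOUNT="1 slice">cheese</INGREDIENT>
  <STEP>Butter the bread, add the cheese, and grill until golden.</STEP>
</RECIPE>
```

     Notice that the tags say what the text is, not how it should look. How a browser displays them is a separate question.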


Making Your Own Blocks

     Okay, say you want to create a document where a certain type of text is going to be bold, italic, red, 25 point, Arial font, and a few other fancy things. And this type of text appears a great many times. In HTML you would need to write out the start and end tags every time you made the text. Or, you might say, you can set up a style sheet to do that. Well, that's the general idea here. You set up a giant style sheet-type document that acts as "mother" for all the other documents. So, what's the difference? Well, style sheets work inside of HTML documents. You have to create one to use the other. Creating your own DTD eliminates one whole step in the process -- the HTML.

Okay, fine... but what is XML?

     XML is a simplified version of SGML intended to allow people like you and me a pretty good shot at learning it. SGML is wide open. It is a 10,000-piece jigsaw puzzle with double-sided pieces spread all over the floor. XML is the same jigsaw puzzle with big sections already put together.


So, What's Wrong With HTML?

     My personal opinion is that nothing is wrong with it. It was the first computer language that could be understood and used by the masses. It gave the Web to the common person. But those in the XML know claim HTML is clunky. They say it's become static. There's not a lot more one can do with it. Supposedly XML will allow a lot more flexibility in your Web pages. There will also be more flexibility in your HREF links. You'll be able to create cross-references and threads and other fun stuff. At least that's what the brochure says.

     HTML is not dead, nor is it breathing funny. HTML will be around for years to come, if not forever. It is still a solid format and too many people know it. I believe I will be able to write and post Web pages as long as I live using HTML alone. They just might not be as fancy as other pages.


Two Kinds of XML Pages

     The two main types of XML pages are the "standalone" and those that use a DTD. The standalone is just what it says: the page stands alone relying on the browser to have the XML DTD. In the XML language, the browser will be the XML processor. The other type of page offers the DTD to the browser so it can run the page.

The Standalone

     The standalone can be created by simply making some alterations to your current HTML document:

  • Lose the current declaration statement and replace it with this one:

    <?xml version="1.0" standalone="yes"?>

  • Remember that XML IS CASE SENSITIVE. If you use caps to start the command, use caps to end the command.
  • This format of caps or no caps must continue fully throughout the document. If you use IMG first, you must continue to capitalize it the rest of the way through or the XML DTD will see it as two different commands.
  • All tags that did not require end tags in HTML (like <IMG> or <P>) must now be closed.
  • A tag that holds nothing, like <IMG>, can be closed in one of two ways: give it an end tag, or put a slash before the final >. Use one or the other, not both.

         Like so:   <IMG SRC="pic.jpg"></IMG>   or   <IMG SRC="pic.jpg" />

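  • Each attribute value must be surrounded by quotes. (Like: TEXT="brown")
  • Lose the HTML & commands such as &nbsp; and &copy;. Without a DTD, XML only understands &amp;, &lt;, &gt;, &quot;, and &apos;, plus numeric codes like &#169;.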
  • Make sure you are running the page in a browser that supports XML.
     If you have followed all these rules, then you have created a document that is termed "well-formed." That means it will run. You see, XML is nowhere near as forgiving as HTML.
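
     Here is a rough sketch of what a small page might look like after following the rules above. The page title and the file name pic.jpg are placeholders, and I have assumed ordinary HTML tag names written in capitals all the way through. The declaration on the first line is written with a lowercase "xml," which is how the XML 1.0 specification spells it.

```xml
<?xml version="1.0" standalone="yes"?>
<!-- One outermost tag wraps the whole page -->
<HTML>
  <HEAD>
    <TITLE>My Standalone Page</TITLE>
  </HEAD>
  <!-- Attribute values are quoted -->
  <BODY TEXT="brown">
    <!-- Every start tag has an end tag in the same case -->
    <P>Joe &amp; friends say hello.</P>
    <!-- Empty tags close themselves with a slash... -->
    <IMG SRC="pic.jpg" />
    <!-- ...or get an explicit end tag -->
    <BR></BR>
  </BODY>
</HTML>
```

     Break any one of these rules, say by closing <P> with </p>, and an XML processor should refuse the whole page rather than guess what you meant.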


Creating the DTD

     As I said above, the second type of page is the one that uses a DTD. Now, in XML, you'll need to set up your own DTD items (which are called "element declarations" in the business). Each element declaration will allow you to create your own tag in a traditional HTML format. Elements themselves do nothing. They simply block off sections of the page. Any text that happens to be captured inside of that space will then be affected by the parameters assigned to the element. Sound familiar? HTML works the same way. For example, say you wanted to create the tag <SUPER> that would make text red and underlined. (As far as I know that one doesn't exist in HTML.) This would be the basic format: (Please understand it is a bit more than this, I am just trying to stay basic at first to keep us all on the same page.)

  • You would create a basic text file with a DTD extension. This file would hold all the element declarations.
  • In the DTD file you would create an element declaration like so:

    <!ELEMENT SUPER (#PCDATA|u|ff0000)*>
              <!-- Tag attribute - red and underlined -->

  • Now let's say you save the DTD text document as joe.DTD.
  • Then on the XML document you are writing to display in the browser window, you put up a declaration like so:

    <?xml version="1.0"?>
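    <!DOCTYPE SUPER SYSTEM "joe.DTD">

  • The word after DOCTYPE names the outermost tag of the document, here SUPER. The SYSTEM keyword says the DTD is a file at the location given in quotes, here joe.DTD in the same folder as the XML document.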
  • Now you are prepared to add the command to the XML document, like so:

    <SUPER>this is the affected text</SUPER>

  • Again, this is a very simplistic offering. If you would like to view some honest and true DTDs, try some of the sites at Yahoo's XML page. There are some rather large examples to be found. It's frightening, to say the least.
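
     To see the pieces together, here is a minimal sketch that keeps the DTD inside the document itself instead of in a separate joe.DTD file. The square brackets after DOCTYPE hold the declarations. The PAGE tag is my own invention for this example, added so that SUPER has something to sit inside. As far as I can tell, a DTD on its own only describes which tags exist and what they may contain, so the red and underlined look would most likely have to come from a style sheet, which is not shown here.

```xml
<?xml version="1.0"?>
<!-- PAGE is a hypothetical wrapper tag; SUPER is the tag from the tutorial -->
<!DOCTYPE PAGE [
  <!-- A PAGE may hold any number of SUPER tags -->
  <!ELEMENT PAGE (SUPER)*>
  <!-- A SUPER holds plain text only -->
  <!ELEMENT SUPER (#PCDATA)>
]>
<PAGE>
  <SUPER>this is the affected text</SUPER>
  <SUPER>and so is this</SUPER>
</PAGE>
```

     The name after DOCTYPE has to match the outermost tag, which is PAGE here. Moving the two ELEMENT lines into their own file and pointing to it with SYSTEM should give the same result.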


Is There More?

     Yupper! In fact there's much more. I don't know how goofy into this stuff you are, but if this is your bag, XML certainly offers some enjoyable reading before bed. There isn't a lot written on the subject (I mean in a relative fashion, like when compared to the number of pages available on the music group Hanson), but what is written is very thick and very technical. It takes some plowing through, but you'll start to see it all come together after spending the time.

     I relied on five main sites when putting this tutorial together:


So, Now What?

Now it's up to you to decide if XML is for you. I have had limited contact with the language because I am more of a Web site designer than production tech. If you're getting into database management and/or cross-page abilities, maybe you should give it a try. If you're simply attempting to make a nice Web site, maybe it will be a bit over your head. You have to decide.

In late 2000, Earthweb (now DICE.com) decided to go to a fully XML format. Many sites melded to the format well and ran smoothly. HTML Goodies was a different story. I've set the site up in a hierarchy format and it just wouldn't fit neatly into XML. Every time we tried to take it live, the number of errors forced us to go back to the HTML format. It was such a pain, in fact, that we decided to scrap the XML version altogether. That didn't sit real well with those who put their hearts and souls into the new site.

 Enjoy!



