Converting XML to HTML using XSL
originally written by Benoît Marchal
XML concentrates on the structure of the information in a file and not its appearance. To view XML documents, we need to format or style them. In practice, this often means converting the XML document to HTML. Here we'll concentrate on XSLT, the transformation part of XSL.
XSLT is a language used to specify the transformation of XML documents. It takes an XML document and transforms it into another XML document. The HTML conversion is simply a special case of XML transformation.
To run the examples in this article, you need an XSL processor - a software component that implements the XSL standard. We'll use LotusXSL (version 0.19.1), which is available at no charge from www.alphaworks.ibm.com. Like most XML tools, LotusXSL is written in Java. Although you don't have to program in Java to use it, you must install either a Java Run-time Environment (JRE) or a Java Development Kit (JDK) on your computer. You can download a Java environment from Sun at java.sun.com.
I publish a monthly e-zine, Pineapplesoft Link. Every month, I email the e-zine to subscribers and I post a copy on my Web site. That's two formats to support - text and HTML.
XML and XSL help because they enable me to write the document in one format (XML) and automatically create distribution copies in text and HTML. And because the styling is applied automatically, it's easy to change the layout of my Web site. All that's required is a change to the style sheet. In a world of changing Web fashions, this is a major advantage.
Here is an abbreviated version of an article from Pineapplesoft Link that discussed XML style sheets. It's formatted in XML and clearly demonstrates the data structure, with different types of data enclosed within different tags.
<title>XML Style Sheets</title>
<copyright>1999, Benoit Marchal</copyright>
<abstract>Style sheets add flexibility to document viewing.</abstract>
<keywords>XML, XSL, style sheet, publishing, web</keywords>
<p>Send comments and suggestions to <url protocol="mailto">email@example.com</url>.</p>
<p>Style sheets are inherited from SGML, an XML ancestor. Style sheets originated in publishing and document management applications. XSL is XML's standard style sheet, see <url>http://www.w3.org/Style</url>.</p>
<title>How XSL Works</title>
<p>An XSL style sheet is a set of rules where each rule specifies how to format certain elements in the document. To continue the example from the previous section, the style sheets have rules for title, paragraphs and keywords.</p>
<p>With XSL, these rules are powerful enough not only to format the document but also to reorganize it, e.g. by moving the title to the front page or extracting the list of keywords. This can lead to exciting applications of XSL outside the realm of traditional publishing. For example, XSL can be used to convert documents between the company-specific markup and a standard one.</p>
<title>The Added Flexibility of Style Sheets</title>
<p>Style sheets are separated from documents. Therefore one document can have more than one style sheet and, conversely, one style sheet can be shared amongst several documents.</p>
<p>This means that a document can be rendered differently depending on the media or the audience. For example, a "managerial" style sheet may present a summary view of a document that highlights key elements but a "clerical" style sheet may display more detailed information.</p>
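The listing above has lost its wrapper elements. Judging from the paths used later in this article (/article/title, /article/section/title) and from the order in which the processor visits the nodes, the document presumably has the shape sketched below. The element contents are elided, and the exact section boundaries are an assumption.

```xml
<?xml version="1.0"?>
<!-- Assumed skeleton of the source document; contents are elided -->
<article>
   <title><!-- main title --></title>
   <date><!-- publication date, not shown in the listing --></date>
   <copyright><!-- ... --></copyright>
   <abstract><!-- ... --></abstract>
   <keywords><!-- ... --></keywords>
   <section>
      <title><!-- section title --></title>
      <p><!-- paragraph text, possibly containing url elements --></p>
   </section>
   <!-- further section elements follow -->
</article>
```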
A Simple Style Sheet
Our goal is to convert the XML document into HTML. The style sheet that follows is an example of how to accomplish this. We'll look at the individual elements in detail later on, but first here's the entire style sheet.
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:template match="abstract | date | keywords | copyright"/>
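Only fragments of the listing survive above. A complete style sheet consistent with the templates discussed in the rest of this article might look like the following sketch. The TITLE text, the bold styling of the article title, the template for p, and the HTML namespace URI are assumptions, not the author's exact text.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Sketch of simple.xsl; the default (HTML) namespace URI is an assumption -->
<xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns="http://www.w3.org/TR/REC-html40"
   version="1.0">

<xsl:output method="html"/>

<!-- Root: create the HTML skeleton, then process the document -->
<xsl:template match="/">
   <HTML>
      <HEAD>
         <TITLE>Pineapplesoft Link</TITLE>
      </HEAD>
      <BODY>
         <xsl:apply-templates/>
      </BODY>
   </HTML>
</xsl:template>

<!-- Section titles: a paragraph in italic -->
<xsl:template match="section/title">
   <P><I><xsl:apply-templates/></I></P>
</xsl:template>

<!-- Main title: a paragraph (bold is an assumption) -->
<xsl:template match="article/title">
   <P><B><xsl:apply-templates/></B></P>
</xsl:template>

<!-- Generic URLs: an anchor whose HREF is computed from the element text -->
<xsl:template match="url">
   <A TARGET="_blank">
      <xsl:attribute name="HREF"><xsl:apply-templates/></xsl:attribute>
      <xsl:apply-templates/>
   </A>
</xsl:template>

<!-- mailto URLs: prefix the address with "mailto:" -->
<xsl:template match="url[@protocol='mailto']">
   <A>
      <xsl:attribute name="HREF">mailto:<xsl:apply-templates/></xsl:attribute>
      <xsl:apply-templates/>
   </A>
</xsl:template>

<!-- Paragraphs map directly to HTML paragraphs -->
<xsl:template match="p">
   <P><xsl:apply-templates/></P>
</xsl:template>

<!-- These elements produce no output -->
<xsl:template match="abstract | date | keywords | copyright"/>

</xsl:stylesheet>
```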
The style sheet is applied with LotusXSL, as explained previously. From the DOS prompt, change to the document directory and type the following command:
java -classpath c:\lotusxsl_0_19_1\xerces.jar;c:\lotusxsl_0_19_1\lotusxsl.jar com.lotus.xsl.Process -in 19990101_xsl.xml -xsl simple.xsl -out 19990101_xsl.html
NOTE: The LotusXSL processor won't work unless you have installed a Java run-time. If there is an error message similar to "Exception in thread "main" java.lang.NoClassDefFoundError", either the classpath is incorrect (you might have to adapt it) or you typed an incorrect class name for LotusXSL (com.lotus.xsl.Process).
The parameters are self-explanatory: -in is the document file (XML file), -out is the result file (HTML file), and -xsl is the XSL file. The HTML parameter forces the processor to respect HTML syntax (for example, <BR> instead of <BR/>).
If everything goes well, there is now a new HTML file, 19990101_xsl.html, in the document directory.
The style sheet is itself an XML document (the XSL designers decided that XML was the best syntax for a style sheet). It describes the tree of the source document, the tree of the resulting document, and how to transform one into the other. To confuse matters further, the top-level element of a style sheet is itself named stylesheet (xsl:stylesheet).
Because the style sheet contains elements from different documents, namespaces (identified by the prefixes before the element names) are used to keep these elements apart. The namespaces are declared on the top-level element.
The xsl namespace is used for the XSL vocabulary. Its URI must be http://www.w3.org/1999/XSL/Transform. The resulting document has another namespace. In this case, the default namespace is attached to HTML 4.0.
Immediately after the xsl:stylesheet element comes the xsl:output element. xsl:output tells the XSL processor that we want to create an HTML document (other options are XML and text).
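Put together, the opening of the style sheet might look like the sketch below. The listing is not reproduced in this copy of the article, so the HTML namespace URI and the version attribute are assumptions rather than the author's exact text.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- xmlns:xsl binds the xsl prefix to the XSLT vocabulary.
     The default namespace URI for HTML 4.0 is an assumption. -->
<xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns="http://www.w3.org/TR/REC-html40"
   version="1.0">

   <!-- Ask the processor to write the result as HTML -->
   <xsl:output method="html"/>

   <!-- templates go here -->

</xsl:stylesheet>
```

Note that XSLT 1.0 applies its HTML-specific output rules only to result elements that are in no namespace, so with a current processor it may be simpler to leave out the default namespace declaration.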
The bulk of the style sheet is a list of templates. One of them, for example, transforms the title of a section into an HTML paragraph with the text in italic.
So in our example, a section title such as How XSL Works becomes an italic paragraph in the HTML output.
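The template for section titles might be sketched as follows. This is a fragment that belongs inside xsl:stylesheet; the upper-case HTML tag names are an assumption.

```xml
<!-- match: path to the source elements this template applies to -->
<xsl:template match="section/title">
   <!-- content: elements to insert in the resulting tree -->
   <P><I><xsl:apply-templates/></I></P>
</xsl:template>
```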
A template has two parts:
The match attribute is a path to the element in the source tree to which the template applies.
The content of the template lists the elements to insert in the resulting tree.
More on Paths
The syntax for XML paths is similar to file paths. XML paths start from the root of the document and list elements along the way. Elements are separated by the "/" character.
The root of the document is "/". The root is a node that sits before the top-level element. It represents the document as a whole.
Here is an example. The four paths /article/title, /article/keywords, /article, and /article/section match respectively the title of the article (XML Style Sheets), the keywords of the article, the top-most article element, and all sections in the article. Note that the last path matches several elements in the source tree.
Note also that "/" points to the immediate children of a node. Therefore /article/title selects the main title of the article (XML Style Sheets) but not all the titles below the article element. It won't select the section titles.
To select all the descendants from a node, use the "//" sequence. /article//title selects all the titles in the article. It selects the main title and the section titles.
In the style sheet, most paths don't start at the root. XSL incorporates the notion of a current element. Paths in the match attribute can be relative to the current element.
Again, this is similar to regular file systems. Double-clicking the accessories folder in the c:\program files folder moves to c:\program files\accessories folder, not to c:\accessories.
If the current element is an article, then title matches /article/title, but if the current element is a section, title matches one of the /article/section/title elements.
To match any element, use the wildcard character "*". The path /article/* matches any direct child of article, such as title, keywords, and so on.
It is possible to combine paths in a match with the "|" character, such as title | p which matches title or p elements.
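To experiment with these paths, a small style sheet along the following lines can help. It is only a sketch: it uses the same path syntax in select attributes rather than in match attributes, writes plain text, and assumes a source document whose top-level element is article.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <xsl:output method="text"/>

   <!-- Start at the root node, which sits before the article element -->
   <xsl:template match="/">
      <!-- Absolute path: only the main title, a direct child of article -->
      <xsl:value-of select="/article/title"/>
      <xsl:text>&#10;</xsl:text>
      <!-- "//" reaches every title below article, at any depth -->
      <xsl:value-of select="count(/article//title)"/>
      <xsl:text> titles, </xsl:text>
      <!-- "*" matches any element; here, the direct children of article -->
      <xsl:value-of select="count(/article/*)"/>
      <xsl:text> children of article&#10;</xsl:text>
      <xsl:apply-templates select="/article/section"/>
   </xsl:template>

   <xsl:template match="section">
      <!-- title | p is relative to the current section element -->
      <xsl:value-of select="count(title | p)"/>
      <xsl:text> title or p elements in this section&#10;</xsl:text>
   </xsl:template>
</xsl:stylesheet>
```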
Matching on Attributes
Paths can match on attributes, too. For example, a template whose match attribute is url[@protocol='mailto'] applies only to "mailto" URLs.
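A template of this kind might be sketched as follows. It is a fragment that belongs inside xsl:stylesheet, and the way the HREF attribute is built is an assumption.

```xml
<!-- Applies only to url elements whose protocol attribute is "mailto" -->
<xsl:template match="url[@protocol='mailto']">
   <A>
      <!-- Build the HREF value: the literal "mailto:" followed by the element text -->
      <xsl:attribute name="HREF">mailto:<xsl:apply-templates/></xsl:attribute>
      <xsl:apply-templates/>
   </A>
</xsl:template>
```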
This gives the following output in our earlier example:
<A href="mailto:firstname.lastname@example.org">email@example.com</A>
It matches <url protocol="mailto">firstname.lastname@example.org</url>, which has a protocol attribute with the value "mailto", but it does not match <url>http://www.w3.org/Style</url>. The more generic url path matches the latter element.
url[@protocol] matches url elements that have a protocol attribute, no matter what its value is. It matches <url protocol="http">www.w3.org/Style</url> but it does not match <url>http://www.w3.org/Style</url>.
Following the Processor
Let's follow the XSL processor for the first few templates in the style sheet. After loading the style sheet and the source document, the processor positions itself at the root of the source document. It looks for a template that matches the root and immediately finds one: the template whose match attribute is "/".
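The root template might be sketched as follows. It is a fragment that belongs inside xsl:stylesheet, and the TITLE text is an assumption.

```xml
<!-- Matches the root node, which represents the document as a whole -->
<xsl:template match="/">
   <HTML>
      <HEAD>
         <TITLE>Pineapplesoft Link</TITLE>
      </HEAD>
      <BODY>
         <!-- Continue with the children of the root -->
         <xsl:apply-templates/>
      </BODY>
   </HTML>
</xsl:template>
```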
Because the root sits before the top-level element, it is ideal for creating the top-level element of the resulting tree. For HTML, this means it creates the HEAD and BODY tags.
When it encounters xsl:apply-templates, the processor moves to the first child of the current node. The first child of the root is the top-level element, that is, the article element.
The style sheet defines no template for article, but the element still matches a built-in template. Built-in templates are not defined in the style sheet; they are predefined by the processor.
<xsl:template match="* | /"><xsl:apply-templates/></xsl:template>
The built-in template makes the processor move to the first child of article, that is, the title element. The style sheet's template for the article title matches this element.
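Such a template might be sketched as follows. It is a fragment that belongs inside xsl:stylesheet; the match path and the bold styling are assumptions.

```xml
<!-- Matches a title whose parent is article, that is, the main title -->
<xsl:template match="article/title">
   <P><B><xsl:apply-templates/></B></P>
</xsl:template>
```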
Note that the processor matches on a relative path because the current node is article. It creates a paragraph in the HTML document. xsl:apply-templates loads title's children.
The first and only child of title is a text node. The style sheet has no rule to match text but there is another built-in template that copies the text in the resulting tree.
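For reference, the XSLT 1.0 recommendation defines the built-in rule for text as equivalent to the following template. It never has to be written in the style sheet; it is shown here only to make the behavior explicit.

```xml
<!-- Built-in rule for text and attribute nodes: copy their value to the result -->
<xsl:template match="text()|@*">
   <xsl:value-of select="."/>
</xsl:template>
```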
The title's text has no children so the processor cannot go to the next level. It backtracks to the article element and moves to the next child: the date element. This element matches the last template.
<xsl:template match="abstract | date | keywords | copyright"/>
This template generates no output in the resulting tree and stops processing for the current element.
The processor backtracks again to article and processes its other children: copyright, abstract, keywords, and section. The copyright, abstract, and keywords elements match the same rule as date and generate no output in the resulting tree.
But the subsequent section element matches the built-in template, so the processor moves to its children, the title and p elements. The processor continues to match rules with nodes until it has exhausted all the nodes in the original document.
Creating Nodes in the Resulting Tree
Sometimes it is useful to compute the value or the name of new nodes. The template for url elements creates an HTML anchor element that points to the URL. The anchor has two attributes. The first one, TARGET, is specified directly in the template. The processor computes the second attribute, HREF, when it applies the rule.
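A template of this kind might be sketched as follows. It is a fragment that belongs inside xsl:stylesheet and an approximation rather than the author's exact listing.

```xml
<xsl:template match="url">
   <!-- TARGET is written literally in the template -->
   <A TARGET="_blank">
      <!-- HREF is computed from the text of the url element -->
      <xsl:attribute name="HREF"><xsl:apply-templates/></xsl:attribute>
      <!-- The same text also becomes the visible link text -->
      <xsl:apply-templates/>
   </A>
</xsl:template>
```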
This gives the following output in our earlier example:
<A target="_blank" href="http://www.w3.org/Style">http://www.w3.org/Style</A>
XML lets web designers organize a document by structure, so that changing a document's appearance is a simple matter of changing the definition of an element once, and letting the changes ripple through an entire file - or a huge Web site.
In the future, more people will turn to specialized devices to view the Web. Already WebTV has achieved some success. Mobile phones and PDAs, such as the popular PalmPilot, will be increasingly used for Web browsing.
The way pages are displayed has to be changed for these smaller devices. One solution may be to use XHTML, a reformulation of HTML as an XML application.
XSL will make it easy to manage the diversity of browsers and platforms by maintaining the document source in XML and converting to the appropriate XHTML subset with XSLT.
This article originally appeared on WebDevelopersJournal.com.