4 XML, the content language
Alright, HTML is not the perfect language to separate content from presentation. However, it is widely used and supported
by all browsers. What do we do, then, to separate content from presentation?
XML (eXtensible Markup Language) is the answer. XML is a content language with no presentation specification
whatsoever. It also has very few predefined tags. As such, XML does not work on its own. The following is a list of file types
that are related to XML:
- DTD (Document Type Definition): this is a specification of the structure (syntax/grammar) of a particular XML-based
document type.
- For example, for a resume document type, a DTD may specify that the main document includes a section
about the applicant’s contact information, a section about the applicant’s previous work history, a section
about the applicant’s education background, and so on. Within the education background, it may allow
any number of college/university entries, each with its duration.
- Basically, a DTD is like an empty form that defines the structure of information that a filled form may
contain. Note that the form is used only for information gathering and structuring, not for presentation.
- XML (eXtensible Markup Language): an XML document is equivalent to a filled form. As such, an XML document
that is to be validated needs a reference to the DTD that specifies the proper structure. (A document without
one can still be well-formed, but it cannot be checked against a DTD.)
- In our example about resumes, an XML document is like a filled form with all the information fields for a
resume. You can see an XML document (in this example) as “the raw information necessary to generate
a resume, conforming to the rules specified by the resume DTD.”
- Nothing guarantees that the content of an XML document actually conforms to the rules specified by the
referenced DTD! Conformance is checked by a validating XML parser.
- XSL (eXtensible Stylesheet Language): an XSL style sheet for a particular DTD specifies a set of rules to
“transform” the raw information contained in an XML document into another document. There is no limitation
on what the output document may be. For example, the output may be an HTML document for web browsers,
or it may be an RTF document for Microsoft Word, or it may be another XML document that references another
DTD!
- In our example, most employers don’t want to read the XML file of a resume. Although an XML file
is just a plain text file (like HTML), it is rather unfriendly to human readers. As a result, an applicant uses an XSL
to transform the XML file into a presentable form. A common target document type is HTML for web
distribution. There can be different XSLs for the same DTD for different effects and styles. For example,
one XSL may be designed for a “classic” look, another for a “modern” look, yet another one optimized
for OCR (optical character recognition). And, yes, there can be one just for accessibility!
- An XSLT (XSL Transformations) engine takes an XML document (with a reference to its DTD) and an
XSL style sheet, and generates the target document.
- The language of XSL is somewhat like a programming language. It can specify logical operations. However,
it is not a general-purpose programming language like PHP or Perl, as it is optimized for the analysis
(breaking apart) of a source XML document and the reassembly of information pieces into the destination
document.
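
To make the resume example concrete, here is a minimal Python sketch (it is not part of the original notes). The element names (resume, contact, school, and so on), the sample data, and the helper functions resume_to_html and resume_to_text are all hypothetical. Two caveats: Python's standard library parser reads the DTD inside the DOCTYPE but does not validate the document against it, and the standard library has no XSLT engine. The two functions are therefore only hand-written stand-ins for two different XSL style sheets applied to the same XML document.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical resume. The DOCTYPE's internal subset plays the role of the
# DTD (the "empty form"); the elements after it are the "filled form".
RESUME_XML = """<?xml version="1.0"?>
<!DOCTYPE resume [
  <!ELEMENT resume (contact, history, education)>
  <!ELEMENT contact (name, email)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
  <!ELEMENT history (job*)>
  <!ELEMENT job (#PCDATA)>
  <!ELEMENT education (school*)>
  <!ELEMENT school (#PCDATA)>
  <!ATTLIST school from CDATA #REQUIRED
                   to   CDATA #REQUIRED>
]>
<resume>
  <contact>
    <name>Pat Example</name>
    <email>pat@example.org</email>
  </contact>
  <history>
    <job>Lab assistant</job>
  </history>
  <education>
    <school from="2001" to="2003">Example Community College</school>
    <school from="2003" to="2005">Example State University</school>
  </education>
</resume>
"""


def resume_to_html(root):
    """Stand-in for an XSL style sheet that targets HTML."""
    lines = [
        f"<h1>{escape(root.findtext('contact/name'))}</h1>",
        f"<p>{escape(root.findtext('contact/email'))}</p>",
        "<h2>Education</h2>",
        "<ul>",
    ]
    # Break the source apart, then reassemble the pieces in the target format.
    for school in root.findall("education/school"):
        years = f"{escape(school.get('from'))}-{escape(school.get('to'))}"
        lines.append(f"<li>{escape(school.text)} ({years})</li>")
    lines.append("</ul>")
    return "\n".join(lines)


def resume_to_text(root):
    """Stand-in for a second style sheet: same data, plain-text output."""
    lines = [
        root.findtext("contact/name"),
        root.findtext("contact/email"),
        "",
        "EDUCATION",
    ]
    for school in root.findall("education/school"):
        lines.append(f"  {school.get('from')}-{school.get('to')}  {school.text}")
    return "\n".join(lines)


# Parsing checks well-formedness only; the DTD is read but NOT enforced here.
root = ET.fromstring(RESUME_XML)
html_page = resume_to_html(root)
text_page = resume_to_text(root)
print(html_page)
print()
print(text_page)
```

The point of the sketch is the division of labor: the XML string carries only content, and each output function decides on its own how that content is presented. A real setup would replace the two functions with XSL style sheets and run them through an XSLT engine, and would use a validating parser to enforce the DTD.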