3.1 Overall document features

3.1.1 HTML versus XHTML

An HTML document should identify its markup language. Currently, there are two choices: the familiar HTML (hypertext markup language) and the lesser-known XHTML (extensible HTML).

To most designers and developers, the differences seem minor. However, given a choice, XHTML is the better one for the long run. This is because XHTML is actually XML (extensible markup language), which means it can share a multitude of XML tools for automated validation and other processing.

For accessibility purposes, conformance to XML means that screen readers tend to do a better job reading the content of an XHTML document than that of an HTML document. Because XHTML has a stricter syntax, a validated XHTML document will also not contain malformed structures that can throw a screen reader off.
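The stricter syntax can be illustrated with any XML parser. The following is a minimal sketch in Python using the standard library's xml.etree.ElementTree; the helper name is_well_formed and the two sample fragments are made up for illustration. Note that it only checks well-formedness (properly closed and nested elements), not validity against the XHTML DTD, which needs a validating tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Every element is closed and properly nested: acceptable to an XML parser.
well_formed = "<p>An <em>emphasized</em> word.<br /></p>"

# Common HTML habits: an unclosed <br> and overlapping <b>/<i> elements.
malformed = "<p>A line break<br> and <b><i>overlap</b></i></p>"

print(is_well_formed(well_formed))
print(is_well_formed(malformed))
```

A lenient HTML browser would try to render both fragments, whereas an XML parser rejects the second one outright. This is the kind of malformed structure that the stricter XHTML syntax rules out.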

3.1.2 Document shell

All documents should have a shell that is similar to that in listing 1.

Listing 1:

A simple XHTML shell document

1  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
2  <html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en" xml:lang="en">
3    <head>
4    </head>
5    <body>
6    </body>
7  </html>

This is, indeed, an empty document. However, there are already a few concepts that need to be explained.

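Line 1 is the document type declaration, which identifies the kind of document. On this line, html names the top (root) element of the document. PUBLIC indicates that the DTD is identified by a public identifier, the first quoted string that follows. Use PUBLIC for HTML and XHTML documents.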

The quoted string "-//W3C//DTD XHTML 1.0 Strict//EN" deserves a bit more explanation:

-: the dash (minus) symbol indicates that the organization named next is not registered with ISO (International Organization for Standardization). If W3C were registered, a plus symbol would be used instead.
//: the double slash is a separator.
W3C: this is the organization that authored the specification of the language (XHTML in this case).
DTD: this is the type (more technically known as the “Public Text Class”) of the document that the URL at the end of the doctype line points to. DTD means “Document Type Definition”.
XHTML 1.0 Strict: this is the label of the actual markup language of this document. The technical name of this component is “Public Text Description”.
EN: this is the language in which the referenced DTD itself is written, not the language of the document's content. EN means English.
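The components listed above can be separated mechanically by splitting the quoted string on the double slash. The following Python sketch shows one way to do so; the names PublicIdentifier and parse_public_identifier are hypothetical, and the code assumes the four-part form used in listing 1 (identifiers that do not start with a plus or minus symbol are not handled).

```python
from typing import NamedTuple


class PublicIdentifier(NamedTuple):
    registered: bool   # True for "+", False for "-"
    owner: str         # the organization, e.g. "W3C"
    text_class: str    # the Public Text Class, e.g. "DTD"
    description: str   # the Public Text Description
    language: str      # the language code, the last component


def parse_public_identifier(fpi: str) -> PublicIdentifier:
    """Split a public identifier of the form shown in listing 1."""
    parts = fpi.split("//")
    if len(parts) != 4:
        raise ValueError(f"expected 4 components, got {len(parts)}")
    registration, owner, class_and_description, language = parts
    if registration not in ("+", "-"):
        raise ValueError(f"unexpected registration mark: {registration!r}")
    # The class is the first word; the rest is the description.
    text_class, _, description = class_and_description.partition(" ")
    return PublicIdentifier(
        registration == "+", owner, text_class, description, language
    )


parsed = parse_public_identifier("-//W3C//DTD XHTML 1.0 Strict//EN")
print(parsed)
```

For the identifier in listing 1, the fields come out as an unregistered owner W3C, the class DTD, the description XHTML 1.0 Strict, and the language code EN.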

The last component of the doctype line is a URL (the “system identifier”) that tells where the DTD can be found.

The “doctype” line is useful for validators, as it specifies exactly which markup language is used and where to find its syntax and structural rules (the DTD).

Besides the “doctype” line, one can also see that the <html> element has additional attributes:

xmlns=...: this attribute declares the “namespace” of XHTML. The namespace is identified by a URI and tells an XML tool that the element and attribute names in the document are the ones defined by XHTML.
dir="ltr": this specifies the base direction in which the text of the document should be rendered. ltr means left-to-right.
lang="en": this is the older (HTML) way of specifying the content language. en means English.
xml:lang="en": this is the newer (XHTML) way of specifying the content language. The xml: part indicates that the name lang belongs to the XML namespace.
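To see how a namespace-aware XML tool reads these attributes, the following Python sketch parses the html element of listing 1 with the standard library's xml.etree.ElementTree. The variable names are illustrative, and the doctype line is left out to keep the example short. ElementTree writes a namespaced name as the namespace URI in braces followed by the local name.

```python
import xml.etree.ElementTree as ET

# The <html> element of listing 1, without the doctype line.
shell = (
    '<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" '
    'lang="en" xml:lang="en"><head></head><body></body></html>'
)

root = ET.fromstring(shell)

# The xml: prefix is always bound to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

print(root.tag)            # element name, qualified by the XHTML namespace
print(root.get("dir"))     # text direction
print(root.get("lang"))    # older (HTML) language attribute
print(root.get(XML_LANG))  # newer (XHTML) language attribute
```

The xmlns declaration does not show up as an ordinary attribute. Instead, the parser folds it into the names of the html, head and body elements, which is how a tool recognizes them as XHTML elements.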

In summary, the shell document in listing 1 tells a browser, validator or screen reader what kind of document this is, and how to read it. By making these specifications as exact and detailed as possible, a screen reader or validator can do its job more effectively.
