XHTML in layman's terms

Needing Closure

Everything that goes up must come ____.

You know how that sentence is supposed to end, but it doesn't conclude properly. Instead, there are just four blanks. It is supposed to end with a four-letter word (not that one), yet it just drops off. Even though it's an age-old saying, we are left in suspense and have to draw our own conclusions. It takes intuition on the reader's part to finish the sentence properly.

Let's go ahead and do that, so the OCD people out there can relax: "Everything that goes up must come down." There, that's better, right? No more guesswork, just a simple statement with a proper beginning and end. It simply wouldn't make sense if something were thrown into the air and never landed again. That would not only defy gravity, it would bother our sense of logic as well.

This is not unlike how XHTML differs from HTML 4.01. In XHTML, everything that has a beginning must have an end. In HTML, this is not always the case, because some closing tags are optional. You can start a paragraph, but never finish it. You can have a list of things, but never actually bring each item to completion. If we had to communicate with someone who never finished sentences, it would be confusing to say the least.

Think of it this way: HTML was meant to be like a candy wrapper, with a twist at each end. However, even the strict specification is somewhat loose, meaning the candy wrapper need not be closed. Now, I don't know about you, but I'm certainly more likely to eat a Snickers that's completely sealed than one that's been sitting open for who knows how long.

Examples

Below is an example of some ugly-looking HTML that is valid, even under the 4.01 Strict Document Type:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
	<head>
		<meta http-equiv="Content-TyPe" content="text/html; CHARSET=utf-8">
		<title>Testing HTML 4.01</title>
	</head>
	<body>
		<p>This is some sloppy HTML 4.01! </p
		><ol>
			<li>List item one </li
			><li>List item two </li
			><li>List item three</li>
		</ol>
	</body>
</html>

By comparison, here is the same page rewritten the way XHTML 1.0 Strict requires:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" lang="en-us">
	<head>
		<meta http-equiv="content-type" content="text/html; charset=utf-8" />
		<title>Testing XHTML 1.0 Strict</title>
	</head>
	<body>
		<p>This is some clean XHTML 1.0 Strict!</p>
		<ol>
			<li>List item one</li>
			<li>List item two</li>
			<li>List item three</li>
		</ol>
	</body>
</html>

As you can see, the second example is more readable (assuming you know HTML syntax). Everything (aside from the Doctype) is lowercase, and everything that is begun is properly finished, in the right order. This is not only easier to pick through, it saves a browser the guesswork of working out where each closing tag is supposed to go. For instance, an open <li> might end at the end of its line, but list items can also contain nested elements, so the end of the line proves nothing.

A browser doesn't know until it reaches the next opening tag whether it should close the previous tag or leave it open. So there is confusion not just for a person reading the code, but for the actual rendering of the page. The implication is that this extra logic lengthens loading times, even after accounting for the few extra bytes each closing tag adds. Clean code renders faster, period. For more evidence on that, check out this article (not the site's code).
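
To make that guesswork a little more concrete, here is a minimal Python sketch using only the standard library. The two list fragments and the names is_well_formed and TagLogger are my own illustrations, and Python's html.parser is a simple tokenizer, not a stand-in for a real browser. Still, it shows the gist: a strict XML parser refuses the list with missing closing tags, while a forgiving HTML tokenizer accepts it and simply never reports where each <li> ends.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragments for illustration; neither is taken from the article.
CLOSED = "<ol><li>List item one</li><li>List item two</li></ol>"
UNCLOSED = "<ol><li>List item one<li>List item two</ol>"


def is_well_formed(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagLogger(HTMLParser):
    """Record the start and end tags a lenient HTML tokenizer reports."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(markup: str) -> list[tuple[str, str]]:
    """Feed markup to the lenient tokenizer and return the tags it saw."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.events


if __name__ == "__main__":
    print(is_well_formed(CLOSED))    # every tag explicitly closed
    print(is_well_formed(UNCLOSED))  # closing tags left for the parser to infer
    print(tag_events(UNCLOSED))      # no "end li" events are ever reported
```

In the second fragment the tokenizer never sees an end to either list item, so whatever builds the page has to infer it from the tags that follow.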

Self-Terminate

At the end of Terminator 2, Arnold says "I cannot self-terminate." In the case of XHTML though, self-termination is a good thing. If everything that starts needs to have a finish, then HTML has a few problem spots, even when it is otherwise carefully written. In the example above, there is one such spot: the meta tag describing the content type.

In HTML, it ends with a ">" meaning that technically it would still be open. XHTML fixes this by self-terminating the tag with a trailing slash "/>". Other such tags include img, br, and hr. All are typically left "open" in HTML, and all are self-terminating in XHTML. Personally, I find this much easier to read, knowing that every time there's a trailing slash, it's the end of the tag.
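
Here is a small sketch of the same idea in Python, again using only the standard library. The fragment, the file name photo.png, and the helper name empty_elements are made up for illustration. It assumes nothing more than a strict XML parser: with the trailing slashes the fragment parses and the empty elements can be listed, and with the slashes removed the parser gives up.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment; the file name and text are invented.
XHTML_STYLE = (
    "<div>"
    "<p>First line<br />Second line</p>"
    "<hr />"
    '<img src="photo.png" alt="A photo" />'
    "</div>"
)

# The same fragment with the trailing slashes dropped, HTML 4.01 style.
HTML_STYLE = XHTML_STYLE.replace(" />", ">")


def empty_elements(markup: str) -> list[str]:
    """List elements with no children and no text, in document order."""
    root = ET.fromstring(markup)
    return [
        element.tag
        for element in root.iter()
        if len(element) == 0 and not (element.text or "").strip()
    ]


if __name__ == "__main__":
    print(empty_elements(XHTML_STYLE))
    try:
        empty_elements(HTML_STYLE)
    except ET.ParseError as error:
        # A strict parser treats <br>, <hr> and <img> as never closed.
        print("not well-formed:", error)
```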

Summary

So, laying aside the whole argument over whether to serve it as text/html or application/xhtml+xml, I prefer XHTML because of the inherent rigidity required for it to validate. This of course isn't to say that HTML can't be written to the same standard; it's just not required. Don't get me wrong, and don't think I'm advocating XHTML as the de facto way.

This isn't some sort of elitist attitude. I certainly acknowledge the fact that HTML can be just as semantic and structured as XHTML. Roger Johansson and Robert Nyman are experts in web development, and both make great use of HTML 4.01 Strict. They both properly close everything, and make use of lowercase code. Robert even wrote an article about it here. Roger is clearly passionate about proper closing of tags, as seen in his article here.

Doug Bowman, another legend in the field, makes use of XHTML 1.0 Strict but serves it as text/html. No big deal. I can see the logic on each side of the argument in wanting to cater to older browsers, etc. If you've visited my site in IE, you've no doubt seen my attempt at humor within conditional comments, where I reveal some of my biases against Microsoft.

So, let's get to the point. Have we solved any great mysteries here today? No. Is this article meant to prove that one Doctype is better than another? No. I just wanted to give a brief overview of what drew me to XHTML initially, and why I still prefer to use it. The basic point is, no matter what you choose, stick to it. Don't mix and match, and please keep tags lowercased and terminated.