Rhope Blog


The Problem with HTML

by Mike Pavone

Putting together the simple Rhope web framework that powers this site got me thinking a bit about web technologies, and HTML in particular. The promise of the HTML/CSS combo is a nice one: stick your content in a semantic container (HTML) and then let a completely separate file (CSS) present this semantically arranged data in a pleasing fashion. In theory, such a clean separation of content and presentation should bring a host of benefits. Skinning an HTML document or web-based application should be as simple as modifying the stylesheet. Devices with small displays should be able to ignore the stylesheet and still display the page sensibly. Software should be able to parse the HTML easily and do meaningful things with it without any fuss. Viewers that present web pages in innovative ways should be easy to create. Somehow, though, the reality of HTML and CSS falls far short of the theoretical promise of the concept.

Why is that? One of the main obstacles I see is that most websites today are not technical manuals. I'm serious. Okay, perhaps I stated my point in a slightly misleading fashion. It might be better to say that HTML defines a rather rich set of semantic tags if the document you're trying to represent in it happens to be a technical manual. Just take a look at a list of all the tags available in HTML 4. There are tags for source code snippets, sample output, variables, acronyms, abbreviations, lists of terms with their definitions and more. All of these can be quite useful for technical documents, but most of the tags I mentioned are rarely used on modern websites (the code tag being a notable exception). There's no tag to mark something up as a blog post, or even as something more generic like a news item. There's no tag to denote a piece of text as a user comment. You can't even clearly differentiate between the main body of a page and a navbar with HTML's vocabulary.

If HTML's semantic tags were a better match for today's websites, we might not need a separate format for news feeds. A feed reader would just parse the main page of the blog or news site and pull out the elements it needed. If HTML made it easy for browsers to differentiate between the "meat" of a page and sections devoted to things like navigation, CSS-heavy pages wouldn't turn into an unusable mess when the stylesheet doesn't load. With those same semantic features, mobile browsers could more easily reformat pages for small screens. The list goes on.
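To make the feed reader idea concrete, here is a minimal Python sketch using the standard library's `html.parser`. It assumes a hypothetical `<newsitem>` tag (no such tag exists in HTML 4) and simply collects the text of every such element, ignoring everything else on the page, navbar included. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser

class NewsItemExtractor(HTMLParser):
    """Collects the text of every <newsitem> element (a hypothetical tag)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how deeply we are nested inside <newsitem> elements
        self.items = []   # text of each completed top-level <newsitem>
        self._buf = []    # text fragments of the item currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "newsitem":
            if self.depth == 0:
                self._buf = []  # starting a new top-level item
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "newsitem" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.items.append("".join(self._buf).strip())

    def handle_data(self, data):
        # Only text inside a <newsitem> is kept; nav links etc. are skipped.
        if self.depth > 0:
            self._buf.append(data)

def extract_items(html):
    parser = NewsItemExtractor()
    parser.feed(html)
    parser.close()
    return parser.items

page = """
<div class="navbar"><a href="/">Home</a></div>
<newsitem>First post</newsitem>
<newsitem>Second <em>post</em></newsitem>
"""
print(extract_items(page))  # ['First post', 'Second post']
```

With a vocabulary like that, the page itself could serve as the feed; without it, a reader has to guess which `div` holds the posts.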

This isn't the only problem, though. The appeal of being able to properly separate content and presentation is pretty strong even if the content half uses a crappy vocabulary. Unfortunately, the HTML/CSS combo doesn't seem to deliver too well on this front either. The multitude of HTML templating systems alone suggests a problem here. If CSS were enough for presentation, you would think that web applications would just spit out logically arranged HTML and let CSS take care of all the skinning.

One of the biggest problems I see with CSS is that too much of it assumes a more or less linear document, so elements that follow each other logically in the HTML document tend to follow each other visually in the final output. Such an assumption isn't always a bad thing. In running text, this is exactly the behavior you want, so if your document is mostly made up of running text (like, say, technical documentation), CSS does a great job. If your document has a mix of running text and other elements, or several loosely related blocks of running text (like a fair number of modern websites), it does a sort of okay job. If most of your content isn't running text (like, say, a web application), it does a pretty crappy job.

Now, CSS doesn't completely leave you in the lurch when you want to break out of the linear flow assumption. The position and float properties give you some flexibility for taking elements out of the linear flow. Things like sidebars and page headers aren't too hard to do with CSS. Certain cases of those kinds of elements can be a little awkward, but overall the situation isn't too bad. However, if you want to rearrange things within the flow, you're going to have problems. CSS can't even do something as simple as displaying the children of a block element in the reverse of the order in which they appear in the original HTML.
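Since CSS can't express the reversal, the only place left to do it is wherever the markup gets produced. As a rough sketch, assuming the page is well-formed enough to load with Python's standard `xml.etree.ElementTree` (real-world HTML often isn't), reversing a block's children before serializing looks like this. `reverse_children` is a hypothetical helper, not part of any framework.

```python
import xml.etree.ElementTree as ET

def reverse_children(parent):
    """Reorder parent's direct children in place, last one first.

    Each child's trailing text (its .tail) moves with it, so whitespace
    between elements may shift; this input has none.
    """
    children = list(parent)
    for child in children:
        parent.remove(child)
    for child in reversed(children):
        parent.append(child)
    return parent

doc = ET.fromstring("<div><p>one</p><p>two</p><p>three</p></div>")
reverse_children(doc)
print(ET.tostring(doc, encoding="unicode"))
# <div><p>three</p><p>two</p><p>one</p></div>
```

The catch is that this reordering now lives in code on the server rather than in the stylesheet, which is exactly the separation CSS was supposed to provide.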

Another problem I see in CSS is that, for the most part, each element is an island in relation to its peers. One of the things that made HTML tables attractive for layout is that the size and position of a single table cell took into account the cells around it as well. Eventually, the ability to make arbitrary elements behave like tables, table rows and table cells was added to CSS, but it's an all-or-nothing affair (and it's also not supported in any currently shipping version of IE). Either you get all of the formatting features of tables or none of them. You can't say "these two divs shouldn't wrap if the viewport is narrow, but otherwise treat them like normal block elements". Then there are scenarios that tables didn't handle well and that CSS also blissfully ignores. For instance, suppose you want two elements side by side, and you want one of them to have a fixed width and the other to take up some percentage of what's left over in the viewport.
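The rule in that last example is simple arithmetic: the second element's width is a fraction of the viewport width minus the fixed width. What's missing is a way to state it declaratively. A hypothetical Python helper spelling out the calculation:

```python
def split_widths(viewport_px, fixed_px, fraction):
    """Widths for a fixed-width box beside one that takes `fraction`
    of whatever horizontal space remains in the viewport."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    # If the fixed box alone is wider than the viewport, nothing is left over.
    remaining = max(viewport_px - fixed_px, 0)
    return fixed_px, remaining * fraction

print(split_widths(1000, 200, 0.5))  # (200, 400.0)
```

For a 1000px viewport with a 200px fixed column, "half of what's left" is 400px, and that number has to change every time the viewport does.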

There are some other problems I see in the HTML/CSS combo, but they are less pertinent to the original point. So what's an author of web content or a framework designer to do? I think we need to stop writing HTML by hand. To a certain extent, this has already happened. So-called "data-driven" websites make up a significant chunk of the web today, but I think we need to go further. We need to stop writing HTML templates too. I think web frameworks need to start abstracting HTML and present the applications that use them with a logical semantic model of the page that doesn't necessarily have a 1:1 relationship with the HTML that gets generated as a result. The generated HTML also needs to be as CSS-friendly as possible, and where CSS doesn't provide enough flexibility, the framework needs to provide a mechanism to rearrange the output without resorting to HTML-filled templates or modifying the code of the application using the framework. I'm not 100% sure what such a framework would look like, but I think it's going to be my goal as I work on Rhope's web framework.
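What such a framework looks like is left open here. As one possible shape, here is a minimal Python sketch (not Rhope, and not necessarily the design Rhope's framework will take) in which the application builds only a logical page model, and a separate layout list, something a site designer could edit, decides which regions appear and in what order. `Region`, `Page`, `render` and the `layout` list are all hypothetical names.

```python
from dataclasses import dataclass, field
from html import escape

# Hypothetical semantic model: the application describes *what* is on the
# page, grouped by role, and never writes HTML itself.
@dataclass
class Region:
    role: str                                   # e.g. "nav", "main", "comments"
    items: list = field(default_factory=list)   # plain-text content items

@dataclass
class Page:
    title: str
    regions: dict                               # role -> Region

def render(page, layout):
    """Emit HTML for `page`, placing regions in the order listed in `layout`.

    Regions not named in `layout` are left out, so a small-screen layout can
    simply omit navigation. Changing `layout` reorders the output without a
    template and without touching the code that built `page`.
    """
    parts = [f"<h1>{escape(page.title)}</h1>"]
    for role in layout:
        region = page.regions.get(role)
        if region is None:
            continue  # layout mentions a role this page doesn't have
        body = "".join(f"<div>{escape(item)}</div>" for item in region.items)
        # The role doubles as a class name so CSS still has a hook.
        parts.append(f'<div class="{escape(role)}">{body}</div>')
    return "\n".join(parts)

# Usage: one model, two layouts.
page = Page("The Problem with HTML", {
    "nav": Region("nav", ["Blog Home", "Downloads"]),
    "main": Region("main", ["Post body goes here"]),
})
desktop = render(page, ["nav", "main"])
mobile = render(page, ["main"])   # navigation dropped for a narrow screen
```

Keeping the layout as data rather than as an HTML-filled template is the point of the sketch; the application code never changes when the arrangement does.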
