[This is a transcript, courtesy of Karen Mosman, from the lecture on
"What comes first on the Web -- style or structure?" given on February 
16 2006 as part of my PhD Defense. HWL is me, Håkon Wium Lie]


Eric Monteiro: Okay, welcome everyone. Please have seats. As part of
the candidate Håkon Wium Lie's defence, we will today have two trial
lectures: the first with a title of his own choice, and in an hour
he will present his response to the title the committee provided him.
So, please, Håkon.

HWL: Thank you so much. Thank you all for coming. We are going to
spend a few hours, if you can bear to hang in here, on a topic that is
of interest to many people in the room. I'd say the world expertise on
style sheets is here today, so if something goes wrong it's ah, it's ah
bad for the whole field. And I am going to spend the first hour on a
topic I selected myself and then, as Eric said, on the challenge they
have given me. So, given that I was able to pick the topic here
myself, one should think that I would have picked a clear title for it
-- that's not actually the case. There is actually a trick question in
this title. It's an ambiguous title. The ambiguity is in the part
here, "what comes first". "Comes first" can either mean, in English,
first as in importance -- what is important -- or it can mean what
comes first in time. And I will actually discuss both of these
questions. I think they are somewhat interlinked as well.

Before we go on to that part though, there are a few things we need to
establish in terms of terminology, especially the three words of the
title that I have highlighted here. What do we mean by the word
"Web", what do we mean by "style" and what do we mean by "structure".
I expect many of you to have an intuitive understanding of what we
mean by the web for example. That has entered the realm of everyday
use, but I still think it would be interesting to take a look at how
dictionaries currently define the web. I have two offerings here on
the screen, from two online dictionaries. One says, that it is the
complete set of documents residing on all Internet servers that use
the HTTP protocol, accessible to users via a simple point and click
system. The other says that The World Wide Web is a global information
space which people can read and write via computers connected to the
Internet. I think both of these are actually pretty good. If we were
to expand on the acronyms that are used here, I think certainly HTTP
is one of the foundations. This is one of three specifications that Tim
Berners-Lee ah came up with in, in the, in around 1990, which are still
the foundation that the web is built on. HTTP stands for hypertext
transfer protocol, um, then there is the URL specification which is
the source of the point and click system that was referred to. There
isn't actually anything that says you have to point and click. You can
have machines -- like Google -- and you can have various other user
interfaces but still the URL is a fundamental specification that the
web relies upon.

The third one of these specifications from the uh 1990 era is HTML,
the hypertext markup language, and that's actually the one that is of
most interest here. I assume most of you are familiar with HTML. It is
the foundation for documents on the web -- it is what web pages are
written in. It's the source language, uh if you will.

And that's also where the term structure comes in. That has to do with
how HTML documents are authored. If you look behind a web page, look
at the source code underlying the HTML page, you will see a lot of
these tags, a lot of these angle brackets, uh and you will see content
mixed in. These angle brackets are what are called tags and what's
between them is the content. The, the angle brackets, the tags make up
what we call a structure, and by combining all these tags of various
kinds and putting them inside each other we actually build up a structure
of documents which can be represented as a tree structure. I know this
is probably basic to many of you but I still think it's worth starting
from the beginning.
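
To make the tree idea concrete, here is a minimal sketch in Python. It is not from the lecture: the sample document, the `TreeBuilder` class and the `outline` helper are all assumptions chosen for illustration, and the sketch only handles well-formed markup where every tag is closed.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Builds a nested (tag, children) tree from well-formed markup."""

    def __init__(self):
        super().__init__()
        self.root = ("#document", [])
        self.stack = [self.root]  # path from the root to the open element

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)  # attach to the innermost open element
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:  # never pop the synthetic root
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1][1].append(text)


def outline(node, depth=0):
    """Returns the tree as indented lines, one per element or text run."""
    if isinstance(node, str):
        return ["  " * depth + repr(node)]
    tag, children = node
    lines = ["  " * depth + tag]
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines


# Sample document: an assumption, modelled on the tags named in the talk.
SOURCE = "<html><body><h1>Headline</h1><p>Some text.</p></body></html>"

builder = TreeBuilder()
builder.feed(SOURCE)
builder.close()
print("\n".join(outline(builder.root)))
```

The indentation of the printed outline mirrors the nesting of the tags: `h1` and `p` sit inside `body`, which sits inside `html`.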

The definition of a structured document, which is more generic than
HTML, is proposed by one of the sources of my work: "a digital
document consisting of hierarchical elements containing text and
other content. The elements primarily represent roles of the content
rather than the presentation of the content."

And now it starts to get interesting here cause what we saw here in
the previous slides was actually the name of a tag "H1" which stands
for heading one, headline level one if you want. It says something
about the role of the content between the tags, it says that this is a
headline, this string could be anything, I've just chosen it to be
headline randomly, but the tag "H1" -- and also the other tags like
"body" and "P" -- have a logical meaning rather than a presentational
meaning. They say something about what the role of this headline is in
the document but they don't say, for example, that the font
size of the headline should be bigger than the font size of the
paragraph.

And this is what HTML, coming out of the scientific environment, put
forward as being most important. When HTML was released it was a
structured document format where it said something about the roles,
the semantics. It was media independent in the sense that all these
tags could be presented on a computer screen like we're using here, or
on a speech synthesiser, for example for those who cannot see, or those
of us who are in the shower.

Then we go on to the other part, the presentational part, this is
where the term style comes in and we shall look at how to define that
as well. I put a little style sheet up on the screen here. It is
written in CSS. It's very simple. It's two statements that describe,
that say something about how these tags that we saw before are to be
presented. The H1 tag, whenever that is found, the content of that tag
is to have a certain font size, and the color of the P elements is to
be black. This is very simple of course. By making many such rules and
combining them into a style sheet, though, you can have some
considerable visual effects. This is a sample web page using HTML
(Dave Raggett steps into the room). I have to welcome Dr. Raggett here
who is actually the author of the HTML specifications, many of them.
When we apply a style sheet to that HTML document we get a very
different presentation. This is exactly the same structure we just
added a style sheet to it.
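
The relationship between the two rules and the unchanged structure can be sketched as follows. This is an illustration, not the slide from the talk: the font size value `24pt`, the sample content and the names `STYLE_SHEET`, `DOCUMENT`, `computed_style` and `present` are all assumptions, and the matching is far simpler than what a real CSS implementation does.

```python
# The two rules described in the talk, written as data. In CSS they might read:
#   h1 { font-size: 24pt }   (24pt is an assumed value; the talk gives none)
#   p  { color: black }
STYLE_SHEET = [
    ("h1", "font-size", "24pt"),
    ("p", "color", "black"),
]

# A flat list of (tag, content) pairs stands in for the document structure.
DOCUMENT = [
    ("h1", "Headline"),
    ("p", "Some text."),
]


def computed_style(tag, style_sheet):
    """Collects every property/value pair whose selector matches the tag."""
    return {prop: value for selector, prop, value in style_sheet if selector == tag}


def present(document, style_sheet):
    """Pairs each element's content with the style the sheet assigns to it."""
    return [(content, computed_style(tag, style_sheet)) for tag, content in document]


for content, style in present(DOCUMENT, STYLE_SHEET):
    print(content, style)
```

Passing a different list of rules to `present` changes the result while `DOCUMENT` stays untouched, which is the point being made here: same structure, different presentation.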

(shows page from CSSZenGarden with Valentine's day style sheet)

This style sheet is not written by me, it is written by a professional
designer -- so are all of the ones I'm going to show here. As you can
see from this one, you can tell which day I prepared my presentation.
It is quite stunning the effects you can achieve just by adding a
style sheet to an HTML document. And designers are starting to realise
this and CSS is increasingly used to drive design on the web.

If we are to define style sheets in an academic form, this is a
definition that I propose myself: "A style sheet is a set of rules that
associates stylistic properties and values with structural elements in
a document, thereby expressing how to present the document". Style sheets
generally do not contain content. They are linkable and are reusable.
So the whole purpose of the style sheet is to say something about
presentation, it's not gonna put content in there. You would see that
from some of these, some of these added what you arguably could say is
content - these pictures for example, it's arguable whether that is
content or not, but in general this is used in a stylistic manner to
convey a certain style not to really add any textual content. So the
purpose is to describe the presentation to encode typography typically
on a visual medium, to describe aesthetics, about colours, about white
space usage, and in an aural presentation to say something about the
volume or about the type of voice you want to be heard, for example.
So style sheets are general to any kind of presentation you want to
apply to your document.

Having this distinction between structure and style is a fundamental
belief in the area of document study. There are many reasons why style
sheets and structure should be separate. I am not really going to go
into that. That's quite well understood in the scientific community
but I am going to discuss something which I don't think has been often
discussed: what's more important of these two. And to do so I'm
going to go back quite a few years - this is a dark picture and really
the place is quite dark. It's taken in the dungeons of CERN, the
laboratory in Geneva where the Web was invented by Tim. And this is
actually the hallway where he worked, second door on the left was
Tim's office. I happened to be there as well in 1994 for one year. It's
an incredible place with all these physics people, of course, trying
to discover new matter, but there's also quite a few computer people
there. And I think it is quite interesting to see how CERN is built as
a laboratory. It was set up after World War II. As you can see here --
or as you cannot see -- it is quite dark but you know you see all these
corridors, all these doors along the corridor they're lined up you
know systematically on the left hand, on the right side, long
corridors, you have pipelines in the ceiling. This is quite orderly
laid out, but it doesn't look so beautiful.

This being CERN, you really don't know what these pipes have in them
and you probably don't want to know.

(laughter)

But this is a set of pipes that Tim passed by every time he went to
his office so I have this idea perhaps that he was somehow inspired by
these pipes, you know the H1 elements, and H2 elements, you know, in
HTML, H1, H2, H3 and then there's the ones that aren't used and few of
the other ones. I don't know.

(laughter)

Are we allowed to make jokes here Eric? (giggle)

Now if you go up and look around you in that area you find some of the
most beautiful scenery you've ever seen. That's Mont Blanc in the
distance, the city of Geneva in the middle and the countryside, the
French countryside there in the front. It's very beautiful, if you
turn the camera around you have the Jura mountains and some cows
grazing peacefully in the field not knowing that underneath them they
have this incredible machine, the world's biggest machine by the way,
so I think, you know, comparing CERN being sort of the orderly
scientific world, the laboratory, versus the natural beauty above I
think is quite interesting. I am not going to say, you know make too
many conclusions based on this but I do think we need, really need
both. We need the beautiful part of the web, there is an aesthetic
part of the web that is going to amaze us, and we want there to be, at
the same time we want there to be structure. We want Google to find
our documents. We want there to be some kind of processability by
computers. They're not very good at processing beauty and aesthetics
but they are very good at processing tags. So the quite obvious answer
to the first question "what's more important" is both style and
structure are very important.

The second question, "what comes first in time?" doesn't have really an
obvious answer but I think it is interesting to see, to discuss it
still. The question can be reformulated as "what should come first in
time?" What's better? What's better for the users? What's better for
the designer? What makes more sense from a rational point of view? And
there are two ways to attack this: one is to look at the history of
style and structure, to see how mark up languages, style sheet
languages how they evolved, how they were developed from around 1980
and onwards or we can do a technical analysis and do some thinking what
makes more sense. And I'm gonna try to do both here.

Why does all this matter? I think this matters, because we see on the
web that mark up languages are developed all the time and I think by
knowing a little bit, by being conscious of what it takes to develop a
successful mark up language or a successful style sheet language, we
can actually make the web a better place. And some of us actually plan
to spend the rest of our lives there... it better be nice!

So the hypothesis that I will put forward, as I said I don't have an
obvious answer to this, but I do think generally that mark up
languages should be developed in the context of style sheet languages
not the other way around. That's what I will try to argue in this
presentation.

So if we look at the historical analysis I picked a few of the systems
that many people in this room will know and that I also analysed in my
thesis we will see that traditionally, in general, mark up languages
or systems for mark up languages were developed before the style
languages. For example, if we start with SGML which is sort of the
mother of all modern mark up languages we will see that it became an
ISO standard in 1986 but there wasn't a style sheet language to
present SGML documents. So when the Department of Defense chose to use
SGML as the foundations for all their documentation, they had really
no standardised language to present all that content they put in
there. There were some proprietary systems and products that could do
it but they wanted to have a standardised version so they started a
standardisation of what became known as FOSI, which was developed
quickly by committee a few years after 1986 in order to be able to
present SGML documents. The SGML community, though, they didn't
quite, they didn't like that, they saw that as a quick hack, they wanted
a solution that could do more advanced things. It's quite an academic
flavor. And they wanted for example non-Western layouts, they wanted
to be able to do Japanese vertical writing and so forth. So FOSI
wasn't really accepted. They started instead the development of DSSSL,
which took 10 years to develop, and I would say, due to there not being
an acceptable style sheet language, SGML has seen quite little use. It
has been very influential but when you look at the actual use it isn't
that much.

Then we have a few languages where we had simultaneous development
where the style sheet system and the structured system were developed
simultaneously. One was Scribe, done by Brian Reid. The other is the
S&P languages by Quint, who is sitting right here, if I may say so
(giggle). I think these languages in some ways suffered by not being
submitted for standardisation. I think if they had been successfully
standardised they could have seen successful use on the web, for
example. However, if one were to be able to standardise S&P, just
to take an example, you still would only have the foundation for the
markup languages, you wouldn't really have the markup languages, they
would have to come afterwards, which I believe supports my hypothesis,
but I would be interested to hear what Vincent has to say about that
afterwards.

Now you also have an example of the opposite where, actually, the
style system was in place before the mark up system. TeX is well known
in scientific communities, developed by Donald Knuth, who had the need
himself. He wrote a book and needed a good system for doing the
typography so he wrote the language and the formatter for doing so. It
is not per se a style sheet language but it is what one could call a
formatting language. Then on top of that, a couple of years later,
came LaTeX which was Leslie Lamport's attempt to encode Scribe in an
open way and LaTeX has been very successful and today even theses are
submitted to the University of Oslo in LaTeX. It is really LaTeX and
Word that are the choices. I hope to change that by the way by now
submitting mine in HTML. So, this being successful also supports my
hypothesis, I would say, due to the style sheet system being in place
first.

We have now looked at the historical analysis, we're now going to look at
the technical analysis and try to look at the required components of
these and see what makes sense from a technical point of view. And if
we look at the difference, trying to see what's different between a
mark up and style sheet system, I think there are two axes, three
maybe, that are very important: one is the processing requirements. In
order to process a mark up language you need a parser and writing a
parser these days is very, very easy. Writing a formatter on the other
hand which is required for a style sheet language is very hard. You
need to think about all sorts of efficiencies, beauty, fonts, screen
resolutions, drivers, etc. which you don't really need to write a
parser. So I would argue that it's significantly harder to write the
style system.
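
To give a feel for the formatter side, here is a sketch of one of its smallest sub-tasks, breaking text into lines. It is an illustration only, not something shown in the lecture: the function name `break_lines` is made up, and the sketch assumes a monospace font where every character is one unit wide. A real formatter would have to measure actual fonts and handle much more, such as the screen resolutions and drivers mentioned above.

```python
def break_lines(text, width):
    """Greedy line breaking: fill each line with as many words as fit.

    Assumes every character is one unit wide (a monospace font); a word
    longer than the width is placed on a line of its own.
    """
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)  # the line is full, start a new one
            current = word
    if current:
        lines.append(current)
    return lines


print(break_lines("Writing a formatter is very hard", 12))
```

Even this toy needs a model of how wide text is, which a parser never has to think about.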

Also, how many people can be part of the development. I think today we
can see that many people can contribute to the development of mark up
languages but to do style sheet languages there are going to be fewer
of them and that means there are going to be fewer developers as well.
So, since in order to present content you really, really need a style
system, that becomes the required component. The style language becomes
the platform onto which you can develop mark up languages. You can,
it's much easier to do prototyping, for example, of mark up languages
once you have the style system, then you can see output on the screen
whereas developing a new style ...

(Recording stops)