[The original of this article can be found at http://www.let.rug.nl/~bert/stylesheets.html.]

In two earlier articles I've tried to find out the requirements for a style sheet language for (simple) SGML, in terms of what (meta-) information is present in an SGML document (see `addressing' or [another copy]) and what model of the formatting process would be both practical and intuitive (see `model' or [another copy]). In the present article I would like to outline a set of formatting parameters and an accompanying syntax.

Stream-based Style sheet Proposal

Other proposals

Before I can introduce my own syntax, I will have to dismiss :-) earlier proposals. (Also see HTML Style Sheets)

  1. Robert Raisch's proposal of June 1993. Except for minor details, this would meet my requirements, but I think the syntax can be improved.
  2. Joe English's proposal using SGML syntax. (Already withdrawn by the author.) Everything can be expressed in SGML syntax, but SGML isn't necessarily the best format for everything.
  3. DSSSL & DSSSL Lite by James Clark et al. I'm sure DSSSL (or DSSSL Lite) will find its place in the formatting of complex SGML, despite its shortcomings, since there simply isn't anything else that's standardized. But for on-the-fly (stream-based) formatting, it's unusable. As long as the on-screen representation of a document maintains (more or less) the order of the elements in the SGML file, there is no need for the goal-based model of DSSSL.
  4. Haakon Lie's cascading style sheets. This proposal again follows the same route as 1 and 2 and thereby meets the requirements I set. The syntax is very close to what I will use below. The ability for the client to override style sheets in a controlled manner is very appealing, but unfortunately not as simple as this proposal supposes. Nevertheless, context-dependent or even weighted overrides are worth investigating further.
  5. `HTML to the Max' by C.M. Sperberg-McQueen and Robert Goldstein. This `manifesto' contains many sensible words about the deployment of general SGML on the Web. For a style sheet language to be practical, the SGML document must be in the canonical form towards which the authors have made a start. I guess this defines the border between documents requiring DSSSL and those that can be formatted `on the fly': the SGML must be in canonical form and the elements must be in the right order.


First a note about the syntax I adopted: it is simply the syntax used by X resource files. The advantages are as follows:

  1. At least under X, all the routines for parsing and string-to-whatever conversions are already there.
  2. It supports the addressing of elements as direct children or descendants of other elements.
  3. Except for a few subtleties with inheritance and priorities among general and specific specs, the syntax is straightforward, very readable and easy to write by hand.

The subtleties mentioned above involve specifications with wild cards. There are well-defined rules for them, but in a few cases the results may be somewhat surprising. By inheritance I mean the rule that elements in the SGML document inherit most of the style properties of the enclosing element. This must be distinguished from properties that are shared because the specification has a wild card in it.

In general, element names are in uppercase; attribute names are also in uppercase and preceded by `!'; IDs are preceded by `@'; and property names consist of lowercase letters. In certain contexts, property names can be preceded by `$'. Example: HTML.BODY.DISPLAY.P.EM.slant.
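To make the addressing concrete, here is a minimal sketch in Python of how a specification could be split into components and matched against the path of an element. It assumes that `.' binds a direct child and `*' allows any number of elements in between, as in X resource files; the function names `parse_spec` and `matches` are made up for this illustration, and the sketch ignores the precedence rules among competing specifications.

```python
import re

def parse_spec(line):
    """Split 'ADDRESS: value' into (names, bindings, value).

    names holds the address components, the last one being the property.
    bindings[i] is '.' (direct child) or '*' (any number of elements in
    between) and describes how names[i] attaches to what precedes it.
    """
    address, value = line.split(":", 1)
    names, bindings, pending = [], [], "."
    for token in re.findall(r"[.*]|[^.*\s]+", address):
        if token == "*":
            pending = "*"          # a '*' makes the next binding loose
        elif token != ".":
            names.append(token)
            bindings.append(pending)
            pending = "."
    return names, bindings, value.strip()

def matches(names, bindings, target):
    """target is the element path from the root, followed by the property."""
    def go(i, j):
        if i == len(names):
            return j == len(target)
        if j == len(target):
            return False
        if names[i] == target[j] and go(i + 1, j + 1):
            return True
        # a loose binding may skip over an intervening element
        return bindings[i] == "*" and go(i, j + 1)
    return go(0, 0)

names, bindings, value = parse_spec("*OL*OL.LI.label: A")
nested = ["HTML", "BODY", "OL", "LI", "OL", "LI", "label"]
single = ["HTML", "BODY", "OL", "LI", "label"]
print(matches(names, bindings, nested), matches(names, bindings, single))
```

A real implementation under X would presumably use the resource manager's own matching instead of a hand-written matcher like this one.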

All of the properties can have an explicit value, an attribute reference (with `!'), or a reference to another property (indicated by an initial `$'). An attribute or property is replaced by its value. For a property, this is the value of the property in the enclosing element (otherwise the order of evaluation would matter). If a property has an illegal value, the property is regarded as not being present.
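The three kinds of values can be resolved along the following lines. This is only a sketch: the function `resolve` and its `is_legal` parameter are hypothetical, and treating a reference to a missing attribute or property as `not present' is my assumption, since the text above only covers illegal values.

```python
def resolve(raw, attributes, enclosing_props, is_legal=lambda value: True):
    """Return the final value of a property, or None if it counts as absent.

    raw             -- the value as written in the style sheet
    attributes      -- attributes of the current element (uppercase names)
    enclosing_props -- final property values of the enclosing element
    is_legal        -- per-property validity check (hypothetical hook)
    """
    if raw.startswith("!"):
        value = attributes.get(raw[1:])        # attribute reference
    elif raw.startswith("$"):
        value = enclosing_props.get(raw[1:])   # property of enclosing element
    else:
        value = raw                            # explicit value
    if value is None or not is_legal(value):
        return None                            # regarded as not present
    return value

print(resolve("!WIDTH", {"WIDTH": "120"}, {}))
print(resolve("$textcolor", {}, {"textcolor": "blue"}))
```

Because a `$' reference only looks at the enclosing element, the properties of one element can be resolved in any order.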

The value may also be a built-in function. The only function defined so far is `@ifmatch(A, B, C, D)', where A is the name of an attribute (with `!') or property (with `$'), B is a regular expression, C is the value of the property if A matches B, D is the value otherwise. D may be omitted.
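A possible evaluation of `@ifmatch' is sketched below. Several details are assumptions of mine, not part of the proposal: that B is written between double quotes, that the regular expression must match the whole value, that a missing attribute or property counts as a non-match, and that an omitted D leaves the property absent. The name `eval_ifmatch` is made up.

```python
import re

# @ifmatch(A, "B", C) or @ifmatch(A, "B", C, D); A starts with '!' or '$'
IFMATCH = re.compile(
    r'@ifmatch\(\s*([!$]\w+)\s*,\s*"([^"]*)"\s*,'
    r'\s*([^,()]+?)\s*(?:,\s*([^,()]+?)\s*)?\)'
)

def eval_ifmatch(expr, attributes, enclosing_props):
    """Return C if A matches B, else D (None when D is omitted)."""
    m = IFMATCH.fullmatch(expr.strip())
    if m is None:
        raise ValueError("not an @ifmatch expression: " + expr)
    ref, pattern, if_true, if_false = m.groups()
    source = attributes if ref[0] == "!" else enclosing_props
    subject = source.get(ref[1:])
    if subject is not None and re.fullmatch(pattern, subject):
        return if_true
    return if_false

expr = '@ifmatch(!ISMAP, "ISMAP", true, false)'
print(eval_ifmatch(expr, {"ISMAP": "ISMAP"}, {}))
print(eval_ifmatch(expr, {}, {}))
```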


Below is a list of style properties. I've tried to make the list as comprehensive as possible, but I'm sure there are omissions or things that are better expressed in different ways. For each property there is a short explanation of the semantics and the way it is inherited in sub-elements. If a property is inherited, it is the final value of the property that is inherited. E.g., if a property is given as `*FOO.justify: !ALIGN' and attribute ALIGN of FOO has the value `left', then all sub-elements of FOO are left aligned, whether or not they have an ALIGN attribute of their own.
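The FOO example can be worked through in a few lines. This is a sketch only: `computed_justify` is a hypothetical helper, the rules are reduced to a table from element name to the raw value of `justify', and a reference to a missing attribute is assumed to leave the inherited value untouched.

```python
def computed_justify(path, rules, initial="full"):
    """Final value of `justify' for the last element in path.

    path  -- list of (element name, attributes) from the root downwards
    rules -- element name -> raw value of `justify' in the style sheet
    """
    value = initial
    for name, attributes in path:
        raw = rules.get(name)
        if raw is None:
            continue                       # no rule: keep the inherited value
        if raw.startswith("!"):
            raw = attributes.get(raw[1:])  # resolve the attribute reference
        if raw is not None:
            value = raw                    # the *final* value is passed down
    return value

rules = {"FOO": "!ALIGN"}
path = [("FOO", {"ALIGN": "left"}), ("BAR", {"ALIGN": "right"})]
print(computed_justify(path, rules))
```

BAR has no `justify' rule of its own, so its ALIGN attribute is never consulted and it ends up left aligned like FOO.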

Vertical space is, unless explicitly stated otherwise, expressed in units equal to the normal line height of the default (initial) font. Thus, these heights are independent of the actual font size or leading. For X, this line height would be the sum of the font's overall ascent and descent.

Horizontal space is measured in ems of the default font. For an X font, this would be the value of the QUAD_WIDTH property (if indeed the default font has one, otherwise a suitable estimate).

(Boolean) Indicates that the element has no end tag. The formatter will in effect open the element and then immediately insert a virtual end tag and execute that. Not inherited (obviously:-)).
Font size, absolute or as an increment to the inherited size; the two are distinguished by the presence or absence of an initial `+' or `-'. Example: assume the initial size is 0; if <SUP> has property `*SUP.size: -2', the size would be -2 after <SUP>; after another <SUP> it would be -4. Of course, browsers only have a limited set of sizes and -4 may well turn out the same as -2. This property is not inherited (but the resulting cumulative size is!).
Font family. There are four families: normal, alt, tt, and sym. `normal' is useful for running text. `alt', if different from `normal', is suitable for titles. `tt' is a fixed width font. `sym' is for special symbols that are not in any character set (called WWW-icons, such as folder and audio). Inherited.
Font family. This is the name of a specific family, such as Bembo, Gill Sans, Univers, Garamond, etc. Takes precedence over `family', but only if the browser is able to provide the font. Inherited.
A number selecting the level of emphasis. Browsers are free to implement this any way they like, but they must support at least levels 0 (of course!), 1, and 2. Inherited.
(Boolean) Select oblique or italic font. Overrides `emphasis'. Note: there is no way to select slanted rather than italic, as there is in TeX. Is this needed? Inherited.
(Boolean) Select bold font. Overrides `emphasis'. Inherited.
A number giving the number of lines under the text. Overrides `emphasis'. Inherited.
(Boolean) a line through the text. Inherited.
Colour of foreground, using X specifications. Inherited.
Colour of background. The value `transparent' is also allowed. Inherited.
Extra vertical space to put between lines, relative to the default line height of the actual font in the element. Thus, 1.0 means double-spaced lines, whatever the current font size. Inherited.
Amount to raise (or lower, if negative) the text above the baseline. The value can be ..., -2, -1, 0, 1, 2, ..., to indicate the position. (The exact positions in pixels are a property of the font.) This property automatically selects an appropriately smaller font size, so no `size' is needed (unless one wants an even smaller font). Inherited.
(Floating point number) Minimum amount of whitespace above the element. (Implies a paragraph break.) The whitespace is not added to the vertical space already there, but the space becomes the maximum of the existing space and the new. Example: `*OL.prebreak: 1.0' and `*LI.prebreak: 0.5' would cause the first <LI> to be 1.0 line (maximum of 1.0 and 0.5) below the previous paragraph. Not inherited.
Minimum amount of whitespace below the element. Not inherited.
(Floating point number) The presence of this property causes a horizontal rule to be inserted above the element, followed by the given amount of whitespace. The rule appears after any `prebreak'. Not inherited.
Insert the given amount of whitespace and a horizontal rule below the element, but before any `postbreak' (implies a paragraph break). Not inherited.
Thickness of the rule, in line heights. Only meaningful if `rulebefore' and/or `ruleafter' are present. Inherited.
(Floating point number) Increment to the current left margin. Implies a paragraph break. Not inherited (but the cumulative margin is!).
Increment to the right margin. Not inherited.
Width of the paragraph in ems. This implies a paragraph break and overrides `rightindent'. Not inherited.
Alignment mode for the paragraph. The presence of this property implies a paragraph break. Possible values are `left', `right', `full' and `center'. Default value is `full'. Inherited (but not the implied paragraph break).
There are three tracks: the main one, one on the left and one on the right. Text (or images) can be `floated' against the left or right margins, causing the main track to flow around it. Values for this property are: left, right, normal. Implies a paragraph break. Inherited (except for the implied paragraph break).
(Floating point) Indentation of first line. Inherited.
(Boolean) Suppresses `parindent' on the next paragraph after this element. Not inherited.
Start a new paragraph with a label sticking out into the left margin. Values are: A, a, 1, I, i, bullet, square, -, *, names of symbols (resp. auto-numbering uppercase letters, lowercase letters, Arabic numbers, Roman numerals, lowercase Roman numerals, bullets, squares, dashes, asterisks, WWW-icons). Not inherited.
(Boolean). Text of element is not displayed. Inherited.
(Boolean). Text of element is added to document's title. Inherited.
URL of something to display in-line at the start of the element. Not inherited.
Extra space to add above and below an inline object. Only useful in combination with inline. Inherited.
Extra space to add left and right of an inline object. Only useful in combination with inline. Inherited.
(Boolean) A click on this element should generate a URL that includes the coordinates of the click, relative to the upper left corner of the in-line illustration. Only useful in combination with inline and anchor. Inherited.
Vertical alignment of an inline object. Possible values: top, bottom, middle. Not inherited.
Height of an inline object in pixels. Takes priority over whatever the inline object itself suggests for a height. Not inherited.
Depth below the baseline of an inline object, in pixels. This overrides `valign'. Not inherited.
Width of an inline object in pixels. Only meaningful in combination with `inline'. Not inherited.
(Boolean) The element is replaced by a marker or button, which, when pressed, unfolds into the contents of the element. Depending on the browser, this may be implemented as a pop-up box, a new window, or expanding text. Not inherited.
Possible values: top, bottom, left, right. The contents of the element should be rendered as a caption and positioned in the indicated position. The object which it is a caption for can be found by looking for an `inline' or `table' property on the element itself or its ancestors. Not inherited.
The (unique) ID of this element. This value will nearly always be an attribute reference, such as `!ID'. Presence of this property will cause the formatter to look for additional property specifications starting with this ID. Example: if `*id: !ID' works out, after resolving the attribute, as `*id: p101', the formatter will look for `@p101.size', `@p101.family', etc. Not inherited.
(Boolean). Spaces and line breaks are not removed. Spaces have fixed width, REs (record ends) cause a line break in the text. If the current value of `justify' is `full', it is treated as `left' instead.
(Boolean) Don't break lines. If combined with obeyspaces, REs are treated as single spaces. Inherited.
Text to insert just before the text of the element, after any `parindent'. Not inherited.
Text to insert after the text of the element. Not inherited.
Causes enough vertical space to be inserted to end up below any floating figure in the left, right, or both side tracks. Implies a paragraph break. Possible values: left, right, full. Not inherited.
Contains the URL of which this element is the source anchor. Inherited.
(Only meaningful in combination with `anchor') Indicates that the anchor is a hotspot in the inline image of the element or one of its ancestors. Possible values: rectangle, circle, polygon. Not inherited.
Coordinates of the hotspot. Only useful in combination with `anchorshape' and the interpretation depends on the value of that property. Not inherited.
(Boolean) Indicates that the element constitutes a table. Not inherited.
(Boolean) Indicates that the element is a table row. It is possible for an element to be both a `table' and a `tablerow'. If neither the element itself nor one of its ancestors has the `table' property, `tablerow' has no meaning. Not inherited.
(Boolean) Indicates that the element forms one or more cells in a table. An element can be both a `tablecell' and a `tablerow'. If neither the element itself nor one of its ancestors has the `tablerow' property, `tablecell' has no meaning. Not inherited.
The table cell spans the indicated number of rows. Only meaningful in combination with `tablecell'. If not present, 1 is assumed. Not inherited.
The table cell spans the indicated number of columns. Only meaningful in combination with `tablecell'. If not present, 1 is assumed. Not inherited.
Lines around the table cell. Possible values: any sequence of zero or more words from `left', `right', `top', `bottom', `border'. `border' is equivalent to `left right top bottom'. Inherited (but only results in any visible effect in elements with the `tablecell' property). Default is no frame. Not inherited.
(Boolean) the text in this element may be hyphenated at line breaks. Initial value is true. Inherited.
(ISO code for a language) This may influence the hyphenation and maybe some other things (such as the style sheet itself). Default is US English. Inherited.
Target of a hyperlink (the fragment-ID of a URL). Not inherited.
The contents of the element are a style sheet, applicable to the rest of the document. Possible values are: `merge', `replace' and `override'. Any other value means that the contents are not a style sheet. `Merge' means that the style rules are added to the existing style, with the existing style taking precedence in case of conflict. `Override' also adds the two together, but gives precedence to the new rules. `Replace' completely removes the old style sheet and uses the new one instead. Inherited.
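The three ways of combining an embedded style sheet with the existing one can be sketched as follows. The sketch models a style sheet as a plain table from specification to value and treats two rules as conflicting only when their specifications are literally identical, which is a simplification; accepting the values in any letter case is also an assumption, and `combine` is a made-up name.

```python
def combine(existing, new, mode):
    """Combine two style sheets (dicts: specification -> value)."""
    mode = mode.lower()
    if mode == "merge":
        return {**new, **existing}   # existing rules win on conflict
    if mode == "override":
        return {**existing, **new}   # new rules win on conflict
    if mode == "replace":
        return dict(new)             # old style sheet is dropped entirely
    return dict(existing)            # any other value: not a style sheet

old = {"*P.justify": "full", "*H1.bold": "true"}
new = {"*P.justify": "left", "*EM.slant": "true"}
print(combine(old, new, "merge")["*P.justify"])
print(combine(old, new, "override")["*P.justify"])
```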


Here is part of a style sheet for HTML:

! Style sheet for HTML documents
! -----------------------------

*id: !ID
*language: !LANG
*target: !ID

! Create a default margin of 3 em
HTML.leftindent: 3.0
HTML.justify: full

! <H1> is bold, centered, 2 sizes larger than the surrounding text,
! with lines above and below.
*H1.size: +2
*H1.bold: true
*H1.justify: center
*H1.rulebefore: 1.0
*H1.ruleafter: 1.0
*H1.prebreak: 2.0
*H1.postbreak: 1.0
*H1.noindent: true

! <H2> is bold, 1 size larger, sticking out into the margin,
! not justified and it has a line above it
*H2.size: +1
*H2.bold: true
*H2.justify: left
*H2.rulebefore: 0.3
*H2.prebreak: 1.0
*H2.postbreak: 1.0
*H2.leftindent: -3.0
*H2.parindent: 0.0
*H2.noindent: true

! <H3> is italic, 1 size larger, in left margin
*H3.size: +1
*H3.slant: true
*H3.justify: left
*H3.prebreak: 1.0
*H3.postbreak: 1.0
*H3.leftindent: -3.0
*H3.parindent: 0.0
*H3.noindent: true

! <H4> is bold
*H4.bold: true
*H4.justify: left
*H4.prebreak: 1.0
*H4.postbreak: 0.5
*H4.parindent: 0.0
*H4.noindent: true

! <H5> is bold and run-in (i.e., no postbreak)
*H5.bold: true
*H5.prebreak: 1.0
*H5.parindent: 0.0

! <H6> is italic and run-in
*H6.slant: true
*H6.prebreak: 1.0
*H6.parindent: 0.0

! <P>. Note absence of prebreak: because H5 and H6 are run-in
*P.parindent: 2.0
*P.postbreak: 0.0

! Various character-level elements
*U.underscore: 1
*S.strikeout: true
*TT.family: tt
*B.bold: true
*I.slant: true
*BIG.size: +1
*SMALL.size: -1
*EM.emphasis: 1
*STRONG.emphasis: 2
*CODE.family: tt
*SAMP.family: tt
*KBD.family: tt
*KBD.underscore: 1
*VAR.slant: true
*CITE.slant: true
*Q.insertbefore: `
*Q.insertafter: '

! <BR> forces a line break. Flush as in Netscape
*BR.empty: true
*BR.prebreak: 0.0
*BR.flush: !CLEAR
*BR.noindent: true

! <WBR> is another un-SGML-like element from Netscape
! (&sbsp; doesn't exist, I've made it up, cf. &shy;)
*WBR.empty: true
*WBR.insertbefore: &sbsp;

! <A> is a hyperlink
*A.textcolor: blue
*A.underscore: 1
*A.anchor: !HREF
*A.target: !NAME

! <IMG> in-line or floating illustration (Note double use of ALIGN)
*IMG.empty: true
*IMG.inline: !SRC
*IMG.valign: !ALIGN
*IMG.track: !ALIGN
*IMG.ismap: @ifmatch(!ISMAP, "ISMAP", true, false)
*IMG.width: !WIDTH
*IMG.height: !HEIGHT
! if images are not displayed:
! *IMG.insertbefore: !ALT

! <HR> horizontal rule
*HR.empty: true
*HR.prebreak: 0.5
*HR.rulebefore: 0.0
*HR.postbreak: 0.0

! <PRE> preformatted text
*PRE.prebreak: 0.5
*PRE.postbreak: 0.5
*PRE.family: tt
*PRE.justify: left
*PRE.width: !WIDTH
*PRE.obeyspaces: true

! <DL>
*DL.prebreak: 1.0
*DL.postbreak: 1.0
*DL.leftindent: 2.0

! <DT>
*DT.prebreak: 0.5
*DT.parindent: -2.0
*DT.bold: true
*DT.insertafter: \ 

! <DD>
*DD.postbreak: 0.5

! <OL>
*OL.prebreak: 1.0
*OL.postbreak: 1.0
*OL.leftindent: 2.0

! <UL>
*UL.prebreak: 1.0
*UL.postbreak: 1.0
*UL.leftindent: 2.0

! <OL> has numbered items <LI>: 1.A.I, bullets elsewhere
*LI.prebreak: 0.5
*LI.postbreak: 0.5
*OL.LI.label: 1
*OL*OL.LI.label: A
*OL*OL*OL.LI.label: I
*LI.label: bullet
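
The style sheet above relies on two small arithmetic rules from the property list: a `size' with an initial sign is an increment to the inherited size, and `prebreak' space is not added to the space already there but combined with it by taking the maximum. A minimal sketch, with the hypothetical helper names `apply_size` and `space_above`:

```python
def apply_size(inherited, spec):
    """New cumulative size after a `size' property.

    A leading '+' or '-' makes the value relative to the inherited size;
    without a sign the value is absolute.
    """
    if spec.startswith(("+", "-")):
        return inherited + int(spec)
    return int(spec)

def space_above(existing, prebreak):
    """Whitespace above an element: the maximum, not the sum."""
    return max(existing, prebreak)

# Two nested <SUP> elements with `*SUP.size: -2', starting from size 0
print(apply_size(apply_size(0, "-2"), "-2"))

# First <LI> in an <OL>: `*OL.prebreak: 1.0' followed by `*LI.prebreak: 0.5'
print(space_above(1.0, 0.5))
```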


Linking to style sheets

Some problems remain: how are style sheets associated with documents? This is outside the scope of this article, but I would like to offer some possibilities:

  1. In the LINK tag of HTML. This is unsatisfactory for several reasons: (1) it is too late, the document has already started before the link is found; (2) it doesn't work for non-HTML.
  2. In a new header line of the HTTP protocol. This is better, but it relies on HTTP being used.
  3. As part of a MIME/multipart document.
  4. In the URL. A bad idea, not only because the style doesn't really `belong' to the document, but also because the URL would become too long.
  5. The other way round: a hyperlink contains not the URL of the document, but of its style sheet, which in turn references the document (in a new `document' property).
  6. As an attribute of A: <A HREF="doc.html" STYLE="doc.sty">

Conditional overrides

As said earlier, there must be some way for the user to selectively override part of a style sheet. To keep the rest of the style sheet intact, it may be necessary to introduce conditionals into the syntax:

! Effect of `delay image loading' on FIG element
*FIG.inline: !SRC
*FIG.hide: true
delay_images*FIG.hide: false
! Indentation and justification made dependent on window width
*DL.leftindent: 3.0
narrow*DL.leftindent: 1.0
wide*DL.leftindent: 5.0
narrow*P.justify: left
! Use colours only if on colour screen
*A.textcolor: red
*A.textbackground: yellow
b&w*A.textcolor: white
b&w*A.textbackground: black
monochrome*A.textcolor: black
monochrome*A.textbackground: gray80
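
How a formatter might pick among such rules is sketched below. The semantics are my assumption, since the syntax above is only a suggestion: a lowercase first component followed by `*' is taken to be a condition (element names being uppercase), and a conditional rule whose condition is active overrides the unconditional rule for the same address. The function name `select` is made up.

```python
def select(lines, active):
    """Map each address to its value, given the set of active conditions."""
    chosen, conditional = {}, {}
    for line in lines:
        if line.lstrip().startswith("!") or ":" not in line:
            continue                                  # comment or blank line
        address, value = (part.strip() for part in line.split(":", 1))
        head, star, rest = address.partition("*")
        if star and head and "." not in head and head.islower():
            if head in active:                        # e.g. narrow*DL.leftindent
                conditional["*" + rest] = value
        else:
            chosen[address] = value
    chosen.update(conditional)                        # active conditions win
    return chosen

rules = [
    "! Indentation and justification made dependent on window width",
    "*DL.leftindent: 3.0",
    "narrow*DL.leftindent: 1.0",
    "wide*DL.leftindent: 5.0",
    "narrow*P.justify: left",
]
print(select(rules, {"narrow"}))
print(select(rules, set()))
```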

Other output devices

A somewhat related question is how the user (or the browser) can select the right style for a particular purpose, such as printing, on-line display, or speech synthesis. These different purposes could be encoded in the style language (allowing all style variants to be in the same style sheet) or they could be handled externally, possibly in the way the style sheet is linked from a document.

Restricted SGML

If a style sheet is to work as intended, the browser must ensure that the proper nesting of elements is maintained. The parser must insert missing elements. A formatter that doesn't maintain a parsing context based on the HTML DTD and only formats the elements that are named explicitly in the text, will give strange results.

If the document is not coded in HTML, but in some other SGML format, the formatter has to assume that all tags are present, except perhaps for a few closing tags that can be inferred from the rule that elements must be properly nested. This is the only rule that the parser can use.
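
The nesting rule for non-HTML documents can be sketched with a simple stack. This is an illustration only: the event tuples and the function `infer_end_tags` are hypothetical, a stray end tag with no matching open element is simply ignored (my assumption), and elements without end tags (the `empty' property) are not handled.

```python
def infer_end_tags(events):
    """Insert the end tags implied by proper nesting.

    events -- iterable of ('start', name), ('end', name) or ('text', data)
    """
    out, stack = [], []
    for kind, name in events:
        if kind == "start":
            stack.append(name)
            out.append((kind, name))
        elif kind == "end":
            if name not in stack:
                continue                   # stray end tag: ignored
            while stack:
                top = stack.pop()
                out.append(("end", top))   # closes inner elements first
                if top == name:
                    break
        else:
            out.append((kind, name))
    while stack:                           # end of document closes the rest
        out.append(("end", stack.pop()))
    return out

events = [("start", "DOC"), ("start", "SEC"), ("start", "PARA"),
          ("text", "hi"), ("end", "SEC")]
for event in infer_end_tags(events):
    print(event)
```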

Explicit lengths

Dimensions have implicit units (ems, line heights, pixels) depending on context. Maybe a notation with explicit units should be allowed as well.

Missing properties

There are still some properties missing:

Last updated: 31 March 1995, Bert Bos