A Brief History of the Development of SGML

(C)1990 SGML Users' Group (11 June 90)

Permission to reprint is granted provided that no changes are made, and provided this notice is included in all copies.

SGML, in its present form, is the result of the efforts of many people, channelled into four major activities that occurred over the past twenty years: generic coding, the GML and SGML languages, the SGML standard, and major SGML applications.

1. The generic coding concept

Historically, electronic manuscripts contained control codes or macros that caused the document to be formatted in a particular way ('specific coding'). In contrast, generic coding, which began in the late 1960s, uses descriptive tags (for example, 'heading', rather than 'format-17'). Many credit the start of the generic coding movement to a presentation made by William Tunnicliffe, chairman of the Graphic Communications Association (GCA) Composition Committee, during a meeting at the Canadian Government Printing Office in September 1967: his topic -- the separation of the information content of documents from their format.

Also in the late 1960s, a New York book designer named Stanley Rice proposed the idea of a universal catalog of parameterized 'editorial structure' tags. Norman Scharpf, director of the GCA, recognized the significance of these trends, and established a generic coding project in the Composition Committee.

The committee developed the 'GenCode(R) concept', recognizing that different generic codes were needed for different kinds of documents, and that smaller documents could be incorporated as elements of larger ones. The project evolved into the GenCode Committee, which later played an instrumental role in the development of the SGML standard.

2. GML and SGML: languages for generic coding

In 1969, Charles Goldfarb was leading an IBM research project on integrated law office information systems. Together with Edward Mosher and Raymond Lorie he invented the Generalized Markup Language (GML) as a means of allowing the text editing, formatting, and information retrieval subsystems to share documents.

GML (which, not coincidentally, comprises the initials of its three inventors) was based on the generic coding ideas of Rice and Tunnicliffe. Instead of a simple tagging scheme, however, GML introduced the concept of a formally-defined document type with an explicit nested element structure.

Major portions of GML were implemented in mainframe 'industrial strength' publishing systems by IBM and others and achieved substantial industry acceptance. IBM itself, reckoned to be the world's second largest publisher, adopted GML and now produces over 90% of its documents with it.

After the completion of GML, Goldfarb continued his research on document structures, creating additional concepts, such as short references, link processes, and concurrent document types, that were not part of GML but were later to be developed as part of SGML.

3. Development of SGML as an International Standard

In 1978, the American National Standards Institute (ANSI) committee on Information Processing established the Computer Languages for the Processing of Text committee, chaired by Charles Card, then of Univac, with Norman Scharpf as a member. Goldfarb was asked to join the committee and eventually to lead a project for a text description language standard based on GML. The GCA GenCode committee supported the effort and provided a nucleus of dedicated people for the task of developing Goldfarb's basic language design for SGML into a standard.

The first working draft of the SGML standard was published in 1980. By 1983, the GCA was able to recommend the sixth working draft as an industry standard (GCA 101-1983). Major adopters included the US Internal Revenue Service (IRS) and the US Department of Defense.

In 1984, with feedback from the GCA standard in hand, three more working drafts were produced. The project, which had been authorized by the International Organization for Standardization (ISO) as well as ANSI, reorganized. It began regular international meetings as what is now called ISO/IEC JTC1/SC18/WG8, chaired by James Mason of the US Oak Ridge National Laboratory. Work also continued in the ANSI committee, now called X3V1.8, chaired by William Davis of SGML Associates, and supported by the GCA GenCode committee, chaired by Sharon Adler of IBM. Alignment between ISO and ANSI was maintained by Goldfarb continuing as technical leader, serving as project editor for both groups.

In 1985, a draft proposal for an international standard was published and the international SGML Users' Group was founded in the UK by Joan Smith, who became its first president. Together with the GCA in North America, it played a vital role in educating the public about SGML and communicating user reactions and comments back to the development project.

A draft international standard was published in October 1985, and was adopted by the Office for Official Publications of the European Communities. Another year of review and comment resulted in the final text, which -- using an SGML system developed by Anders Berglund, then of the European Particle Physics Laboratory (CERN) -- was published in record time after approval (ISO 8879:1986).

4. Important early applications of SGML

SGML applications are frequently developed for use by a single organization or a small community of users. Two early applications were developed with much broader participation: the Electronic Manuscript Project of the Association of American Publishers (AAP), and the documentation component of the Computer-aided Acquisition and Logistic Support (CALS) initiative of the US Department of Defense.

a) Electronic Manuscript Project

From 1983 to 1987, an AAP committee, chaired by Nicholas Alter of University Microfilms, developed an initial SGML application for book, journal, and article creation. The application is intended for manuscript interchange between authors and their publishers, among other uses, and includes optional element definitions for complex tables and scientific formulas.

The technical work was led by Joan Knoerdel of Aspen Systems, with participation by over thirty information processing organizations, including the IEEE, Council on Library Resources, American Society of Indexers, US Library of Congress, American Chemical Society, American Institute of Physics, Council of Biology Editors, and American Mathematical Society.

The AAP industry application standard has achieved significant acceptance, and has particularly been embraced by the emerging CD-ROM publishing industry. It has been adopted as a formal ANSI application standard (Z39.59) and a corresponding ISO standard is under development.

b) Computer-aided Acquisition and Logistic Support (CALS)

The SGML portion of CALS was initiated in February 1987 when Bruce Lepisto of the Department of Defense organized a committee to address the subject. The committee consisted of John Bean of Northrop, Pam Gennusa of Datalogics, Ed Herl of the US Army, and Mary McCarthy and Dave Plimier of the US Navy. They were subsequently joined by hundreds of representatives of military contractors and military commands, who participated in additional development and review. Their efforts led to the publication of a military standard (MIL-M-28001) in February 1988.

Similar SGML projects are under way in the defense departments of Canada, Sweden, and Australia, and are under consideration by other countries.