ISO/IEC JTC1/SC18/WG8/N1035: Recommendations for a Possible Revision of ISO 8879


ISO/IEC standards are reviewed at least once every five years to determine whether they are still applicable or whether they should be withdrawn. Such reviews frequently result in the publication of a revised edition of the standard.

ISO 8879 was published October 15, 1986. It is the expectation of its developers (ISO/IEC JTC1/SC18/WG8) that a review will result in republication with editorial changes and possibly some new technical enhancements. The purpose of this document is to record in one place all such changes that have been agreed to by the developers. Accordingly, this document incorporates and replaces WG8 N931 and WG8 N1013.

This document should be read carefully and taken at face value. In particular, it cannot be stated with certainty that a revision of ISO 8879 will ever be published, or, if one is published, that any of these accepted items will find their way unmodified into the final draft.

Items are listed in order by clause number. General comments precede those relating to specific clauses. Each item is preceded by a two-letter code indicating the status of the item and the type of change involved. If the source of the item is a WG8 document, the document number and item number within that document, if any, are given in parentheses. (The attachments to WG8 N680 are N680A, N680B, and N680C.)

The status codes are:

A  Accepted as editing instructions for first draft of revision.
F  Accepted for further study in preparation of revision.

The type-of-change codes are:

E  Editorial: correction of typographical errors, restatement of unclear text, and changes made for consistency or to facilitate maintenance of the document.
R  Resolution of ambiguity (conflict within text of ISO 8879)
T  Technical: innovation, or change to existing function

Items coded E and R reflect the developers' understanding of SGML as defined by the existing text of ISO 8879. Items coded T represent modifications to the SGML language that will not come into effect unless and until a revision of ISO 8879 is published.

General Editorial

Delete some annexes and move them to technical report on Techniques for using SGML (ISO/IEC TR 9573) under indicated topics:

Annex B: Tutorial on basic SGML concepts

Annex C: Tutorial on additional SGML concepts

Annex D.3: Variant concrete syntaxes, including multicode concrete syntaxes

Annex D (except D.3): Public entity sets

Annex E.1: Example of document type definition

Annex E.2: Computer graphics metafile

Annex E.3: Device-independent techniques for code extension

AE (N680B 81)
Change public identifiers on any revised public text.
In all references to ISO standards, the "-" before the year should change to ":". For example, "ISO 8879-1986" should be "ISO 8879:1986".
Avoid "instance of an element". It should be "element", or "instance of an element type" when emphasis on the type is desired.
AE (N680B 38a)
From 13.4.1 onward, keywords in "where" lists are in medium font, while in earlier lists they are in bold.
AE (N924)
Examples of multiple-byte codes in ISO 8879 (none at present) or in technical reports should be modified to follow the recommendation in WG8 N924.
Clauses should be further subdivided and renumbered to isolate individual requirements as much as possible, in order to facilitate correlation of test cases with the standard.
AE (N680B 3)
Clarify that SHORTREF is semantically a named feature, but syntactically is not.
AE (N680B 2)
Rationalize use of italicized phrases in body of standard and annexes.
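
The instruction above to write the year after a colon rather than a hyphen is mechanical enough to sketch in code. The following Python fragment is only an illustration: the function name `modernize_iso_refs` and the assumption that a year is four digits beginning with 19 or 20 are not taken from this document.

```python
import re

# Matches a standard number followed by a hyphen and a four-digit year,
# e.g. "ISO 8879-1986". Treating only 19xx/20xx as a year is an assumption
# of this sketch, so that part numbers such as "-1" are left alone.
_ISO_REF = re.compile(r"\b(ISO(?:/IEC)?(?: TR)? \d+)-((?:19|20)\d{2})\b")


def modernize_iso_refs(text: str) -> str:
    """Rewrite 'ISO 8879-1986' style references as 'ISO 8879:1986'."""
    return _ISO_REF.sub(r"\1:\2", text)


print(modernize_iso_refs("See ISO 8879-1986."))
```

References that already use the colon, and hyphenated part numbers that are not years, pass through unchanged.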

General Technical

Create an ASN.1 description of SGML for binary encoding (SGML-B) as a normative annex. SGML-B should not require delimiter recognition and should not employ markup minimization. However, it should be capable of preserving information about markup minimization, comments, etc., so that transformations in either direction between SGML and SGML-B can be made without loss of information.

Clause 3 (AE)

The footnotes identifying certain references as being at present at the stage of draft should be deleted, as those standards are now available in final form.

Clause 4: Definitions

The full text of the revised definitions is given, rather than change instructions. Although this approach adds to the size of this report, it makes it easier to see the effect of the changes. All definitions are coded AE. (N680B 20 23-24 26-28, N680C 1 7 11)

attribute (specification) list

Markup that is a set of one or more attribute specifications.

Attribute specification lists occur in start-tags, entity declarations, and link sets.

bit combination

An ordered collection of bits, interpretable as a binary number.

A bit combination should not be confused with a byte, which is a name given to a particular size of bit string, typically seven or eight bits. A single bit combination could contain several bytes.

character number

A number that represents the base-10 integer equivalent of the coded representation of a character.

character repertoire

A set of characters that are used together. Meanings are defined for each character, and can also be defined for control sequences of multiple characters.

When characters occur in a control sequence, the meaning of the sequence supersedes the meanings of the individual characters.

character set

A mapping of a character repertoire onto a code set such that each character in the repertoire is represented by a bit combination in the code set.

code extension

Techniques for including in documents the coded representations of characters that are not in the document character set.

When multiple national languages occur in a document, graphic repertoire code extension may be useful.

code set

A set of bit combinations of equal size, ordered by their numeric values, which must be consecutive.

  • For example, a code set whose bit combinations have 8 bits (an 8-bit code) could consist of as many as 256 bit combinations, ranging in value from 00000000 through 11111111 (0 through 255 in the decimal number base), or it could consist of any contiguous subset of those bit combinations.

  • A compressed form of a bit combination, in which redundant bits are omitted without ambiguity, is considered to be the same size as the uncompressed form. Such compression is possible when a character set does not use all available bit combinations, as is common when the bit combinations contain several bytes.

    code set position

    The location of a bit combination in a code set; it corresponds to the numeric value of the bit combination.

    coded representation

    The representation of a character as a single bit combination in a code set.

    A coded representation is always a single bit combination, even though the bit combination may be several 8-bit bytes in size.
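
The definitions of character number, code set, and coded representation above can be illustrated with a short Python sketch. The function names are hypothetical, and reading a multi-byte bit combination with the most significant byte first is an assumption of the sketch, not a statement of the standard.

```python
def character_number(coded: bytes) -> int:
    """Return the base-10 integer equivalent of one coded representation.

    The whole byte string is treated as a single bit combination,
    most significant byte first (an assumption of this sketch).
    """
    if not coded:
        raise ValueError("a coded representation needs at least one byte here")
    return int.from_bytes(coded, "big")


def is_code_set(values) -> bool:
    """True if the numeric values are consecutive, as a code set requires."""
    values = set(values)
    return bool(values) and max(values) - min(values) + 1 == len(values)


print(character_number(b"A"))          # one 8-bit byte
print(character_number(b"\x82\xa0"))   # two bytes, still one bit combination
print(is_code_set(range(0, 256)))      # a full 8-bit code
```

The second call shows the point of the note: two bytes yield one character number, not two.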

    conforming SGML document

    An SGML document that complies with all provisions of this International Standard.

    The provisions allow for choices in the use of optional features and variant concrete syntaxes.

    contextually required element

    An element that is not a contextually optional element and

    1. whose generic^identifier is the document^type^name; or

    2. whose currently applicable content^token is a contextually required token.

    An element could be neither contextually required nor contextually optional; for example, an element whose currently applicable content^token is in an or group that has no inherently optional tokens.

    current rank

    The rank^suffix that, when appended to a rank stem in a tag, will derive the element's generic identifier. For a start-tag it is the rank^suffix of the most recent element with the identical rank^stem, or a rank^stem in the same ranked^group. For an end-tag it is the rank^suffix of the most recent open element with the identical rank^stem.

    data entity

    An entity that was declared to be data and therefore is not parsed when referenced.

  • There are three kinds: character data entity, specific character data entity, and non-SGML data entity.
  • The interpretation of a data entity may be governed by a data content notation, which may be defined by another International Standard.

    data tag group

    A content^token that associates a data tag pattern with a target element type.

    Within an instance of a target element, the data content and that of any subelements is scanned for a string that conforms to the pattern (a data tag).

    descriptive markup

    Markup that describes the structure and other attributes of a document in a non-system-specific manner, independently of any processing that may be performed on it. In particular, SGML descriptive markup uses tags to express the element structure.

    (document) type declaration

    A markup declaration that formally specifies a portion of a document type definition.

    A document type declaration does not specify all of a document type definition because part of the definition, such as the semantics of elements and attributes, cannot be expressed in SGML. In addition, the application designer might choose not to use SGML in every possible instance -- for example, by using a data content notation to delineate the structure of an element in preference to defining subelements.

    document (type) definition

    Rules, determined by an application, that apply SGML to the markup of documents of a particular type.

    Part of a document type definition can be specified by an SGML document type declaration. Other parts, such as the semantics of elements and attributes, or any application conventions, cannot be expressed formally in SGML. Comments can be used, however, to express them informally.

    document type specification

    A portion of a tag that identifies the document instances within which the tag will be processed.

    A name^group performs the same function in an entity reference.

    element set

    A set of element, attribute definition list, and notation declarations that are used together.

    An element set can be public text.

    empty link set

    A link set that contains no link rules.


    entity

    A collection of characters that can be referenced as a unit.

  • Objects such as book chapters written by different authors, pi characters, or photographs, are often best managed by maintaining them as individual entities.
  • The actual storage of entities is system-specific, and could take the form of files, members of a partitioned data set, components of a data structure, or entries in a symbol table.

    explicit link (process definition)

    A link process definition in which the result element types and their attributes and link attribute values can be specified for multiple source element types.

    external entity

    An entity whose replacement text is not incorporated in an entity declaration; its system identifier and/or public identifier is specified instead.

    external identifier

    A parameter that identifies an external entity or data content notation.

  • There are two kinds: system identifier and public identifier.

  • A document type or link type declaration can include the identifier of an external entity containing all or part of the declaration subset; the external identifier serves simultaneously as a declaration of that entity and as a reference to it.

    formal public identifier error

    An error in the construction or use of a formal public identifier, other than an error that would prevent it being a valid minimum literal.

    A formal public identifier error can occur only if FORMAL YES is specified on the SGML declaration. A failure of a public identifier to be a minimum literal, however, is always an error.

    general delimiter (role)

    A delimiter role other than short reference.

    graphic character

    A character that is not a control character.

    For example, a letter, digit, or punctuation. It normally has a visual representation that is displayed when a document is presented.


    group

    The portion of a parameter that is bounded by a balanced pair of grpo and grpc delimiters or dtgo and dtgc delimiters.

    There are five kinds: name group, name token group, model group, data tag group, and data tag template group. A name, name token, or data tag template group cannot contain a nested group, but a model group can contain a nested model group or data tag group, and a data tag group can contain a nested data tag template group.

    implicit link (process definition)

    A link process definition in which the result element types and their attributes are all implied by the application, but link attribute values can be specified for multiple source element types.

    internal entity

    An entity whose replacement text is incorporated in an entity declaration.


    keyword

    A parameter that is a reserved name.

    In parameters where either a keyword or a name defined by an application could be specified, the keyword is always preceded by the reserved name indicator. An application is therefore able to define names without regard to whether those names are also used by the concrete syntax.

    link rule

    A member of a link set; that is, for an implicit link, a source^element^specification, and for an explicit link, an explicit^link^rule.

    link set

    A named set of rules, declared in a link^set^declaration, by which elements of the source document type are linked to elements of the result document type.

    link type declaration subset

    The entity sets, link attribute sets, and link set declarations, that occur within the declaration subset of a link type declaration.

    The external entity referenced from the link type declaration is considered part of the declaration subset.

    lower-case name characters

    Character class consisting of each lower-case name character assigned by the concrete syntax.

    lower-case name start characters

    Character class consisting of each lower-case name start character assigned by the concrete syntax.

    (markup) declaration

    Markup that controls how other markup of a document is to be interpreted.

    There are 13 kinds: SGML, entity, element, attribute definition list, notation, document type, link type, link set, link set use, marked section, short reference mapping, short reference use, and comment.

    named entity reference

    An entity reference consisting of a delimited name of a general entity or parameter entity (possibly qualified by a name group) that was declared by an entity declaration.

    A general entity reference can have an undeclared name if a default entity was declared.

    non-SGML data entity

    A data entity in which a non-SGML character could occur.


    parameter

    The portion of a markup declaration that is bounded by ps separators (whether required or optional). A parameter can contain other parameters.

    proper subelement

    A subelement that is permitted by its containing element's model.

    An included subelement is not a proper subelement.

    rank stem

    A name from which a generic identifier can be derived by appending a rank^suffix.

    reportable markup error

    A failure of a document to conform to this International Standard when it is parsed with respect to the active document and link types, other than a semantic error (such as a generic identifier that does not accurately connote the element type) or:

    1. an ambiguous content model;

    2. an exclusion that could change a token's required or optional status in a model;

    3. exceeding a capacity limit;

    4. an error in the SGML declaration;

    5. an otherwise allowable omission of a tag that creates an ambiguity;

    6. the occurrence of a non-SGML character; or

    7. a formal public identifier error.


    separator

    A character string that separates markup components from one another.

  • There are four kinds: s, ds, ps, and ts.

  • A separator cannot occur in data.

    separator characters

    A character class composed of function characters other than RE, RS, and SPACE, that are allowed in separators and that will be replaced by SPACE in those contexts in which RE is replaced by SPACE.

    SGML parser

    A program (or portion of a program or a combination of programs) that recognizes markup in SGML documents.

    If an analogy were to be drawn to programming language processors, an SGML parser would be said to perform the functions of both a lexical analyzer and a parser with respect to SGML documents.

    short reference (delimiter)

    Short reference string.

    simple link (process definition)

    A link process definition in which the result element types and their attributes are all implied by the application, and link attribute values can be specified only for the source document element.

    specific character data entity

    An entity whose text is treated as system data when referenced. The text is dependent on a specific system, device, or application process.

    A specific character data entity would normally be redefined for different applications, systems, or output devices.

    system declaration

    A declaration, included in the documentation for a conforming SGML system, that specifies the features, capacity set, concrete syntaxes, and character set that the system supports, and any validation services that it can perform.

    target element

    An element whose generic^identifier is specified in a data^tag^group.


    token

    The portion of a group, including a complete nested group (but not a connector), that is, or could be, bounded by ts separators.

    Clause 7

    AE 7.1 (N680A 5, N701 2)

    Clarify governing principle that the parsing of a document instance shall not be affected by the concurrent parsing of other document instances. For example, the replacement text of an entity reference could differ from one active concurrent instance to another. Also, a record end could be ignored in one instance and not in another.

    AE 7.1

    Caution the user that short references in the base document instance are treated as data in other concurrent instances.

    AE 7.6.1, first note


    Replace:

    For example, in

    record 1<outer><sub>
    record 2</sub>
    </outer>record 3

    with:

    For example, in the following three records:

    record 1 data<outer><sub>
    record 2 data</sub>
    record 3 data</outer>

    AE 7.9.3
    Clarify that the order of the tokens is significant and cannot be changed by a parser.

    Clause 9

    AE 9.2.1
    Add note clarifying that character classes in productions 52 and 53 are defined in Figures 1 and 2.

    Clause 10

    AR 10.1.6
    Clarify that system must determine storage location of entity or notation from the name and external identifier; it does not generate a modified system identifier.
    AE 10.1.7
    Add note clarifying that character classes in production 78 are defined in Figure 1.

    Clause 13

    AE 13
    In Note 1, change "document markup features" to: "markup minimization features"
    AE 13
    In Note 1, change last parenthesized phrase to: (for example, if the document quantity set required larger values than were available in the system quantity set)
    FE 13 (N759 12, N790 2)
    Clarify relationship between document character set and syntax-reference character set. In particular, that concrete syntax is defined in terms of characters, not bit combinations. (Contributions invited: a short explanation for this clause; examples and discussion for a technical report.)
    AE 13.1

    In first sentence, change "a coded" to: "one and only one coded"

    AE 13.1
    In first sentence, change "as" to: "that is,"

    Replace last paragraph with:

    The public^identifier should be a formal^public^identifier with a public^text^class of CHARSET.

    AE 13.1.2
    In first paragraph, change "added" to: "assigned"
    AR 13.2

    Replace last paragraph with:

    The public identifier should be a formal public identifier with a public text class of CAPACITY.

    AE 13.4.1

    Replace last paragraph with:

    The public^identifier should be a formal^public^identifier with a public^text^class of SYNTAX.
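
The three replacement paragraphs above (for 13.1, 13.2, and 13.4.1) each call for a formal public identifier with a particular public text class. As a rough illustration, the class can be read from the text identifier, which follows the owner identifier and begins with the class name. The Python below is a sketch only: `public_text_class` is a hypothetical helper, it does not validate the identifier, and the sample identifiers are illustrative.

```python
def public_text_class(fpi: str) -> str:
    """Return the public text class of a formal public identifier.

    Fields are separated by "//". A leading "+" or "-" field marks a
    registered or unregistered owner and is skipped, so the owner
    identifier is always parts[0] and the text identifier parts[1].
    """
    parts = fpi.split("//")
    if parts[0] in ("+", "-"):
        parts = parts[1:]
    if len(parts) < 2 or not parts[1].strip():
        raise ValueError(f"no text identifier in {fpi!r}")
    # The class is the first word of the text identifier.
    return parts[1].split(" ", 1)[0]


print(public_text_class("ISO 8879:1986//SYNTAX Reference//EN"))
print(public_text_class("-//Example Owner//CAPACITY Sample//EN"))  # hypothetical owner
```

A check such as `public_text_class(fpi) == "CHARSET"` would then express the recommendation for the document character set.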

    AE 13.4.3

    Change "a coded" to: "one and only one coded"

    AE 13.4.3
    Change "as" to: "that is"
    AE 13.4.3
    Change "of" to: "for"
    AR 13.4.3

    Clarify that a parameter literal in the SGML declaration is interpreted as though its character set were the syntax-reference character set. Therefore, a character can be entered directly in a parameter literal only if it has the same character number in the document character set as in the syntax-reference character set. If not, it must be entered as a character reference.
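
The rule just stated can be sketched as a small decision function. This is an illustration under stated assumptions: the two character sets are modelled as hypothetical dictionaries from character to character number, the character reference is written with the reference concrete syntax delimiters, and the number used in the reference is assumed to be the one from the syntax-reference character set.

```python
def literal_form(char: str, doc_charset: dict, syntax_ref_charset: dict) -> str:
    """Return how `char` could be written in an SGML declaration parameter literal.

    The literal is interpreted as though its character set were the
    syntax-reference character set, so direct entry is safe only when
    both sets give the character the same character number.
    """
    ref_number = syntax_ref_charset[char]
    if doc_charset.get(char) == ref_number:
        return char                  # same number in both sets: enter directly
    return f"&#{ref_number};"        # otherwise use a character reference


# Hypothetical sets in which "#" has moved in the document character set.
syntax_ref = {"A": 65, "#": 35}
document = {"A": 65, "#": 123}
print(literal_form("A", document, syntax_ref))
print(literal_form("#", document, syntax_ref))
```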

    AR 13.4.5 (N927 1)

    Resolve conflict between intent of text and syntax production rule, which restricts the declared concrete syntax, by treating production 189 as though each occurrence of ps+, parameter^literal were replaced by (ps+, parameter^literal)+, and by replacing each occurrence of the word literal in the text with literals.

    AE 13.4.5
    Change all occurrences of "added" to: "assigned"
    AE 13.4.5
    Clarify that a character can be assigned only once as a lower-case name or name-start character (that is, assigned once only to either LCNMCHAR or LCNMSTRT, but not both).
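
A validator could test the once-only rule above by counting assignments across both parameters. The following Python sketch assumes the LCNMSTRT and LCNMCHAR values are available as plain strings; the function name is hypothetical.

```python
from collections import Counter


def double_assigned(lcnmstrt: str, lcnmchar: str) -> set:
    """Return characters assigned more than once across LCNMSTRT and LCNMCHAR."""
    counts = Counter(lcnmstrt) + Counter(lcnmchar)
    return {ch for ch, n in counts.items() if n > 1}


# Reference concrete syntax values: LCNMSTRT "" and LCNMCHAR "-."
print(double_assigned("", "-."))
# A hypothetical variant syntax that assigns "_" in both places.
print(double_assigned("_", "-._"))
```

An empty result means the assignment satisfies the rule; any character returned was assigned twice, either within one parameter or in both.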
    AR 13.4.5 (N759 7, N790 5)
    Clarify that different lower-case characters can be associated with the same upper-case form, which can be a UC Letter. The associated upper-case forms can be the same as the lower-case, for languages (or special characters) where the concept of capitalization does not apply.
    FT 13.4.5 (N759 10, N790 1 4)
    Allow the set of Digit characters to be extended by a concrete syntax (NUCHAR for numeral characters?). A character could not be assigned to more than one of NUCHAR, LCNMSTRT, and LCNMCHAR.
    FT 13.4.5 (N927 1)
    Devise a less burdensome method of declaring long sequences of character numbers.
    AR 13.4.6 (N759 2)

    Add new paragraph:

    The length of a delimiter string in the delimiter set cannot exceed the NAMELEN quantity of the quantity set.

    AR 13.4.7 (N759 1)

    In production 193, replace second name with parameter^literal and replace first paragraph with:

    The name is a reference reserved name that is replaced in the declared concrete syntax by the interpreted parameter^literal, which must be a valid name in the declared concrete syntax.

    AE 13.4.7

    Add new note before the existing first note:

    The list of reference reserved names that can be replaced in a declared concrete syntax is:

    AR 13.4.8 (N759 3)

    In last sentence of first paragraph, change the period to: , which must exceed the reference value. The resulting quantity set must be rational.

    For example, TAGLEN must be greater than LITLEN because literals occur in start-tags. Similarly, LITLEN must exceed NAMELEN because names occur in literals.
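
The constraints in 13.4.6 and 13.4.8 above lend themselves to a simple check. The sketch below covers only the three quantities named in the example, using their reference quantity set values (NAMELEN 8, LITLEN 240, TAGLEN 960); the function name and the form of the messages are assumptions of the sketch.

```python
# Reference quantity set values for the three quantities named above.
REFERENCE = {"NAMELEN": 8, "LITLEN": 240, "TAGLEN": 960}


def quantity_problems(declared: dict, delimiters=()) -> list:
    """List violations of the rules sketched here; an empty list means none."""
    problems = []
    # A declared value must exceed the reference value.
    for name, value in declared.items():
        if name in REFERENCE and value <= REFERENCE[name]:
            problems.append(f"{name} must exceed the reference value {REFERENCE[name]}")
    # The resulting quantity set must be rational.
    q = {**REFERENCE, **declared}
    if q["TAGLEN"] <= q["LITLEN"]:
        problems.append("TAGLEN must be greater than LITLEN")
    if q["LITLEN"] <= q["NAMELEN"]:
        problems.append("LITLEN must exceed NAMELEN")
    # A delimiter string cannot be longer than NAMELEN.
    for delim in delimiters:
        if len(delim) > q["NAMELEN"]:
            problems.append(f"delimiter {delim!r} is longer than NAMELEN")
    return problems


print(quantity_problems({"NAMELEN": 32}))
print(quantity_problems({"NAMELEN": 300}))
```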

    Clause 15

    FE 15 (N680B 38b-41)
    Make editorial changes.
    AE 15.6, third paragraph

    In second sentence, change "a coded" to: "one and only one coded"

    AR 15.6, third paragraph
    In the second sentence, change "as" to: "that is,"
    AE, production 202

    Before SRLEN, insert: ps+,

    Figures in Body

    AE Fig. 6

    For ATTSPLEN, replace description with: Normalized length of an attribute specification list.

    Annex A

    FE A (N680B 42)
    Make editorial change.
    FE A
    Add some key examples from current annexes B and C.

    Annex C (for TR 9573)

    AE C.1.1.2

    Replace:

    <!ELEMENT -- CONTENT EXCEPTIONS? --
    1 list (item+)
    2 item (p | list)*

    with:

    <!-- ELEMENTS CONTENT -->
    <!ELEMENT list (item+)
    <!ELEMENT item (p | list)*

    AE C.1.4, first example

    In line 1, jobitem should be delimited by LIT.

    AE C.1.4, first example

    Replace last two lines:

    Kit von Suck, Member
    </joblist>

    with:

    Kit von Suck, Member</joblist>

    AE C.1.4, p.105

    Delete last example and its discussion, which begins at the last paragraph of p.105 and continues through I wonder whether Mrs. G will read this on p.106.

    AE C.3.1

    Replace example with:

    <(source)item> <(layout)block indent=5>Text of list item. </(source)item> </(layout)block>

    Annex F

    AR F
    Add explicit statement of information exchanged between SGML parser and application, based on Attachment 1 (Element Structure Information Set).
    FE F.1 (N680B 74-75)
    Make editorial changes.

    Annex G

    AE G
    Delete and move to new standard on Conformance Testing if project for it is approved.
    FE G (N680B 76-79)
    Make editorial changes.

    Attachment 1: The ISO 8879 Element Structure Information Set (ESIS)

    There are two kinds of SGML application (and therefore two kinds of conforming SGML application):

    1. A structure-controlled SGML application operates only on the element structure that is described by SGML markup, never on the markup itself.

    2. A markup-sensitive SGML application can act on the actual SGML markup and can act on element structure information as well. Examples include SGML-sensitive editors and markup validators.

    The set of information that is acted upon by implementations of structure-controlled applications is called the element structure information set (ESIS). ESIS is implicit in ISO 8879, but is not defined there explicitly. The purpose of this paper is to provide that explicit definition.

    ESIS is particularly significant for SGML conformance testing because two SGML documents are equivalent documents if, when they are parsed with respect to identical DTDs and LPDs, their ESIS is identical. All structure-controlled applications must therefore produce identical results for all equivalent SGML documents. In contrast, not all markup-sensitive applications will produce identical results from equivalent documents. (For example, a program that prints comment declarations, or that counts the number of omitted end-tags.)

    ESIS information is exchanged between an SGML parser and the rest of an SGML system that implements a structure-controlled application. Although an implementation may choose to wire in some of ESIS, such as the names of attributes, a structure-controlled application need have no other knowledge of the prolog than what ESIS provides.

    A system implementing a structure-controlled application is required to act only on ESIS information and on the APPINFO parameter of the SGML declaration.

  • This requirement does not prohibit a parser from providing the same interface to both structure-controlled and markup-sensitive applications, which could include non-ESIS information (e.g., the date), and/or information that could be derived from ESIS information (e.g., the list of open elements).

  • The documentation of a conforming SGML system that supports user-developed structure-controlled applications should make application developers aware of this requirement. Such a system should facilitate conformance to this requirement by distinguishing ESIS information from non-ESIS in its interface to applications. Note 1 in of ISO 8879 applies only to structure-controlled applications.

  • In the following description of ESIS, information is identified as being available at a particular point in the parsed document. This identification should not be interpreted as a requirement that the information actually be exchanged at that point — all or part of it could have been exchanged at some other point. Similarly, there is no constraint on the manner (e.g., number of function calls) or format in which the exchanges take place.

    The ESIS description includes the information associated with all of the SGML optional features. When a given feature is not in use, corresponding information is not present in the document. ESIS information is transmitted from the parser to the application unless otherwise indicated.

    ESIS information applies to a single parsed document instance. Therefore, if concurrent instances are being parsed, the applicable document type name must be identified. This requirement also applies when parsing intermediate instances in a chain of active links.

    ESIS information consists of the identification of the following occurrences, and the passing of the indicated information for each:

    1. Initialization

      The application must inform the SGML parser of the active document types, the active link types, or that parsing is to occur only with respect to the base document type.

    2. Start of document instance set

      For each active LPD, the link type name and link set information (see below) for the initial link set.

    3. Start of document element only

      For each active simple link, the link type name and attribute information (see below) for the link attributes.

    4. Start of any element

    5. End of any element, including elements declared to be empty

      Generic identifier

      If the element was empty, ESIS does not indicate why it was empty; that is, whether it was declared to be empty, or whether an explicit content reference occurred, or whether it just happened to contain no data characters.

    6. End of document instance set

      Processing instructions could occur between the end of the document element and the end of the document instance set.

    7. Processing instruction

      System data

    8. Data

      Includes no ignored characters (e.g., record starts).

      Includes only significant record ends, with no indication of how significance was determined. Characters entered via character references are not distinguished in any way. Implementation-specific means can be used to represent bit combinations that the application cannot accept directly.

      • Such bit combinations may be those of non-SGML characters entered via character references, but no significance is attached to this coincidence.

      • Bit combinations of non-SGML characters that occurred directly in the source text would have been flagged as errors, and would therefore never be treated as data.

    9. Attribute information

      All attribute values must be reported and associated with their attribute names.

      • For example, a parser could supply the attribute names with each value, or supply the values in an order that corresponds to a previously-supplied list of names.

      The order of the tokens in a tokenized attribute value shall be preserved as originally specified.

      Each unspecified impliable attribute must be identified.

      For example, a parser could identify such attributes explicitly, or it could allow the application to determine them by comparing the identified specified attribute values to a previously-supplied list of attribute names.

      There shall be no indication of whether an attribute value was the default value.

      The order in which attributes are specified in the attribute specification list is not part of the ESIS.

      General entity name attribute values include the entity name and entity text. The entities themselves are not treated as having been referenced.

      An application can use system services to parse the entities, but such parsing is outside the context of the current document.

      For notation attributes, the attribute value includes the notation name and notation identifier.

      For CDATA attributes, references to SDATA entities in attribute value literals are resolved. The replacement text is distinguished from the surrounding text and identified as an individual SDATA entity.

      For CDATA attributes, references to CDATA entities in attribute value literals are resolved. The replacement text is not distinguished from the surrounding text.

    10. References to internal entities

      The information passed to the application depends on the entity type:


      replacement text, identified as an individual SDATA entity.


      replacement text, identified as a processing instruction but not as an entity.

      For other references, nothing is passed to the application.

      The replacement text is parsed in the context in which the reference occurred, which can result in other ESIS information being passed.

    11. References to external entities

      The information passed to the application depends on the entity type:

    12. Link set information

      All link rules whose source element specification is implied.
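
The ESIS description above can be pictured as a stream of events passed from parser to application, with document equivalence defined as equality of the two streams. The Python sketch below is not an interface defined by ISO 8879 or by this attachment: the class and function names are hypothetical, and only a few of the occurrences are modelled. It reflects three points from the text: attribute specification order is not part of ESIS (a dictionary comparison ignores it), token order within a value is preserved (tuples), and the number of calls used to deliver data is unconstrained (adjacent data is merged before comparison).

```python
from dataclasses import dataclass, field


@dataclass
class StartElement:
    gi: str
    # Attribute name -> tuple of tokens; None could mark an unspecified
    # impliable attribute. Dict equality ignores specification order.
    attributes: dict = field(default_factory=dict)


@dataclass
class EndElement:
    gi: str


@dataclass
class Data:
    text: str


@dataclass
class ProcessingInstruction:
    system_data: str


def normalize(events):
    """Merge adjacent Data events, since chunking is not part of ESIS."""
    merged = []
    for ev in events:
        if isinstance(ev, Data) and merged and isinstance(merged[-1], Data):
            merged[-1] = Data(merged[-1].text + ev.text)
        else:
            merged.append(ev)
    return merged


def equivalent(a, b) -> bool:
    """Two event streams are treated as equivalent if their normalized forms match."""
    return normalize(a) == normalize(b)


doc_a = [StartElement("item", {"id": ("i1",), "refs": ("x", "y")}),
         Data("Text of "), Data("list item."), EndElement("item")]
doc_b = [StartElement("item", {"refs": ("x", "y"), "id": ("i1",)}),
         Data("Text of list item."), EndElement("item")]
print(equivalent(doc_a, doc_b))
```

Information that a markup-sensitive application might want, such as whether an end-tag was omitted, deliberately has no place in these event types.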

    Thanks to Rick Jelliffe for converting this document to HTML.
