Future plans for HTML

HTML directions

The HTML language has been in use in the field since 1990, and several suggestions have been made for improvements. See the working notes. A new DTD will be the result.

Bad HTML

Much of the HTML actually around has been generated by the NeXTStep editor, which has in fact generated bad HTML. This should not confuse the specification. Some bugs in that output include non-matching open and close tags, and a NEXTID tag which is not SGML. Also, attribute values are not quoted even when they contain characters which require them to be quoted in SGML.

A perl script was written by Dan Connolly to clean up bad HTML.

Also, see Dan's HTML spec (draft) which contains a sort of test suite.
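
The unquoted attribute values mentioned above are the easiest of these bugs to illustrate. The following is a minimal Python sketch, not the perl script referred to here; the function name and the sample tag are made up for illustration, and it assumes there is no white space around the equals sign.

```python
import re

# Matches either an already-quoted value (group 1, left untouched) or an
# unquoted value that directly follows "=" (group 2, to be quoted).
_ATTR_VALUE = re.compile(r"""("[^"]*"|'[^']*')|(?<==)([^\s"'>]+)""")


def quote_attributes(tag):
    """Wrap the unquoted attribute values of a single tag in double quotes."""
    def fix(match):
        if match.group(1) is not None:
            # Already quoted: return it unchanged.
            return match.group(1)
        return '"' + match.group(2) + '"'
    return _ATTR_VALUE.sub(fix, tag)


if __name__ == "__main__":
    print(quote_attributes("<A NAME=0 HREF=../foo.html>"))
```

Values that are already quoted are matched first and passed through, so an equals sign inside a quoted value is not touched.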

New features

Please mail me mentioning this list if you think of features I have missed out.

Header

A wrapper element for all the document-wide information such as title, document-wide links, etc. Advantage: You know when you have got to the end of it, and can open a window with the required attributes. This is easier than checking for a printable character.

Disadvantage: If mandatory, the size of the minimum document is increased.

A "Body" tag might be useful in the same light, for the rest.

Link

A document-wide link, as distinct from a localized anchor. Mainly useful in conjunction with interesting link types such as related-index, related-glossary, parent, author, print-with, copy-with, etc.

An empty element.

Attributes are as for the anchor element.

Dates

A tag giving the dates on which a document was created, was last modified, and expires is going to be essential for caching systems.

The expiry date-time will allow long cache times for documents such as RFCs, and short or zero caching times for varying data.

<DATE CREATED="920630123067" EXPIRES="920706000000">

(Is there an SGML standard for datetimes? Which standard to use? HyTime?)
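
How a cache might use the expiry value can be sketched in Python. The text does not define the layout of the attribute values, so the sketch assumes twelve digits read as YYMMDDHHMMSS; the function names are hypothetical. Under that reading the CREATED value in the example has a seconds field of 67, which strict parsing rejects, so the sketch works with the EXPIRES value only.

```python
from datetime import datetime

# Assumed layout of the attribute values: YYMMDDHHMMSS, two digits each.
STAMP_FORMAT = "%y%m%d%H%M%S"


def parse_stamp(value):
    """Parse a 12-digit stamp; raises ValueError if a field is out of range."""
    return datetime.strptime(value, STAMP_FORMAT)


def cache_lifetime_seconds(expires, now):
    """Seconds a cache may still keep the document, never less than zero."""
    remaining = (parse_stamp(expires) - now).total_seconds()
    return max(0.0, remaining)


if __name__ == "__main__":
    # One day before the expiry given in the example above.
    print(cache_lifetime_seconds("920706000000", datetime(1992, 7, 5)))
```

A document that has already expired gets a lifetime of zero, which corresponds to the "short or zero caching times" case.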

Highlighting

The HPx elements are not implemented. Some bold/italic/fixed-width highlighting is useful, with equivalent representations on single-font devices. Three possibilities are:
Numbered HPn tags
These are rather meaningless. In practice, everyone has to remember which is bold and which is italic.
Logical tags
Dan: "I'd prefer <em>, <tt>, <cite>, ala TeX. Or we could go with the O'Reilly/Hal DocBook tags: <Emphasis>, <OopsChar>, <wordasword>, <CiteBook>, <Subscript>, <Superscript>". A problem is that there are never enough of them, so people reuse them on the understanding that they will be rendered as bold, etc.
Physical tags
<Bold>, <italic>, etc., as in MIME. There would have to be an understanding that equivalent representations could be substituted where bold and italic are not available.

Base address

savedas
Could be a name for the tag giving the address under which the document was saved, so that relative links can be resolved even when the document is found out of context (for example, when mailed).
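
The resolution step itself is small. A minimal Python sketch, assuming the saved address has already been read from such a tag; the function name and the example.org addresses are placeholders, not part of any proposal.

```python
from urllib.parse import urljoin


def resolve_link(saved_as, href):
    """Resolve a possibly relative link against the recorded base address."""
    return urljoin(saved_as, href)


if __name__ == "__main__":
    base = "http://example.org/docs/markup/directions.html"
    print(resolve_link(base, "tags.html"))
    print(resolve_link(base, "../index.html"))
```

An absolute link is returned unchanged, so the same call can be applied to every anchor in a mailed copy.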

Fixed width text with anchors etc

The XMP and LISTING elements have proved essential for putting on line text already formatted assuming a fixed-width character set. Many people have asked for a version which, instead of being oblivious to any embedded elements, recognized elements such as anchors within the text. Line ends would have to be marked as such (with P) so that a marked-up line could be represented on many lines: the markup could otherwise make a line too long to send as it was, which would be very inconvenient.

Note that an editor could always save in this element something which was originally loaded as a raw text section: indeed, the raw text is really only a (very useful!) way of importing text which could also go through a filter to make it valid marked-up SGML.

Fixed width indented

Very often one wants to quote a command in fixed width font, but indented as a quotation, say 40 characters wide rather than 80. Perhaps the width required should be a parameter to the fixed width with anchors element. (Smacks of low-level format!)

Ordered list

Perhaps the OL tag ought to go back in, to distinguish the ordered list from the unordered one. Dan Connolly implements it.

Link types

There is a list of link types. We should formalize these, so that people could actually implement them. This corresponds to giving values to the TYPE attribute. This attribute could be called REL, for RELATIONSHIP, to avoid confusion between the type of link and the type of object to which it points.

Entities

A full set of entities for special characters should be defined, picked out of a suitable standard table. This should allow for accented characters and bullets as a minimum. Representation using regular US-ASCII stand-ins (such as oe for o umlaut) should be allowed where the full character sets are not available. Editors must preserve entities even when the display has defaulted to a stand-in character combination.
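
The stand-in rule can be sketched in Python as follows. The table is hypothetical: the entity names and the stand-ins other than "oe" are chosen for illustration and are not taken from any agreed list.

```python
import re

# Hypothetical table: entity name -> (real character, US-ASCII stand-in).
ENTITIES = {
    "ouml": ("\u00f6", "oe"),
    "eacute": ("\u00e9", "e"),
    "bull": ("\u2022", "*"),
}

_ENTITY = re.compile(r"&(\w+);")


def render(text, full_charset):
    """Expand entities for display; unknown entities are left as written."""
    def expand(match):
        entry = ENTITIES.get(match.group(1))
        if entry is None:
            return match.group(0)
        return entry[0] if full_charset else entry[1]
    return _ENTITY.sub(expand, text)


if __name__ == "__main__":
    source = "K&ouml;ln"
    print(render(source, full_charset=True))
    print(render(source, full_charset=False))
```

Only the displayed string changes; the stored text still contains the entity, which is what lets an editor preserve it when the display has fallen back to a stand-in.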

Comments

The ability to hide information in an SGML document is useful. The COMMENT element was introduced for this purpose in the line mode browser as an experiment. It should go in as standard in future. If it can contain anything, then it can be used for commenting things out.
Tim BL