Notation

A. BNF notation for syntax

This section has three parts:

(a) a straight copy of a section of RFC 822, "Standard for the Format of ARPA Internet Text Messages", August 13, 1982,

(b) changes and additions to (a),

(c) a set of rules that we use everywhere and that are listed here once.

(a) NOTATIONAL CONVENTIONS

This specification uses an augmented Backus-Naur Form (BNF) notation. The differences from standard BNF involve naming rules and indicating repetition and "local" alternatives.

1. RULE NAMING

Angle brackets "<" and ">" are not used, in general. The name of a rule is simply the name itself, rather than "<name>". Quotation marks enclose literal text (which may be upper and/or lower case). Certain basic rules are in uppercase, such as SPACE, TAB, CRLF, DIGIT, ALPHA, etc. Angle brackets are used in rule definitions, and in the rest of this document, whenever their presence will facilitate discerning the use of rule names. (Note for WWW: we never use them.)

2. RULE1 / RULE2: ALTERNATIVES

Elements separated by slash ("/") are alternatives. Therefore "foo / bar" will accept foo or bar. NOTE: this rule is changed to use the vertical bar character "|" instead of slash, since the syntax for directory paths uses slashes heavily.

3. (RULE1 RULE2): LOCAL ALTERNATIVES

Elements enclosed in parentheses are treated as a single element. Thus, "(elem (foo | bar) elem)" allows the token sequences "elem foo elem" and "elem bar elem".

4. *RULE: REPETITION

The character "*" preceding an element indicates repetition. The full form is "<l>*<m>element", indicating at least l and at most m occurrences of element. Default values are 0 and infinity, so that "*(element)" allows any number, including zero; "1*element" requires at least one; and "1*2element" allows one or two.

5. [RULE]: OPTIONAL

Square brackets enclose optional elements; "[foo bar]" is equivalent to "*1(foo bar)".
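The repetition and optional forms above can be reduced to a pair of bounds. The following Python sketch is illustrative only and is not part of RFC 822 or of this specification; the function name repetition_bounds is hypothetical, and an unbounded maximum ("infinity") is represented as None.

```python
import re
from typing import Optional, Tuple

# Matches the "<l>*<m>element" form; both counts may be omitted.
_RANGE = re.compile(r"^(\d*)\*(\d*)(.+)$")


def repetition_bounds(spec: str) -> Tuple[int, Optional[int], str]:
    """Return (minimum, maximum, element) for a repetition or optional form.

    A maximum of None stands for "infinity". A form without "*" or
    brackets is treated as exactly one occurrence.
    """
    spec = spec.strip()
    # "[foo bar]" is defined as equivalent to "*1(foo bar)".
    if spec.startswith("[") and spec.endswith("]"):
        return (0, 1, spec[1:-1])
    match = _RANGE.match(spec)
    if match:
        low, high, element = match.groups()
        # Omitted counts default to 0 and infinity respectively.
        return (int(low) if low else 0, int(high) if high else None, element)
    return (1, 1, spec)
```

For instance, repetition_bounds("1*2element") gives a minimum of 1 and a maximum of 2, while "[foo bar]" and "*1(foo bar)" both give the bounds 0 and 1.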

6. NRULE: SPECIFIC REPETITION

"<n>(element)" is equivalent to "<n>*<n>(element)" that is, exactly n occurrences of (element). Thus 2DIGIT is a 2-digit number, and 3ALPHA is a string of three alphabetic characters.

7. #RULE: LISTS

A construct "#" is defined, similar to "*", as follows: "<l>#<m>element", indicating at least l and at most m elements, each separated by one or more commas (","). This makes the usual form of lists very easy; a rule such as '(element *("," element))' can be shown as "1#element". Wherever this construct is used, null elements are allowed, but do not contribute to the count of elements present. That is, "(element),,(element)" is permitted, but counts as only two elements. Therefore, where at least one element is required, at least one non-null element must be present. Default values are 0 and infinity, so that "#(element)" allows any number, including zero; "1#element" requires at least one; and "1#2element" allows one or two.
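The counting rule for null elements can be sketched in Python as follows. This is only an illustration: the function name parse_list is hypothetical, and the sketch assumes that elements never contain a comma themselves and that white space around an element is not significant, neither of which is stated above.

```python
from typing import List, Optional


def parse_list(text: str, minimum: int = 0,
               maximum: Optional[int] = None) -> List[str]:
    """Split a "#" list and check the count of its non-null elements.

    maximum=None stands for "infinity".
    """
    parts = [part.strip() for part in text.split(",")]
    # Null elements are allowed but do not contribute to the count.
    elements = [part for part in parts if part]
    if len(elements) < minimum:
        raise ValueError("too few elements: %d < %d" % (len(elements), minimum))
    if maximum is not None and len(elements) > maximum:
        raise ValueError("too many elements: %d > %d" % (len(elements), maximum))
    return elements
```

With these assumptions, parse_list("a,,b", minimum=1) accepts the input and counts two elements, whereas parse_list(",,", minimum=1) is rejected because no non-null element is present.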

8. ; COMMENTS

A semi-colon, set off some distance to the right of rule text, starts a comment that continues to the end of line. This is a simple way of including useful notes in parallel with the specifications.

(b) Changes and additions

1) nonterminals are written starting with a capital and having capitals at embedded word starts, as in: RuleOfTerminalsAndNonTerminals

2) a rule is written as NonTerminal ::= RuleOfTerminalsAndNonTerminals

3) because the slash is heavily used in directory path names, the alternatives are separated by a vertical bar "|" rather than a slash.

4) sometimes it is necessary to specify a layout character sequence, such as newline. We have here adopted the conventions used in C strings:

\n    newline
\r    carriage-return
\t    tab
\b    backspace
\f    form feed

5) because the angle brackets are used by SGML, we will never use angle brackets to indicate nonterminals, but always use the way of writing nonterminals given above.

(c) General rules

HTTP data are written in line format, i.e. line breaks are significant.

Line breaks

CrLf ::= \r\n
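As an illustration of the CrLf rule, the Python sketch below splits received text into lines. The function name split_lines is hypothetical. The sketch treats only the two-character sequence \r\n as a line break, so a lone \n or \r stays inside the line; that strictness is an assumption of the sketch, not something stated here.

```python
from typing import List

CRLF = "\r\n"  # the CrLf rule: carriage-return followed by newline


def split_lines(data: str) -> List[str]:
    """Split data into lines at each CrLf sequence."""
    lines = data.split(CRLF)
    # A trailing CrLf terminates the last line rather than starting a new one.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```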

Strings:

The purpose of a string is to allow any sequence of printable characters and the space to be transmitted. A string is a sequence of ASCII characters, written using C's notation for strings. Thus, a string is surrounded by double-quote characters and excludes control characters.

If a double-quote character is used inside a string, it must appear as the sequence \" (backslash followed by double quote).

A control character can be represented by a backslash followed by its ASCII sequence number in octal notation.

The NUL character is not representable.
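The string rules can be sketched as a small Python encoder. The function name quote_string is hypothetical, and two choices are assumptions of the sketch rather than statements of this document: a literal backslash is written as \\ (following C), and control characters are always written with three octal digits so that a following digit cannot be misread as part of the escape.

```python
def quote_string(text: str) -> str:
    """Write text as a double-quoted string using the rules above."""
    out = ['"']
    for ch in text:
        code = ord(ch)
        if code == 0:
            raise ValueError("NUL is not representable")
        if code > 127:
            raise ValueError("only ASCII characters are allowed")
        if ch == '"':
            out.append('\\"')            # double quote becomes \"
        elif ch == "\\":
            out.append("\\\\")           # assumption: C-style \\ for backslash
        elif code < 32 or code == 127:
            out.append("\\%03o" % code)  # control character as octal escape
        else:
            out.append(ch)               # printable character or space
    out.append('"')
    return "".join(out)
```

Under these assumptions a tab character (ASCII 9) is written as \011, and an embedded double quote is written as \".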

RC