Implementation report for RFC4234.
===================================

TOOLS TESTED

1.  Bill Fenner's ABNF Parser: http://rtg.ietf.org/~fenner/abnf.cgi
2.  Harald Alvestrand's ABNF Parser: http://www.ops.ietf.org/abnf/
3.  Jutta Degener's ABNFGen: generates text that should be valid for
    a given ABNF rule -- http://www.quut.com/abnfgen/

ABNF RULES FROM RFCs TESTED

See test file at bottom to see all rules tested.  These were:
1.  A sampling from RFC 3501 to test as many ABNF features as could
    be found there
2.  One rule and its dependencies from RFC4466 to test incremental
    alternative rules
3.  One rule from RFC4467 to test N* rule
4.  Two rules from RFC2967 to test percent-encoded literals

NOTES

Note on prose rules generally: Some RFCs contain prose rules; e.g.
QUOTED-CHAR in RFC3501.  All parsers accept prose rules, but Jutta's
works even better with only fully-specified rules.  Thus, the prose
rules in the cases tested here were rewritten as full ABNF.
For example:

   ;; Original prose rule
   QUOTED-CHAR     = <any TEXT-CHAR except quoted-specials> /
                     "\" quoted-specials

   ;; Revised machine-usable rule
   QUOTED-CHAR     = %x01-09 / %x0B / %x0C / %x0E-21 / %x22-5B /
                     %x5D-7F / "\" quoted-specials

Notes on test file (at bottom):
  * Prose rules were replaced with revised rules as in the example
    above.
  * Some literal hex character representations were upper-cased to
    work with Harald's tool.
  * The definition dependencies from RFC 4234 itself were included.
  * To test with Jutta's tool, move the rule to be tested to the top
    line of the file, and her tool will output legal examples of the
    rule.

FEATURES TESTED
[numbers refer to sections of RFC4234 defining those features]

2.1 Rule naming

Names must begin with an alphabetic character and can contain
alphabetic, digit and hyphen characters: tested OK
  * Many success cases illustrated in all RFCs using ABNF as well as
    test file.
  * Failure case tested: beginning with digit or hyphen: successfully
    rejected in all tools.
  * Failure case tested: containing !@#$ characters

Rule names are case insensitive: tested OK.
Harald's parser does not handle case-insensitive rule definitions.
Jutta's tool is perfectly case-insensitive.  Bill's treats this as a
warning but does realize that "alpha" and "ALPHA" are the same rule.

2.2: Rule form

Rules end with CRLF or continue with INDENTATION: tested OK.
  * Bill's parser explicitly flags inconsistent indentation.
  * RFC3501 exhibits consistent indentation, which works with both
    parsers.
  * Harald's tool flags inconsistent line endings (not full CRLF).

Rules contain elements: tested OK.
  * All tools report errors if a rule has no content before CRLF.
  * All RFC3501 rules have content elements.

2.3: Terminal Values

Hexadecimal terminal values tested OK
  * Example: "CR = %x0D" --> all of IMAP rests on this rule in some
    sense :)

Decimal terminal values tested OK
  * Example: RFC 2967 uses these instead of hex terminal values
  * Bill's parser canonicalizes to hex, which is a successful result.
  * Jutta's parser successfully generates characters based on decimal
    terminal values.

Period separator: Two parsers tested OK
  * Example: "CRLF = %d13.10"
  * Result: Works in Bill's and Jutta's parsers, bug in Harald's.

Quoted literals: Tested OK, very many demonstrations.

2.4: External Encodings

This feature could be considered orthogonal to the basic feature list
of ABNF.  If a set of ABNF rules is declared to be in some encoding of
some set of characters, then that's what it's in.  This particular set
of tests demonstrated that ASCII encoding is interoperable
(interestingly, even though Jutta's tool supports UTF-8 by default,
this is backwards compatible with rulesets that only do ASCII).  One
can also find examples, e.g. in RFC3920, of use of ABNF declared to
use UTF-8.

3.1 Concatenation: Rule1 Rule2

Tested OK.
Example: see RFC3501 definition of the "address" rule, which
concatenates several other rules and literals.

   address         = "(" addr-name SP addr-adl SP addr-mailbox SP
                     addr-host ")"

3.2 Alternatives: Rule1 / Rule2

Tested OK
Example: See RFC3501 definition of "quoted-specials" rule, allowing
choice between two quote characters (one expressed as a rule, one as
a literal)

   quoted-specials = DQUOTE / "\"

3.3 Incremental Alternatives: Rule1 =/ Rule2

Tested OK
Example: see RFC4466 definition of "mailbox-data".  This works with
Harald's and Bill's parsers, although Bill's parser would prefer to
canonicalize it.  It may not work with Jutta's.  It's an important
tool in many RFCs that extend ABNF constructs found in previous RFCs.

   mailbox-data    =/ Namespace-Response / esearch-response

3.4 Value Range Alternative: %c##-##

Tested OK
Example: RFC3501 "digit-nz" rule, handled correctly by all tools.

   digit-nz        = %x31-39

Note that Harald's parser does not understand lower-case octets in
this or in literals; this can be considered a bug in one
implementation that does not invalidate the overall interoperability.

3.5 Sequence Group: (Rule1 Rule2)

Tested OK
Example: see RFC3501 definition of sequence-set, containing two ABNF
sequence groups (and interestingly a recursive definition).

   sequence-set    = (seq-number / seq-range) *("," sequence-set)

3.6 Variable Repetition *Rule

Tested OK
Example: RFC3501 definition of "astring"

   astring         = 1*ASTRING-CHAR / string

Example: RFC4467 definition of "enc-urlauth"

   enc-urlauth     = 32*HEXDIG

3.7 Specific Repetition: nRule

Tested OK
Example: RFC3501 definition of "date-year"

   date-year       = 4DIGIT

Bug found in Jutta's parser, which treats this as 1DIGIT.  The other
parsers accept it correctly.  This construct seems to work well in
practice -- IMAP implementations seem to have four-digit years as
intended.

3.8. Optional Sequence: [RULE]

Tested OK
Example: RFC3501 definition of "flag-list"

   flag-list       = "(" [flag *(SP flag)] ")"

3.9. Comment: ; Comment

Tested OK
Example: Many examples from all RFCs; well handled in all tools.
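
Illustration of the repetition forms in 3.6 and 3.7: the following is
a minimal Python sketch, not taken from any of the tools tested, of
how a repetition prefix can be read according to RFC 4234 (an omitted
minimum defaults to 0, an omitted maximum is unbounded, and nRule is
shorthand for n*nRule).  The function name parse_repetition and the
regular expression are assumptions made for this illustration only.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, for illustration only.
# RFC 4234 section 4:  repeat = 1*DIGIT / (*DIGIT "*" *DIGIT)
_REPEAT = re.compile(r"^(?:(\d*)\*(\d*)|(\d+))?(.+)$")


def parse_repetition(element: str) -> Tuple[int, Optional[int], str]:
    """Split an ABNF element into (minimum, maximum, remainder).

    A maximum of None means "unbounded".
    """
    m = _REPEAT.match(element)
    if m is None:
        raise ValueError("empty element")
    lo, hi, exact, rest = m.groups()
    if exact is not None:
        # Specific repetition (3.7): nRule is the same as n*nRule.
        n = int(exact)
        return n, n, rest
    if lo is None and hi is None:
        # No repetition prefix at all: exactly one occurrence.
        return 1, 1, rest
    # Variable repetition (3.6): missing bounds default to 0 / unbounded.
    return (int(lo) if lo else 0), (int(hi) if hi else None), rest
```

Under this reading, parse_repetition("4DIGIT") gives (4, 4, "DIGIT"),
so a tool that treats "date-year" as 1DIGIT is reading the count
incorrectly, while parse_repetition("32*HEXDIG") gives
(32, None, "HEXDIG"), i.e. at least 32 hex digits.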

3.10 Operator Precedence

Tested OK
Example: RFC3501 definition of "flag-list" contains concatenation,
sequence group, repetition, concatenation again, optional sequence,
and finally concatenation again, in that order.

   flag-list       = "(" [flag *(SP flag)] ")"

Jutta's tool most clearly demonstrates consistent use of operator
precedence as it generates random strings which meet this rule
according to the precedence described.

Example pointed out as possibly confusing, mixing alternatives with
concatenation: RFC3501 definition of mailbox-data

   mailbox-data    = "FLAGS" SP flag-list / "LIST" SP mailbox-list /
                     "LSUB" SP mailbox-list / "SEARCH" *(SP nz-number) /
                     "STATUS" SP mailbox SP "(" [status-att-list] ")" /
                     number SP "EXISTS" / number SP "RECENT"

Results: Handled correctly by Jutta's parser and IMAP implementations.

-----------------------------------------------------------
---------cut here for test file ------------------------
-----------------------------------------------------------

;;;;;;;;;;
;; Rules from RFC 4234
;;;;;;;;;;

ALPHA           = %x41-5A / %x61-7A   ; A-Z / a-z
CR              = %x0D
CRLF            = CR LF
DIGIT           = %x30-39
DQUOTE          = %x22
HEXDIG          = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
LF              = %x0A
SP              = " "

;;;;;;;;;;
;; Rules from RFC 3501
;;;;;;;;;;

address         = "(" addr-name SP addr-adl SP addr-mailbox SP
                  addr-host ")"
addr-adl        = nstring
                    ; Holds route from [RFC-2822] route-addr if
                    ; non-NIL
addr-host       = nstring
addr-mailbox    = nstring
addr-name       = nstring
astring         = 1*ASTRING-CHAR / string
ASTRING-CHAR    = ATOM-CHAR / resp-specials
atom            = 1*ATOM-CHAR
ATOM-CHAR       = %x01-7F
                    ;; MODIFIED to work with Harald and Jutta's parsers
                    ;; WAS <any CHAR except atom-specials>
                    ;; this rewriting isn't accurate but
                    ;; helped make the rest of the testing work quickly.
CHAR8           = %x01-FF
                    ;; MODIFIED: FF put in upper-case
date-year       = 4DIGIT
digit-nz        = %x31-39
                    ; 1-9
flag            = "\Answered" / "\Flagged" / "\Deleted" /
                  "\Seen" / "\Draft" / flag-keyword / flag-extension
                    ; Does not include "\Recent"
flag-extension  = "\" atom
flag-keyword    = atom
flag-list       = "(" [flag *(SP flag)] ")"
literal         = "{" number "}" CRLF *CHAR8
                    ; Number represents the number of CHAR8s
mailbox         = "INBOX" / astring
mailbox-data    = "FLAGS" SP flag-list / "LIST" SP mailbox-list /
                  "LSUB" SP mailbox-list / "SEARCH" *(SP nz-number) /
                  "STATUS" SP mailbox SP "(" [status-att-list] ")" /
                  number SP "EXISTS" / number SP "RECENT"
mailbox-list    = "(" [mbx-list-flags] ")" SP
                  (DQUOTE QUOTED-CHAR DQUOTE / nil) SP mailbox
mbx-list-flags  = *(mbx-list-oflag SP) mbx-list-sflag
                  *(SP mbx-list-oflag) /
                  mbx-list-oflag *(SP mbx-list-oflag)
mbx-list-oflag  = "\Noinferiors" / flag-extension
                    ; Other flags; multiple possible per LIST response
mbx-list-sflag  = "\Noselect" / "\Marked" / "\Unmarked"
                    ; Selectability flags; only one per LIST response
nil             = "NIL"
nstring         = string / nil
number          = 1*DIGIT
                    ; Unsigned 32-bit integer
                    ; (0 <= n < 4,294,967,296)
nz-number       = digit-nz *DIGIT
                    ; Non-zero unsigned 32-bit integer
                    ; (0 < n < 4,294,967,296)
quoted          = DQUOTE *QUOTED-CHAR DQUOTE
resp-specials   = "]"
seq-number      = nz-number / "*"
seq-range       = seq-number ":" seq-number
sequence-set    = (seq-number / seq-range) *("," sequence-set)
status-att      = "MESSAGES" / "RECENT" / "UIDNEXT" / "UIDVALIDITY" /
                  "UNSEEN"
status-att-list = status-att SP number *(SP status-att SP number)
string          = quoted / literal
QUOTED-CHAR     = %x01-09 / %x0B / %x0C / %x0E-21 / %x22-5B /
                  %x5D-7F / "\" quoted-specials
                    ;; MODIFIED to work with Harald and Jutta's parsers
                    ;; WAS a prose rule:
                    ;; <any TEXT-CHAR except quoted-specials> /
                    ;; "\" quoted-specials
quoted-specials = DQUOTE / "\"

;;;;;;;;;;
;; Rules from RFC4466
;;;;;;;;;;

esearch-response = "ESEARCH" [search-correlator] [SP "UID"]
                  *(SP search-return-data)
mailbox-data    =/ Namespace-Response / esearch-response
Namespace       = nil / "("
                  1*Namespace-Descr ")"
Namespace-Descr = "(" string SP
                  (DQUOTE QUOTED-CHAR DQUOTE / nil)
                  *Namespace-Response-Extension ")"
Namespace-Response = "NAMESPACE" SP Namespace SP Namespace SP Namespace
Namespace-Response-Extension = SP string SP "(" string *(SP string) ")"
search-correlator = SP "(" "TAG" SP tag-string ")"
search-modifier-name = tagged-ext-label
search-return-data = search-modifier-name SP search-return-value
search-return-value = tagged-ext-val
tag-string      = string
tagged-ext-label = tagged-label-fchar *tagged-label-char
                    ;; Is a valid RFC 3501 "atom".
tagged-label-fchar = ALPHA / "-" / "_" / "."
tagged-label-char = tagged-label-fchar / DIGIT / ":"
tagged-ext-comp = astring /
                  tagged-ext-comp *(SP tagged-ext-comp) /
                  "(" tagged-ext-comp ")"
tagged-ext-simple = sequence-set / number
tagged-ext-val  = tagged-ext-simple / "(" [tagged-ext-comp] ")"

;;;;;;;;;;
;; Rules from RFC4467
;;;;;;;;;;

enc-urlauth     = 32*HEXDIG

;;;;;;;;;;
;; Rules from RFC2967
;;;;;;;;;;

tab             = %d09
nl              = %d13 %d10   ; CR LF