2.1.15 MIME Encapsulation of Aggregate HTML Documents (mhtml)

NOTE: This charter is a snapshot of the 39th IETF Meeting in Munich, Bavaria, Germany. It may now be out-of-date.

Chair(s):

Einar Stefferud <stef@nma.com>

Applications Area Director(s):

Keith Moore <moore+iesg@cs.utk.edu>
Harald Alvestrand <Harald.T.Alvestrand@uninett.no>

Applications Area Advisor:

Keith Moore <moore+iesg@cs.utk.edu>

Mailing Lists:

General Discussion: mhtml@segate.sunet.se
To Subscribe: listserv@segate.sunet.se
In Body: subscribe mhtml <full name>
Archive: ftp://segate.sunet.se/lists/mhtml/

Description of Working Group:

World Wide Web documents are most often written using HyperText Markup Language (HTML). HTML is notable in that it contains "embedded content"; that is, HTML documents often contain pointers or links to other objects (images, external references) which are to be presented to the recipient. Currently, these compound structured Web documents are transported almost exclusively via the interactive HTTP protocol. The MHTML working group has developed three Proposed Standards (RFCs 2110, 2111 and 2112) which permit the transport of such compound structured Web documents via Internet mail in MIME multipart/related body parts.

The Proposed Standards are intended to support interoperability between separate HTTP-based systems and Internet mail systems, as well as being suitable for combined mail/HTTP browser systems.

It is beyond the scope of this working group to come up with standards for document formats other than HTML Web documents. However, the Proposed Standards so far produced by the working group have been designed to allow other such formats to use similar strategies.

The MHTML WG is currently INACTIVE while first implementations are under way. To support implementation efforts, the WG Editor maintains an Informational Internet-Draft ftp://ftp.dsv.su.se/users/jpalme/draft-ietf-mhtml-info-06.txt which provides additional useful information for implementors. This Informational Draft also discusses Web page formatting choices that affect their efficient use through disconnected channels such as mail. It will become an Informational RFC after implementation experience has been collected. Until then, this informational draft will be kept current and available in the IETF Internet-Drafts library.

The MHTML Mailing List remains open for discussion of any issues that may arise during implementation, and to collect information about successful interoperable and interworkable implementations in anticipation of progression to Draft-Standard Status.

From May to October 1997, the working group will monitor implementation progress, discuss issues, and periodically update the draft of the informational document.

The editors of this group are:

Main editor: Jacob Palme <jpalme@dsv.su.se>
Associate editor: Alex Hopmann <alex.hop@resnova.co

Goals and Milestones:

Mar 96   Clarify issues and submit first Internet-Draft.
Jun 96   Submit first Internet-Draft for MIME encapsulation of HTML.
Sep 96   Submit MHTML specification to IESG for consideration as a Proposed Standard.
Oct 96   Submit Internet-Draft of guidelines for creating documents suited to disconnected access.
Dec 96   Submit guidelines Internet-Draft to IESG for consideration as an Informational RFC.
Aug 97   Meet in Munich to review implementation progress.
Oct 97   Submit Implementation Progress Internet-Draft to IESG for publication as an Informational RFC.

Internet-Drafts:

Request For Comments:

RFC       Status   Title
RFC2112   PS       The MIME Multipart/Related Content-type
RFC2111   PS       Content-ID and Message-ID Uniform Resource Locators
RFC2110   PS       MIME E-mail Encapsulation of Aggregate Documents, such as HTML (MHTML)

Current Meeting Report

Minutes of the MHTML Working Group

Einar Stefferud submitted these minutes from excellent notes provided by Eric Berman.

I. Brief Summary

MHTML met at IETF in Munich with 28 participants and resolved all issues on its agenda.

It was decided that the MHTML specifications should be recycled at Proposed Standard to replace the faulty MHTML Proposed Standard RFCs currently in circulation. The group set September 30 as its date for moving the new documents to IETF Last Call. The new drafts will be posted as soon as possible for WG review and open discussion on the mailing list.

All output of the meeting is subject to review and consensus evaluation on the MHTML Mailing List <MHTML@SEGATE.SUNET.SE>.

List archives are at <ftp://segate.sunet.se/lists/mhtml/> and <http://segate.sunet.se/archives/mhtml.html>.

List archives are also available by e-mail. Send a message to LISTSERV@SEGATE.SUNET.SE with the text INDEX MHTML to get a list of the archive files, and then send a new message containing GET <file name> to retrieve selected archive files.

To subscribe to this list, send a message to LISTSERV@SEGATE.SUNET.SE which contains the text SUB MHTML <your name (not your email address)>.

II. Agenda Review

Jacob Palme provided a detailed agenda. Nobody had anything to add to it. These minutes exactly follow the agenda.

Item 1: Exact matching when no absolute base is known (draft-ietf-mhtml-info-06.txt, section 8.2).

Editor's Note: At the time of writing these minutes (9 Sept 97), a long e-mail discussion has taken place and this agenda item is being resolved. Interested parties are referred to the MHTML archives for the discussion and the resolution.

Issue 1.A: Exact matching of illegal URLs (section 8.2). There appear to be four potential solutions:

a) Keep illegal spaces
b) Convert all illegal spaces according to RFC 2017 (in HTML as well as header)
c) Convert only in the mail header, not in the HTML text
d) Convert mail header according to RFC 2047

Both IE and Netscape accept spaces in URLs, but Content-Location cannot contain them.

One suggestion: do relative-to-absolute conversion, and then a byte-for-byte comparison, with NO other conversion.

Another view: option (d) does not make sense.

Suggest Three Steps:

a) Relative to absolute URL resolution
b) Remove any escaped URL chars
c) Then compare

Doing (b) after (a) avoids the problem of confusing an escaped "/" with the hierarchy character. "A/B" and "A%2FB" would match for purposes of matching, but for purposes of relative-to-absolute resolution, the %2F would not be a hierarchy character.
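The three steps, and why their order matters, can be sketched in Python. This is a minimal illustration only: it assumes the standard library's `urllib.parse.urljoin` is an acceptable stand-in for relative-to-absolute resolution and `unquote_to_bytes` for removing escapes; the function names `normalize` and `urls_match` are hypothetical.

```python
from urllib.parse import unquote_to_bytes, urljoin


def normalize(reference: str, base: str) -> bytes:
    # Step (a): relative-to-absolute resolution. urljoin treats only a literal
    # "/" as a hierarchy separator, so an escaped "%2F" stays inside its segment.
    absolute = urljoin(base, reference)
    # Step (b): remove URL escapes only after resolution, yielding raw octets.
    return unquote_to_bytes(absolute)


def urls_match(ref_a: str, base_a: str, ref_b: str, base_b: str) -> bool:
    # Step (c): octet-for-octet comparison of the normalized forms.
    return normalize(ref_a, base_a) == normalize(ref_b, base_b)


if __name__ == "__main__":
    # "A/B" and "A%2FB" match once escapes are removed...
    print(urls_match("A/B", "http://example.com/", "A%2FB", "http://example.com/"))  # True
    # ...but during resolution "%2F" is not a hierarchy character:
    print(urljoin("http://example.com/dir/A%2FB", "c.gif"))  # http://example.com/dir/c.gif
    print(urljoin("http://example.com/dir/A/B", "c.gif"))    # http://example.com/dir/A/c.gif
```

Reversing steps (a) and (b) would turn "A%2FB" into two path segments before resolution, so "c.gif" would resolve under "A/" instead of "dir/".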

Another suggestion: put in the Content-Location header exactly what was in the URL. For example, just allow spaces in Content-Location and keep the illegal URLs. Content-Location should be copied blindly.

Consensus emerged on choice (a): just use the illegal URL in the Content-Location header. However, characters that are illegal in MIME headers must be encoded according to RFC 2047: a space does not have to be encoded, but a character such as u with an umlaut must be.
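A sender following this consensus might produce the header value roughly as below. This is a hedged sketch, not text from the specification: `encode_content_location` is a hypothetical helper, UTF-8 is an assumed charset (the minutes name none), only non-ASCII characters are treated as illegal, and the 75-character limit on RFC 2047 encoded-words is not handled.

```python
import base64


def encode_content_location(url: str) -> str:
    """Return a Content-Location header value for `url` (hypothetical helper)."""
    try:
        # Pure US-ASCII, including an illegal space: copy the URL unchanged.
        url.encode("ascii")
        return url
    except UnicodeEncodeError:
        # Non-ASCII characters are illegal in a MIME header, so wrap the whole
        # value in one RFC 2047 "B" encoded-word. UTF-8 is an assumption here.
        payload = base64.b64encode(url.encode("utf-8")).decode("ascii")
        return "=?UTF-8?B?" + payload + "?="


print(encode_content_location("http://example.com/my page.html"))  # unchanged
print(encode_content_location("http://example.com/m\u00fcller.html"))  # encoded-word
```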

Some people worried about charset issues. But it was pointed out that URLs are really just octets in us-ascii, so simple encoding should be adequate. New proposal:

a) Remove MIME encoding
b) Relative to absolute
c) Octet comparison
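The new three-step proposal can be sketched on the receiving side as follows. Assumptions, all labeled in the code: Python's `email.header.decode_header`/`make_header` stand in for "remove MIME encoding", both URLs are resolved against the same base, and decoded characters are mapped back to octets as UTF-8. `location_matches` is a hypothetical name.

```python
from email.header import decode_header, make_header
from urllib.parse import urljoin


def remove_mime_encoding(header_value: str) -> str:
    # Step (a): decode any RFC 2047 encoded-words; plain values pass through.
    return str(make_header(decode_header(header_value)))


def location_matches(content_location: str, html_url: str, base: str) -> bool:
    location = remove_mime_encoding(content_location)
    # Step (b): relative-to-absolute resolution. Using one base for both
    # sides is a simplifying assumption of this sketch.
    absolute_location = urljoin(base, location)
    absolute_html = urljoin(base, html_url)
    # Step (c): octet comparison; UTF-8 is an assumed mapping back to octets.
    return absolute_location.encode("utf-8") == absolute_html.encode("utf-8")


print(location_matches("=?UTF-8?Q?=C3=BC.html?=", "\u00fc.html",
                       "http://example.com/docs/"))
```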

Some questions remained. We currently say nothing about charset encodings in the Content-Location MIME header. Should we? It was decided to go to the mailing list with this discussion to seek final resolution.

Issue 1.B: Matches with content-base specified.

Does this apply only to relative Content-Locations without any Content-Base? Should we say something about exactness of matches when URLs are resolved using a Content-Base?

Solution: section 8.2.2 should change from "exact textual match" to "exact octet-for-octet match."

Issue 1.C: Relative unresolvable URL in the header with an absolute URL in the body (e.g., the HTML has a relative URL with a BASE tag, and the MIME header has the same relative URL but no base).

Answer in this case: spec is OK, no changes needed.

Issue 1.D: No relative-to-absolute resolution if no Content-Base is present.

This concerns the Content-Base values that apply to the parts and the Content-Base that applies to the HTML leaf part.

A proposal: add to the specification that, when resolving a relative URL in the content, the 1st priority is the base specifier in the content (e.g., the HTML BASE tag), the 2nd priority is the base specifier in that body part's header, and the 3rd priority is the Content-Location of that body part. When resolving a relative URL in the header, look only to its Content-Base.
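The proposed priority order can be sketched as follows, assuming the three possible base sources have already been extracted from the body part. The parameter names are hypothetical, and `urljoin` stands in for URL resolution; this illustrates the proposal, not agreed specification text.

```python
from typing import Optional
from urllib.parse import urljoin


def choose_base(in_header: bool,
                html_base: Optional[str] = None,
                content_base: Optional[str] = None,
                content_location: Optional[str] = None) -> Optional[str]:
    if in_header:
        # A relative URL in the MIME header looks only at Content-Base.
        return content_base
    # A relative URL in the content: BASE tag first, then the part's
    # Content-Base, then the part's own Content-Location.
    for candidate in (html_base, content_base, content_location):
        if candidate:
            return candidate
    return None


def resolve(reference: str, in_header: bool, **bases: Optional[str]) -> str:
    base = choose_base(in_header, **bases)
    # With no applicable base, no resolution is done (see Issue 1.D).
    return urljoin(base, reference) if base else reference


print(resolve("a.gif", False, content_location="http://example.com/z/page.html"))
```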

Consensus that this should be clarified in the spec.

Issue 1.E: Content-Base in one part, not in another in section 8.2.

Answer: they do not match in Jacob's example. No dispute.

Item 2: Validity of Content-Base and Content-Location in Section 5 of draft-ietf-mhtml-info-06.txt.

Editor's Note: At the time of writing these minutes (9 Sept 97), a long e-mail discussion has taken place and this agenda item is being resolved. Interested parties are referred to the MHTML archives for the discussion and the resolution.

Issue 2.A: Use of Content-Base and Content-Location for information?

Question: Should Content-Base and Content-Location be allowed in cases where they do not influence functionality as a way of informing the reader that a body part was taken from a certain location?

Consensus that this is not illegal.

Issue 2.B: Allow Content-Base/Content-Location outside of multipart/related?

Draft section 4.1 says: "These two headers may occur both inside and outside of a multipart/related part."

First comment here is that "inside/outside" needs to be replaced with "member of" and "not member of."

Suggestion: we don't touch the problem of referencing something outside of the multipart/related, since multiparts are unitary things. Doing something outside of the multipart/related would require a separate draft: we won't prevent it, but we don't address it in this spec; it's experimental. For example, namespace mapping is scoped to the multipart.

Suggestion: We should caution strongly against a Content-Location mapped URL from within one multipart/related interfering with links in another multipart/related. This would apply only to Content-Location, not Content-ID. This is something to put under security considerations.
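One way a receiver could honor this caution is to keep a separate Content-Location map per multipart/related and resolve each link only against the map of the group that contains it. The representation below (each group as a list of absolute Content-Location and body pairs) and the function names are hypothetical; Content-ID lookups are deliberately not modeled, since the caution applies only to Content-Location.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: each multipart/related is a list of
# (absolute Content-Location, body) pairs.
Group = List[Tuple[str, bytes]]


def build_scoped_maps(groups: List[Group]) -> List[Dict[str, bytes]]:
    # One independent map per multipart/related. Nothing is shared, so a
    # Content-Location in one group can never shadow a link in another.
    return [dict(group) for group in groups]


def lookup(maps: List[Dict[str, bytes]], group_index: int,
           url: str) -> Optional[bytes]:
    # Resolve a link only against the multipart/related that contains it.
    return maps[group_index].get(url)
```

A single global map would instead let whichever group was loaded last silently replace "logo.gif" for every other message.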

Issue 2.C: Allow Content-Base/Content-Location to be valid for object parts?

(Draft section 4.1 says these two headers are valid and are thus meaningless in multipart headings.)

There are 2 issues:

a) Need to decide an unclear issue, and
b) Get URL syntax draft to remove references to Content-Location/Content-Base.

Observation: Larry Masinter needs to copy our new version into his draft, or else we have to change our conclusion to match his.

Suggestion: Let's make it clear that our spec is authoritative on Content-Location/Content-Base. Content-Location on a multipart is not actually meaningless, because it can be a base, and a multipart/related can be returned by HTTP.

Proposal: allow Content-Base/Content-Location on a multipart, with the proviso that Content-Base only serves to resolve the Content-Location on the multipart, and the result must be the location from which the whole multipart can be retrieved. This avoids having to walk up and down the tree.

Observation: walking up and down the tree is non-problematic. Question: if you have a relative CONTENT-LOCATION and no CONTENT-BASE, should you walk up? Consensus in the room seems to be that we should not walk the tree. Or, stated another way: if it is not on a part, it is not there and you don't walk the tree.

Larry Masinter joined the meeting. His URL draft is not a WG draft, and he will defer to what the MHTML WG thinks. He wants to know what our WG decides, and he wants proposed text. He will be happy for the MHTML text to be normative.

All this was eventually left to be resolved on the MHTML mailing list. See the mailing list archive for resolution.

Issue 2.D: Precedence of Content-Base and Content-Location in section 5

Determined to not be an issue.

Issue 2.E: Allow same Content-Location on two body parts in section 7.

Question: Should we allow the same Content-Location on two body parts, if they resolve to different URLs? (Last paragraph of section 7).

Answer: yes.

Does Content-Base affect Content-Location adjacent to it?

Answer: Yes, we should allow it. It's a bit weird, but it enables some things and was allowed anyhow, and is not actually problematic.
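Taken together, the two answers suggest a check along these lines: each Content-Location is resolved against the Content-Base in the same heading, and only identical resolved URLs are treated as a clash. Reporting same-URL duplicates as a conflict is an interpretation of "if they resolve to different URLs", and the helper names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple
from urllib.parse import urljoin


def resolved_location(content_location: str,
                      content_base: Optional[str]) -> str:
    # A Content-Base in the same heading resolves the adjacent Content-Location.
    if content_base:
        return urljoin(content_base, content_location)
    return content_location


def conflicting_locations(parts: List[Tuple[str, Optional[str]]]) -> List[str]:
    # Identical Content-Location text is fine; only identical *resolved*
    # URLs are reported.
    counts = Counter(resolved_location(loc, base) for loc, base in parts)
    return sorted(url for url, n in counts.items() if n > 1)
```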

Issue 2.F: Allow multiple Content-Location headers with different values in the same content heading?

Question: Should this be allowed? Some said no.

Proposal: do not allow multiple Content-Locations so as to avoid ambiguity about base, but a Multipart MIME part can have multiple Content-IDs.

Suggestion: Allow multiple Content-Locations if the sender asserts they are all valid. We may need to point out that we are modifying MIME for the Content-Location header, and that this applies elsewhere.

Consensus is just say NO. Full stop.
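A receiver or validator could enforce this with Python's standard `email` package, as in the sketch below. Raising `ValueError` is an assumed policy for illustration; the minutes do not say how an implementation should react to a violating message.

```python
from email import message_from_string
from email.message import Message
from typing import Optional


def single_content_location(part: Message) -> Optional[str]:
    values = part.get_all("Content-Location") or []
    if len(values) > 1:
        # Meeting consensus: multiple Content-Location headers are not allowed.
        raise ValueError(f"{len(values)} Content-Location headers in one heading")
    return values[0] if values else None


part = message_from_string(
    "Content-Type: text/html\nContent-Location: a.html\n\n<p>hi</p>\n")
print(single_content_location(part))  # a.html
```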

Issue 2.G: Can Content-Location provide a Base, if no Content-Base is specified?

Need language clarification, no major controversy.

Consensus: Switch Content-Base and Content-Location descriptions (editorial).

Item 3: Robustness Principles in general:

Should we explain how liberal interpreters should deal with incorrect input? The basic principle is to say, "do the spec, for crying out loud." No robustness principles in the spec.

Consensus: We should document what is changed, and what was wrong in the RFC 2110 examples, but basically, you should just do the spec.

Item 4: Miscellaneous Technical Issues.

Issue 4.A: Need to scrub the examples.

Suggestion: Find a place to deposit examples. Jacob has the MHTML WG web site.

Consensus that we submit draft examples to ftp directory that Jacob Palme would maintain. Everyone on the list should feel encouraged to submit examples.

Issue 4.B: Hyperlinks between messages.

There is some worry that we should not explicitly disable/discourage this. Someone expressed worries about the security implications.

Resolution is to be explicit that it is not prohibited, but note that many agents will not be able to resolve inter-message references. We should perhaps observe in our discussion of scoping that "cid:" is not really subject to those scoping rules, since a cid is supposed to be globally unique. This is an excellent place to note that someday you might be able to use this to reference external messages.
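RFC 2111 converts a "cid:" URL to a Content-ID by removing the "cid:" prefix, converting %-encoded characters to their US-ASCII equivalents, and enclosing the result in angle brackets. The sketch below implements that conversion and a lookup in a hypothetical Content-ID index; because Content-IDs are meant to be globally unique, such an index need not be scoped to one multipart/related. The index itself, and whether it spans messages, are assumptions.

```python
from typing import Dict, Optional
from urllib.parse import unquote


def cid_to_content_id(url: str) -> str:
    # Drop the "cid:" prefix, undo %-escapes, and add angle brackets.
    if not url.lower().startswith("cid:"):
        raise ValueError("not a cid: URL")
    return "<" + unquote(url[4:]) + ">"


def find_by_cid(index: Dict[str, bytes], url: str) -> Optional[bytes]:
    # Hypothetical index keyed by Content-ID; it could in principle cover
    # several messages, which is what inter-message references would need.
    return index.get(cid_to_content_id(url))


print(cid_to_content_id("cid:foo4%25foo1@bar.net"))  # <foo4%foo1@bar.net>
```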

Issue 4.C: Folding of URLs over Multiple e-mail Header Lines.

One approach is RFC 2017, and another is to use draft-freed-pvcsc-03.txt. It was pointed out that you are hosed if you fold at an (illegal) space. There is no problem here for legal URLs: they will fold just fine according to RFC 2017. So we will add a warning that if you fold an illegal URL, you should make sure you unfold it back to the original octets.
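The hazard and one way around it can be sketched as below. This is a hypothetical folding scheme, not the RFC 2017 or draft-freed-pvcsc algorithm: it inserts CRLF plus a single space only at points not adjacent to one of the URL's own spaces, so that removing each line break together with its folding whitespace restores the original octets.

```python
import re


def fold_url(url: str, width: int = 40) -> str:
    """Hypothetical folding scheme: insert CRLF + SP, never next to a space."""
    pieces = []
    start = 0
    while len(url) - start > width:
        cut = start + width
        # Never fold beside one of the URL's own (illegal) spaces; otherwise
        # the receiver cannot tell folding whitespace from URL octets.
        while cut > start and (url[cut - 1] == " " or url[cut] == " "):
            cut -= 1
        if cut == start:
            break  # no safe fold point in this window; leave the rest unfolded
        pieces.append(url[start:cut])
        start = cut
    pieces.append(url[start:])
    return "\r\n ".join(pieces)


def unfold_url(folded: str) -> str:
    # Remove each line break together with the folding whitespace after it.
    return re.sub(r"\r\n[ \t]+", "", folded)


url = "http://example.com/my documents/annual report 1997/summary page.html"
print(unfold_url(fold_url(url, 20)) == url)  # True
```

A receiver that simply strips all whitespace from a folded URL would also delete the URL's own spaces, which is exactly the failure the warning is meant to prevent.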

Issue 4.D: Value of the start parameter to multipart/related, particularly if the type parameter is "multipart/alternative".

Some people are very concerned about manifests: being able to determine what is in the multipart/related. One proposal is to be able to add, say, "text/html" after type. It was pointed out that this is part of a larger problem of manifests, which would actually suggest a topic for a future WG.

Consensus: Do not change anything about type in this spec.

Item 5

Issue 5.A: Charter/status of working group.

We are clearly active. We have only had a pause in our work. Harald pointed out that we are also definitely not concluded.

Goals and Milestones:

Consensus: Aim for IETF Last Call by September 30 on the new draft for Proposed Standard. Jacob thinks he can get the draft done in time for this. We will meet in December to review implementation progress.

Publishing Informational Document:

It is now time to try to propose this simultaneously with the revised Proposed Standard drafts.

Revisions of RFC2111 and RFC2112: Minor corrections needed

Interoperability documentation: We need to write down a list of features in the protocol and a list of implementors, and for each, record who has done what. We need two independent implementations of emitters and two independent implementations of receivers that interoperate. These do not have to be paired up for all features and functions. The feature list must come from the MHTML WG Mailing List.
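The bookkeeping described above could be tracked with a small tally like the following. It encodes one reading of the criterion: per feature, at least two distinct emitters and two distinct receivers must appear in successful interoperation reports, without requiring the same pairs across features. Whether the listed implementations are truly independent cannot be checked by code and is assumed; all names are placeholders.

```python
from typing import Dict, List, Set, Tuple

# For each protocol feature: the (emitter, receiver) pairs observed to
# interoperate. Names used with this are placeholders, not real products.
InteropMatrix = Dict[str, Set[Tuple[str, str]]]


def features_lacking_evidence(matrix: InteropMatrix) -> List[str]:
    lacking = []
    for feature, pairs in matrix.items():
        emitters = {emitter for emitter, _ in pairs}
        receivers = {receiver for _, receiver in pairs}
        # Each feature needs two emitters and two receivers, but the pairs
        # may differ from feature to feature.
        if len(emitters) < 2 or len(receivers) < 2:
            lacking.append(feature)
    return sorted(lacking)
```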

Slides

None Received
