The AVT Working Group met twice during IETF 70: Monday, 3 Dec 07, from 15:20-17:20, and Tuesday, 4 Dec 07, from 18:50-19:50. The Chairs were Colin Perkins, Tom Taylor, and Roni Even. Note takers were Alan Clark, Stephen Casner, and Tom Taylor.

In addition to the formal meetings, an ad hoc meeting to discuss how to progress the SVC payload specifications in draft-ietf-avt-rtp-svc-03 was held over lunch on Tuesday. A set of open issues was agreed and presented to the list (see Roni Even's note to the list, 10 December 2007). An open design team will work on these issues by conference call, as indicated in Roni's note.

Monday, 3 December 2007 at 15:20-17:20

15:20 Introduction and Status Update          Chairs
======================================

Colin presented the agenda and status. His charts are at http://www3.ietf.org/proceedings/07dec/slides/avt-0.pdf.

No new AVT RFCs were published since the last meeting, but a number of documents passed WGLC and are in various stages of subsequent processing. Five documents are either ready for WGLC or close to it. In particular, the WG is asked to provide a pre-last-call review of draft-ietf-avt-rtp-uemclip and draft-ietf-avt-rtp-speex.

Alan Clark volunteered to take notes for the first part of Monday's meeting, Stephen Casner for the second part.

15:35 RTP Payload Format for SVC Video        Schierl
      draft-ietf-avt-rtp-svc-03
========================================

Thomas Schierl presented. Charts are at http://www3.ietf.org/proceedings/07dec/slides/avt-7.pdf.
The summary of the presentation is as follows:
   Slide 2: standardization status in JVT and ITU-T SG 16
   Slides 3-5: changes since the -01 draft
   Slides 6-7: Network Abstraction Layer (NAL) unit reordering with and without cross-layer decoding order number (CL-DON)
   Slides 8-10: selection of discussion topics raised on the AVT list
   Slide 11: authors' "to do" list
   Slide 12: summary of open issues

The ITU-T approved the defining standard for scalable video coding (SVC) in November 2007. The approved text is in JVT-X201, but is not yet publicly available. Its official designation will probably be: Annex G of ITU-T Recommendation H.264: 2008, Advanced video coding for generic audiovisual services (4th Edition).

Thomas summarized the considerable number of changes made in the -02 draft (see slides). [Note: some of these changes seem to have been made in the transition to version -03. Authors please verify.] Since the -03 version appeared, there has been a lively discussion on the list. The next two slides after the list of changes focussed on the issue of recovery of NAL unit presentation order, using either an explicit Cross-Layer Decoding Order Number (CL-DON) or parameters supplied by RTP and SDP (see below). The controversy is about the means to allow layered multicast with packetization mode 0. The final slides summarized feedback from the list.

NAL unit reordering without CL-DON:
 -- Use RTP timestamps to match frames; use RTP sequence number to maintain decoding order in each layer.
 -- Need dummy NAL units to have a timestamp assigned to match the frame in other layers.

Colin: as pointed out in list discussion, you can send RTCP immediately for unicast sessions, and you can send more frequently when appropriate, so sync delay should not be a problem.

Jonathan Lennox suggested that the use of timestamps might not be exact enough. [Jonathan made a belated IPR disclosure, see below.] Colin: no, it is exact if implemented correctly.
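
As a rough illustration of the standard RTP mechanism under discussion, the sketch below maps RTP timestamps from two sessions onto the sender's wallclock using the NTP/RTP timestamp pair carried in each session's RTCP sender report (SR), and then checks whether two packets belong to the same sampling instant. The class and function names, the 90 kHz clock rate, and the half-tick tolerance are assumptions made for this example; none of them comes from the draft or from the discussion.

```python
from dataclasses import dataclass

RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32 bits wide and wrap around


@dataclass
class SenderReport:
    """The two SR fields that tie RTP time to wallclock time (hypothetical type)."""
    ntp_seconds: float   # NTP timestamp from the SR, expressed in seconds
    rtp_timestamp: int   # RTP timestamp corresponding to that same instant


def rtp_to_wallclock(rtp_ts: int, sr: SenderReport, clock_rate: int) -> float:
    """Map an RTP timestamp to sender wallclock time using one SR."""
    # Treat the difference as a signed 32-bit value so that packets shortly
    # before or after the SR map correctly, even across a wraparound.
    diff = (rtp_ts - sr.rtp_timestamp) % RTP_TS_MODULUS
    if diff >= RTP_TS_MODULUS // 2:
        diff -= RTP_TS_MODULUS
    return sr.ntp_seconds + diff / clock_rate


def same_instant(ts_a: int, sr_a: SenderReport,
                 ts_b: int, sr_b: SenderReport,
                 clock_rate: int = 90000) -> bool:
    """True if two packets from different sessions share a sampling instant."""
    wall_a = rtp_to_wallclock(ts_a, sr_a, clock_rate)
    wall_b = rtp_to_wallclock(ts_b, sr_b, clock_rate)
    # Tolerance of half a clock tick (an assumption of this sketch).
    return abs(wall_a - wall_b) < 0.5 / clock_rate


# Two layers with independently randomized timestamp offsets.
base_sr = SenderReport(ntp_seconds=1000.0, rtp_timestamp=4294967000)
enh_sr = SenderReport(ntp_seconds=1000.0, rtp_timestamp=123456)
print(same_instant(8704, base_sr, 132456, enh_sr))
```

Because each session's offset is recovered from its own SR, the mapping works even though the two sessions use different random timestamp offsets. It does depend on at least one SR having been received in each session, which is the dependency debated in this part of the meeting.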

Roni suggested that the payload format mandate the use of AVPF. Colin disagreed: a payload format specification can recommend, that is all. Roni accepted this view.

Ye-Kui Wang expressed concern about sync delay for multicast. Colin stated that if RTCP bandwidth is set high enough, delay can be reduced.

Jonathan Lennox asked why, since all layers come from one source, the timestamps need to be randomized separately. Colin pointed out that it is important to use the standard RTP mechanism to sync between sessions rather than having the payload format define its own mechanisms.

Stephan Wenger pointed out that without the CL-DON, SVC may not work because 3GPP and other SDOs may not implement RTCP SRs -- assuming their presence is an ivory tower view. Colin replied that you need to implement the protocol correctly if you are going to get the benefits built into it. Stephan noted the bandwidth constraints under which radio systems operate. Colin said RTCP did not use all that much bandwidth, and that is the cost of doing RTP.

Jonathan Lennox commented that the RTP spec does not mention timestamps for layering. Colin replied that there was no need to mention this, because layers are not a special case.

Stephan repeated that without CL-DON, you have to rely on RTCP SRs. Colin suggested they seem to be assuming an association between timestamp and sequence number that is not required. The RTP mechanisms for synchronization and detection of loss say nothing about order of encoding.

Jonathan raised a concern that if you lose that first RTCP message, it will be 5 s before you can fix it. Colin replied that if this is a problem for SVC, it is also a problem for other payloads, so work on a general solution.

Stephan Wenger indicated that the reason for adding bits (DON field) was that the general information in the RTP header was not sufficient unless you add a number of constraints. One would be alignment of the timestamps in the sessions.
He proposed implicitly signaling a starting point for the timestamps, so that security is maintained without randomization. Colin suggested as an alternative to signal the RTP-NTP timestamp mapping in the initial signaling, just as a primer. Jonathan worried that the clock rate may not be exact to start with. Colin said to send RTCP besides. It may be that they are trying to impose too much semantic content on the available information, and that many codecs may need more information for decoding than the RTP timestamp and sequence number provide.

Jonathan worried that adding options within options (in PACSI NAL) seems like a bad idea. Keep the bits mandatory; they are needed to avoid long-term corruption if you lose reference. They have implemented this. Ye-Kui indicated that the purpose of the option is to save senders from having to calculate the values, not so much to save bits. Stephan added that the reason for making the bits optional is that vendors expect that others have patents, so they want to be able to avoid including functionality that is patented. Let the market decide between options or an over-encumbered protocol.

Discussion timed out at this point.

Jonathan Lennox provided a belated IPR disclosure, basically due to a change in company name; a previous disclosure applies. He agreed that a new disclosure would be filed under the present company name. His firm has implemented a previous version of the draft, and it worked. He urged the authors not to make particular fields optional, because recovery of the proper frame is essential. Ye-Kui countered that the need is conditional.

Stephan Wenger's assessment was that this is a case of two camps, each worried about the other side trying to make their patent optional. One can choose the degree of encumbrance vs. interoperability.

Colin: we will have an offline meeting with interested parties to try to sort the issues out. Interested parties should come forward after the meeting.
Roni: would like more revisions between meetings, perhaps by forming a design team. [Note that an open design team was subsequently convened, as mentioned above.] Roni noted that MMUSIC work has to be done in parallel.

15:50 RTP Payload Format for MVC (multiview video)    Ye-Kui
      draft-wang-avt-rtp-mvc-00
====================================================

Ye-Kui Wang presented. Slides are at http://www3.ietf.org/proceedings/07dec/slides/avt-1.pdf.

The summary of the presentation is as follows:
   Slides 3-5 (following the outline): introduction to the technology
   Slide 6: relationship to SVC
   Slide 7: status of MVC standardization in JVT and ITU-T SG 16
   Slide 8: summary of the draft
   Slide 9: questions to the Working Group (about how to proceed)

Multiple cameras capture the same scene from different viewpoints. The problem of transmitting the resultant media stream has similarities to SVC. The work started in JVT in July 2006, and is supposed to be done next year. The authors are not sure if a new packet format is needed, but a new media type is required. They are looking for guidance on draft structure.

Colin stated that merging the MVC and SVC drafts would be a mistake; cut and paste leads to problems, so referencing would be preferable (but may not be understandable). Stephan feared that referencing would require reading too many drafts in parallel, and he would prefer an independent draft. Colin: understandable and correct are the goals. Probably have to wait for the SVC RFC to be published before doing a cut-and-paste.

16:00 RTCP reporting by translators           Hunt
      draft-hunt-avt-rtcptrans-00
===================================

Geoff Hunt presented. Slides are at http://www3.ietf.org/proceedings/07dec/slides/avt-2.pdf.

Summary of the presentation:
   Slide 3: origin of the work
   Slide 4: main issues
   Slide 5: whether it should be a WG item

This draft grew out of the RTCP HR work. The new ideas it presents are policy-based forwarding and profile specification.
The major issue in the author's mind was whether this would be a normative or informative document. Colin responded that we had the choice of Informational (useful) or BCP (also useful). Informative versus normative is determined by whether it is mostly explanatory like the topologies draft, or whether it is giving guidance about how to do it. This might be two different drafts.

Geoff asked if this was seen as useful work. Colin agreed that it was. Should it be a Working Group item? Colin thought the answer was yes, but we need to define the nature of the draft before deciding whether it will be a Working Group item. Geoff suggested that if there are two drafts, this would be the Informational one.

Jonathan Lennox pointed out that there is a gap: AVPF needs to be considered. Geoff agreed that he should include material on dealing with AVPF.

Stephen Casner noted that RFC 3550 is general, so it makes sense to have more documents about specific cases as they arise. Colin agreed that we have failed to meet implementors' needs by describing what has to be implemented for particular applications. Stephan Wenger demurred that the IETF is not very good at that, and should leave profiling to implementation groups.

16:10 RTCP HR (High resolution VoIP metrics)  Clark
      draft-ietf-avt-rtcphr-02
================================

Geoff Hunt presented. Slides may be found at http://www3.ietf.org/proceedings/07dec/slides/avt-3.pdf.

Summary of the presentation:
   Slide 3: summary of changes since IETF 69
   Slide 4: change in block types
   Slide 5: change in relaying of statistics to/from "external" networks
   Slides 6-8: change to using profiles in SDP to select block types and subtypes to be reported
   Slide 9: remaining issues
   Slide 10: urge to WGLC
   Backup slides: more detail on relaying of statistics to/from external networks

Slide 5: Stephen Casner asked a clarifying question. He noted that according to RFC 3550 mixers and endpoints MAY act as relays.

Colin: Re slides 7-8.
SDP is not suitable for profiles -- get problems with feature interactions between RTP profiles and RTCP profiles. The detailed approach has worked in the past, but it, too, could have problems here. Alan clarified that this negotiation should only cover RTCP HR. From Colin's point of view, the underlying problem is that there are too many options in RTCP HR.

After slide 9, Colin summed up: the draft still contains too many unrelated topics and needs to be split still further, following the suggestions he made to the list ("Re: [AVT] I-D ACTION:draft-ietf-avt-rtcphr-02.txt", 02/12/2007 11:19 AM EST).

16:30 RTCP XR Video Metrics                   Clark
      draft-ietf-avt-rtcpxr-video-02
      draft-ietf-avt-rtcpxr-audio-01
================================

Alan Clark presented. Slides may be found at http://www3.ietf.org/proceedings/07dec/slides/avt-4.pdf.

Summary of presentation:
   Slide 2 - key changes
   Slide 3 - comparison to ATIS 080008
   Slide 4 - status of transport drafts
   Slide 5 - comments received (Hedayat proposal for addition)

Stephan Wenger remarked that differentiation between I and BP frames is an over-simplification for recent codecs. Alan Clark responded that he could change the terminology to distinguish errors that are direct from those that are propagated. Stephan said there are three types of frame: frames that are direct, those that are predicted but also referenced, and those that are only predicted.

Alan mused that if one is measuring in the middle, one can't really tell what kind of frame the packet loss affects, but an endpoint can. Stephan pointed out that even the endpoint does not know whether the frame will be used for reference later. You may not know what kind of frame a packet loss affects. Jonathan Lennox went further, saying that one can have a perfectly valid packet stream that produces broken video.

Ken Toney introduced himself as chair of the ATIS group defining these measurements.
The segment of the industry that drove the ATIS work was based on MPEG2 and RTP, which is why the metrics fit it. Stephan responded: OK, call this an MPEG2 metric rather than an XR metric.

Roni Even pointed out that Alan is not getting input about what to put in place of the current parameters. Stephan said he does not know how to design a metric for H.264 that would fit into this framework ... or in fact, any metric at all.

Tom Taylor asked whether we need to back off to just a measure of packet loss. Alan replied that there is still a lot of MPEG2, and also H.264 that uses a structure like IPB frames.

Ye-Kui said that there were attempts at metric definitions in 2004, about seven of them. He had emailed the references to Alan.

Stephan warned that the interpretation of these metrics for MPEG2 and H.264 would be different even when H.264 does use this structure, because the dependency is so different. Label metrics with the specific codec to which they apply. Interpretation is the real problem. For example, jitter is not meaningful for MPEG-2, which does not transmit in presentation order.

Stephen Casner suggested that we treat the metrics as comparative only, not absolute. Ye-Kui added that one should have the midpoints mimic endpoint behavior to give similar results.

Tom Taylor asked why not use SNMP? Alan replied that the two systems are complementary. SNMP is not scalable for millions of endpoints. RTCP is more light-weight, and provides feedback to the endpoint.

Colin expressed the opinion that the main problem is that these proposals are a grab-bag collection of metrics suitable for a particular application, rather than being an architectural view of the metric requirements for video in general. Alan said that the authors were focused on problems from industry that are naturally application-specific, but he agrees that an architectural view is a good idea.
Colin made the specific suggestion that the authors not mix audio, video, playout buffer, and MPEG metrics all in one draft. Alan was concerned that dividing it up would result in a series of small blocks. Colin agreed, but then you could mix and match.

Kaynam Hedayat remarked in this context that RFC 3611 is closely targeted to VoIP, so does not support audio generically. His bit-map proposal for I, P, and B frames would be a more general approach than the XR draft has taken. Alan rejoined that the interpretation of the bits would be application-specific.

16:40 RTP payload format for ITU-T EVBR speech/audio    Lakaniemi
      draft-lakaniemi-avt-rtp-evbr-00
================================

Ari Lakaniemi presented. Slides are at http://www3.ietf.org/proceedings/07dec/slides/avt-5.pdf.

Ari announced potential Nokia IPR in the draft.

Summary of presentation:
   Slide 4 - overview of the codec being developed by ITU-T Study Group 16.
   Slides 5-7 - payload format proposal.
   Slides 8-10 - issues: cross-layer synchronization (as with SVC), codec bit rate configuration/control, layer configuration signalling.
   Slide 11 - next steps.

The SG16 work should be done early in 2008.

Regarding the question of synchronization mechanisms, Colin suggested that, as for SVC, one could signal initial timing information in SDP.

On the issue of codec bit-rate configuration, Joerg Ott noted that the payload formats seem to want the receiver to tell the sender what to do. Instead, there is enough raw information that the sender can figure out what it should do on its own. The sender may have other codec choices and the receiver should not have to second-guess what is available. Also, one should not make the signaling specific to a particular codec, because you might switch among codecs.

Further comments were cut off at this point because of lack of time. Stephan responded to Joerg, saying that before jumping to conclusions, Ari should come up with scenarios first.
Stephan agreed in philosophy with Joerg, but the generic answer may not work. The layer configuration slide had the same issue. Discuss on the list.

Colin warned that the Transport ADs would like an architectural view going beyond a single payload type. We should see another revision before adding the work to the charter.

Tuesday, 4 December 2007 at 18:50-19:50 (Salon 3)

16:55 Non compound RTCP                       Johansson
      draft-ietf-avt-rtcp-non-compound-01
================================

This presentation was deferred from the Monday meeting for lack of time.

Ingemar Johansson presented. The slides are at http://www3.ietf.org/proceedings/07dec/slides/avt-6.pdf.

Summary:
   Slide 3 - definitions
   Slide 4 - use cases: codec control signalling, feedback, possibly others.
   Slides 5-7 - allocation of RTCP bandwidth between compound and "non-compound" RTCP
   Slides 8-10 - "allow immediate" flag as proposed solution to increase responsiveness
   Slide 11 - further issues.

Jonathan Lennox commented that the name of the draft should be RTCP without SR and SDES -- one can still have compound RTCP within the proposed definition.

Colin said that the distinction between compound and non-compound RTCP is trivial. But if we are talking about relaxed compounding rules, the cases are harder to distinguish. Magnus Westerlund provided further explanation. Colin asked why it is necessary to distinguish the two types if it's just a matter of different composition of compound packets. Magnus replied that this is primarily to support enforcement of the standard.

Joerg pointed out that a smaller average RTCP packet size means more frequent transmission is permissible -- we don't need special rules for an immediate flag. Magnus said it was needed because they were splitting the bandwidth allocation into pools. Colin and Joerg saw no reason to add this complexity -- existing tools will work. Joerg said that if they want to change timing rules, they should do so in a separate draft.
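
To make Joerg's point about packet size concrete, here is a deliberately simplified sketch of how the RTCP reporting interval in RFC 3550 scales with the running average RTCP packet size. It omits the sender/receiver bandwidth split, the randomization of the interval, and the reconsideration rules, so it is an illustration under those stated simplifications and not an implementation of the full algorithm. The function names and the example numbers are assumptions made for this example.

```python
def update_avg_size(avg_size: float, packet_size: int) -> float:
    """Running average of RTCP packet size, weighted 1/16 new and 15/16 old."""
    return packet_size / 16 + avg_size * 15 / 16


def rtcp_interval(avg_size_bytes: float, members: int,
                  rtcp_bw_bytes_per_s: float, t_min: float = 5.0) -> float:
    """Deterministic reporting interval in seconds (simplified).

    The interval grows with the number of members and the average packet
    size, and shrinks as the RTCP bandwidth share grows. t_min is the
    minimum interval; 5 seconds is the RFC 3550 default.
    """
    return max(t_min, members * avg_size_bytes / rtcp_bw_bytes_per_s)


# Halving the average packet size halves the interval once the
# minimum-interval floor is lifted (t_min=0.0 here, as an assumption).
print(rtcp_interval(100, 2, 50.0, t_min=0.0))
print(rtcp_interval(50, 2, 50.0, t_min=0.0))
# A small packet pulls the running average down only gradually.
print(update_avg_size(120.0, 40))
```

With the default 5-second floor in place, the interval cannot drop below 5 seconds however small the packets are, which is the same floor behind the earlier concern about waiting 5 s after a lost RTCP message.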

Magnus noted that with the proposed relaxation of RTCP compounding rules one gets unstable estimates of average packet size. Joerg mused that this work keeps bringing up new strange things before the old strange things have been worked out. He made the suggestion that in the point-to-point case, one could do a prediction of the average size of packets sent. Of course, this doesn't cover multicast.

A timeout was called at this point -- no on-the-fly design.

Joerg was concerned that the proposed algorithm encourages burst behaviour.

Colin summed up that it appears we need a better explanation of the problem and of why the existing design doesn't work. We should have simulations or the like for background. There may be SRTCP issues.

18:50 DTLS-SRTP status                        Chairs
      draft-ietf-avt-dtls-srtp-01
================================

Eric Rescorla presented. His single status chart is at http://www3.ietf.org/proceedings/07dec/slides/avt-10.pdf.

Eric noted recent changes, and said that he believes the AVT work is done. He will be moving SDP-related material into this draft. He will give time for people to comment, looking to a WGLC just after Christmas.

18:55 Source-Specific Parameters for H.264 and H.264 SVC    Lennox
      draft-lennox-avt-h264-source-fmtp-00
================================

Jonathan Lennox presented. His slides are at http://www3.ietf.org/proceedings/07dec/slides/avt-11.pdf.

Summary:
   Slide 2 - motivation
   Slide 3 - proposed solution
   Slide 4 - compatibility
   Slide 5 - next steps.

H.264 has provided parameter sets that describe sender rather than receiver preferences. There is a confusing interaction with usual SDP semantics. Jonathan has done work on this in MMUSIC. Is there AVT interest?

Roni provided a second stream example, noting that the receiver doesn't know the SSRC in advance. Jonathan replied that one could use the SIP model -- re-INVITE before sending. Roni suggested he may want to consider an in-band solution. Jonathan declined to include this case in his draft.

Stephan Wenger stated that the problem being addressed is not critical for non-scalable H.264, though it is real. The proposals are useful, so he is not opposed to them. He confessed to some confusion about the effect of the syntax -- it updates everything in the session description but the media types.

Roni Even said that Polycom is shipping an in-band solution. Stephan said that Nokia's solution does not use in-band indications. Roni pointed out that repetition of parameter sets is a major issue when switching sources.

Summary: work on the assumption that this will eventually be an AVT draft, keeping MMUSIC informed. Provide good motivation.

19:05 DTLS-SRTP Key Transport                 Wing
      draft-wing-avt-dtls-srtp-key-transport-00
================================

Dan Wing presented. His slides are at http://www3.ietf.org/proceedings/07dec/slides/avt-8.pdf.

Summary:
   Slide 2 - scope -- extension of DTLS-SRTP from point-to-point to point-to-multipoint.
   Slides 3-4 - use cases -- mixers, video switchers
   Slides 5-6 - proposed solution -- distribute common key to all listeners
   Slide 7 - implication -- have to change keys when listener membership changes
   Slide 8 - solicitation of interest.

Dan's main motivation was to save on the processing power needed at a mixer to encrypt each listener's stream using a separate key.

Dave Oran remarked that video and audio are different. Dan forgot that one of the speakers may also be a listener, and gets a different stream from the other listeners. Dan did not see this as an issue -- his proposal would optimize for the 50 other listeners. Dave warned that he should think about a dynamic situation.

Magnus Westerlund objected that Dan was breaking a basic assumption of the security model. Dan complained that the requirement to encrypt every stream separately was a real barrier to implementation. Magnus warned that with Dan's model listeners can create bogus streams. Dan argued that one could provide a different key for the speaker.
Dan agreed there is an issue with DTLS, hence this is a proposed extension.

Roni raised a point on the use case. On the requirements chart (slide 7), it was noted that there is a lot of churn in listenership at the start of a conference.

Magnus suggested that Dan should check out the topology draft. Dan said that he had used it as guidance for this work.

Rohan Mahy suggested that the first requirement be a SHOULD. He gave a counter-example where the conference is recorded and all participants are allowed access to the recording. Hence there is no point in changing the key.

Flemming Andreasen questioned why a different key is given to each listener in the first place. The answer is that this is a DTLS property.

Discussion was timed out at this point. Colin summarized that there is definitely interest and Dan should keep working, but this is not ready to be a WG item.

19:15 Using SEED Cipher Algorithm with SRTP   Seokung
      draft-ietf-avt-seed-srtp-00
================================

Yoon Seokung spoke about the draft. His slides are at http://www3.ietf.org/proceedings/07dec/slides/avt-9.pdf.

Summary:
   Slide 2 - goal and features
   Slide 3 - next steps

The Chairs called for WG feedback on the draft, to get it ready for WGLC. There should be a new version based on this feedback before the next meeting.

19:25 End