[asterisk-dev] IAX Generic Media Frame Specification proposal - second draft

Mihai Balea mihai at hates.ms
Tue Jun 5 20:04:29 MST 2007

Hi all,

I'm attaching the second draft of the IAX Generic Media Frame  
Specification proposal (formerly known as the IAX Video Packetization  
proposal).  In fact "video" would be kind of a misnomer now, since  
the proposal attempts to provide a generic solution for the transport  
of all kinds of media.

What's new:
  - use 16 bits for timestamp and stream ID: since this type of frame can be
used to transport media other than video, it is beneficial to reduce the size
of the header.
  - use of RTP payload formats is encouraged, since it will enhance
interoperability between the two transports
  - clarified definition of stream IDs
  - changed and clarified the definition of Payload Type, now called
Format ID.
  - defined timestamps in a similar way as in RTP (keeping the 1 kHz
IAX clock): enhances interoperability and helps synchronization
between streams.
  - added references to related standards and drafts
  - many other minor improvements ...

As always, I would like to hear your comments and suggestions.


-------------- next part --------------
                                                                   Mihai Balea
                                                       <mihai AT hates DOT ms>

                         IAX generic media frames
                            - updated 06/05/07 -

0. Abstract

This proposal describes a specification for non-reliable media transport over
the IAX protocol.  While the main focus of the specification is to address the
issue of video packetization and transport, the proposed protocol extensions
should apply equally well to other types of media, such as fax.

1. Issues related to the transport of video/large media frames over IAX

Sending video over IAX frames presents a number of unique issues:
- Frames can be larger than the standard MTU.  For a resolution of 320x240,
key frames are larger than the MTU on a regular basis.  Even regular frames 
(p-frames) exceed this limit at times.  As a result, a video-enabled IAX 
implementation must be able to split a video frame over multiple IAX frames
(called slices).  The receiver must be able to reassemble the original video
frame before passing it to the video decoder.
- Some codecs (H.264) have built-in packet loss compensation.  Other codecs
(Theora) do not have any such mechanism.  For such codecs, it is imperative
that video slices be assembled in the right order and that the beginning and
the end of a video frame be properly signaled.
- Some applications switch video sources on the fly (conferencing, video on
hold, etc.).  Codecs that do not use a fixed code-book (Theora) need to know
when this happens in order to use the appropriate code-book.  Even for codecs
that use fixed code-books, when a video source change occurs, it is desirable
to wait until the next key frame is received before continuing to display
video.
- Some applications can benefit from knowing the type of frame (keyframe,
p-frame, etc.).

Some of these issues are present when sending other types of media, for 
example images.  A solution should be flexible enough to allow for different
types of media.
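
The slicing requirement above can be sketched in a few lines.  This is an
illustrative sketch only; MAX_PAYLOAD and slice_frame are names invented for
this example, not part of the proposal.

```python
# Split one encoded video frame into payload chunks small enough to fit,
# together with the media header, under a typical path MTU.

MAX_PAYLOAD = 1200  # conservative payload budget below a 1500-byte Ethernet MTU

def slice_frame(frame: bytes, max_payload: int = MAX_PAYLOAD) -> list:
    """Split one encoded video frame into MTU-sized slices."""
    return [frame[i:i + max_payload] for i in range(0, len(frame), max_payload)]

# A 320x240 key frame can easily exceed one MTU; the receiver must
# reassemble the slices, in order, before handing the frame to the decoder:
slices = slice_frame(bytes(5000))
assert b"".join(slices) == bytes(5000)
```

The receiver's reassembly is the mirror image: concatenate slices in sequence
order until the end-of-frame marker arrives, then pass the whole frame to the
decoder.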

For reference, I am including the current structure of a video meta-frame, as 
described in the latest IAX2 draft [1]

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|         Meta Indicator      |V|      Source Call Number     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|?|          time-stamp         |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                                         Data                  |
:                                                               :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

2. Proposed Media Frame Structure

The new media frame format includes a generic header followed by format 
specific headers and payloads.  This document describes the generic header.

Since interoperability with RTP streams is desirable, the new media frames
should encapsulate information that is semantically similar to RTP fields, in
order to facilitate translation between the two transports.  We do not attempt
to provide a one-to-one mapping of RTP fields to IAX fields, but rather provide
enough information to recreate one from the other while at the same time
maintaining the spirit of IAX.

2.1 Generic Header
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|         Meta Indicator      |V|      Source Call Number     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Time-Stamp          |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Stream ID           |   Format ID   |     Flags     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:                            Data                               :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Field description:

The first 32 bits (F, Meta Indicator, V, Source Call Number) have the same
semantics as in the current IAX2 draft [1].  The V flag is extended to mean
not only video, but any media frame using the new header.

Time Stamp

  The lower 16 bits of the peer's full 32-bit time stamp.  The timestamp is
  expressed in ms and is defined as the time of digitization of the first
  data byte, relative to the beginning of the call.
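
  Since only the lower 16 bits are carried, and a 16-bit millisecond field
  wraps roughly every 65 seconds, a receiver has to extend each received value
  back to 32 bits.  A sketch of one common approach (the function name and
  logic here are illustrative, not part of the proposal):

```python
# Extend a truncated 16-bit timestamp to 32 bits by choosing the candidate
# closest to the last reconstructed value, which handles wraparound in
# either direction (late or reordered packets included).

def extend_timestamp(ts16: int, last_full: int) -> int:
    """Pick the 32-bit timestamp whose low 16 bits equal ts16 and which
    is closest to the previously reconstructed value last_full."""
    base = last_full & ~0xFFFF  # high 16 bits of the previous full value
    candidates = (base - 0x10000 + ts16, base + ts16, base + 0x10000 + ts16)
    return min(candidates, key=lambda c: abs(c - last_full)) & 0xFFFFFFFF

# The 16-bit field wraps from 0xFFF0 to 0x0010 across one packet:
assert extend_timestamp(0x0010, 0x0001FFF0) == 0x00020010
```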

Stream ID

  16-bit stream identifier, as negotiated during call setup.

Sequence Number

  16-bit sequence number.  Starts at 0 when a stream is initialized and is
  incremented for each packet.  Each media stream will have its own set of
  sequence numbers.

Format ID

  Negotiated during NEW or RENEW transactions.  During the 
  negotiation process, the endpoints dynamically assign Format ID numbers to 
  sets of media formats/codecs and associated parameters (sample rate, bitrate,
  resolution, etc).  Each media stream has its own independent set of 
  negotiated Format IDs.

Flags

  Each negotiated media format will have its own flags in this field.
  If a media format does not require flags, it MUST set all bits in Flags to
  zero.  Similar types of media formats, such as video, SHOULD use similar
  flag bits as much as possible.  For example, video codecs should attempt to
  use the same bit with the same semantics for signaling a key frame.

Data

  Media format specific payload that MAY include media format specific
  headers.  Implementations should attempt to use the same payload format as
  for RTP streams.
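
The 12-byte layout above can be exercised with a short pack/unpack sketch.
The function names, the f_bit/v_bit defaults, and the returned dict layout
are illustrative assumptions for this example, not part of the proposal.

```python
import struct

# Pack/unpack the generic media header from section 2.1:
# F|Meta Indicator, V|Source Call Number, Time-Stamp, Sequence Number,
# Stream ID, Format ID, Flags -- 12 bytes, network byte order.
HEADER = struct.Struct("!HHHHHBB")

def pack_media_header(meta_indicator, src_call, ts16, seq, stream_id,
                      format_id, flags, f_bit=0, v_bit=1):
    """Pack the header fields into 12 bytes."""
    word0 = ((f_bit & 1) << 15) | (meta_indicator & 0x7FFF)
    word1 = ((v_bit & 1) << 15) | (src_call & 0x7FFF)
    return HEADER.pack(word0, word1, ts16 & 0xFFFF, seq & 0xFFFF,
                       stream_id & 0xFFFF, format_id & 0xFF, flags & 0xFF)

def unpack_media_header(data):
    """Inverse of pack_media_header; returns the fields as a dict."""
    w0, w1, ts, seq, sid, fmt, flags = HEADER.unpack_from(data)
    return {"f": w0 >> 15, "meta_indicator": w0 & 0x7FFF,
            "v": w1 >> 15, "src_call": w1 & 0x7FFF,
            "timestamp": ts, "seq": seq, "stream_id": sid,
            "format_id": fmt, "flags": flags}
```

A round trip through pack_media_header and unpack_media_header should return
the original field values; the struct format string makes the 16/16/16/16/16/8/8
bit split explicit.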

2.2 Video Specific Comments

Video codecs using the above described media packet structure SHOULD attempt to
use similar flags.  One possible set of flags that should cover many video
applications would be

xxxx xxKM

K

  1 bit: set to 1 if the data in the IAX frame belongs to a video key frame,
  0 otherwise

M

  1 bit: Similar semantics as the marker bit in RTP [2]: defined by the format
  type, usually marks the end of a set of slices.

Applications using video should adopt the same payload format as for RTP
(Theora [3], H.264 [4]).
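
A sketch of how an endpoint might test the K and M bits in the Flags byte,
following the "xxxx xxKM" layout above.  The constant and function names are
illustrative, not defined by the proposal.

```python
# K and M occupy the two least significant bits of the Flags byte.
FLAG_M = 0x01  # bit 0: marker, e.g. last slice of a video frame
FLAG_K = 0x02  # bit 1: slice belongs to a key frame

def describe_flags(flags: int):
    """Return (is_key_frame, is_marker) decoded from the Flags byte."""
    return bool(flags & FLAG_K), bool(flags & FLAG_M)

# Last slice of a key frame: both bits set.
assert describe_flags(0b00000011) == (True, True)
# Middle slice of a p-frame: neither bit set.
assert describe_flags(0b00000000) == (False, False)
```

A receiver would typically use the M bit to decide when a frame is complete
and the K bit to decide whether display can resume after a source switch.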

2.3  Issues still TBD:

- Should we expand the K flag to multiple bits so we can differentiate between 
p-frames and b-frames?

3. References

[1] Guy, E. et al., IAX2: Inter-Asterisk eXchange Version 2
[2] Schulzrinne, H. et al., RTP: A Transport Protocol for Real-Time
    Applications, RFC 1889
[3] Barbato, L., RTP Payload Format for Theora Encoded Video
[4] Wenger, S. et al., RTP Payload Format for H.264 Video, RFC 3984
