[asterisk-dev] IAX Generic Media Frame Specification proposal -
second draft
Mihai Balea
mihai at hates.ms
Tue Jun 5 20:04:29 MST 2007
Hi all,
I'm attaching the second draft of the IAX Generic Media Frame
Specification proposal (formerly known as the IAX Video Packetization
proposal). In fact "video" would be kind of a misnomer now, since
the proposal attempts to provide a generic solution for the transport
of all kinds of media.
What's new:
- use 16 bits for timestamp and stream ID: since this type of
frame can be used to transport media other than video, it will be
beneficial to reduce the size of the header.
- use of RTP payload formats is encouraged, since it will enhance
interoperability between the two formats
- clarified definition of stream IDs
- changed and clarified the definition of Payload Type - now
called Format ID.
- defined timestamps in a way similar to RTP (keeping the 1 kHz
IAX clock): enhances interoperability and helps synchronization
between streams.
- added references to related standards and drafts
- many other minor improvements ...
As always, I would like to hear your comments and suggestions.
Cheers
Mihai
-------------- next part --------------
Mihai Balea
<mihai AT hates DOT ms>
IAX generic media frames
- updated 06/05/07 -
0. Abstract
This proposal describes a specification for non-reliable media transport over
the IAX protocol. While the main focus of the specification is to address the
issue of video packetization and transport, the proposed protocol extensions
should apply equally well to other types of media such as fax.
1. Issues related to the transport of video/large media frames over IAX
Sending video over IAX frames presents a number of unique issues:
- Frames can be larger than the standard MTU. For a resolution of 320x240,
key frames are larger than the MTU on a regular basis. Even regular frames
(p-frames) exceed this limit at times. As a result, a video-enabled IAX
implementation must be able to split a video frame over multiple IAX frames
(called slices). The receiver must be able to reassemble the original video
frame before passing it to the video decoder.
- Some codecs (H.264) have built-in packet loss compensation. Other codecs
(Theora) do not have any such mechanism. For the latter, it is imperative
that video slices are assembled in the right order and that the beginning and
end of a video frame are properly signaled.
- Some applications switch video sources on the fly (conferencing, video on
hold, etc.). Codecs that do not use a fixed code-book (Theora) need to know
when this happens in order to use the appropriate code-book. Even for codecs
that use fixed code-books, when a video source change occurs, it is desirable
to wait until the next key frame is received before continuing to display video.
- Some applications can benefit from knowing the type of frame (key frame,
p-frame, etc.).
Some of these issues are present when sending other types of media, for
example images. A solution should be flexible enough to allow for different
types of media.
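As a rough illustration of the slicing requirement above, the following Python
sketch splits one encoded video frame into slices and joins them again. The
function names and the 1200-byte payload budget are hypothetical; the proposal
does not fix a slice size, and the sketch assumes slices arrive complete and in
order.

```python
from typing import List

# Hypothetical per-slice payload budget; the proposal does not specify a value.
MAX_SLICE_PAYLOAD = 1200


def split_into_slices(video_frame: bytes,
                      max_payload: int = MAX_SLICE_PAYLOAD) -> List[bytes]:
    """Split one encoded video frame into slices of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [video_frame[i:i + max_payload]
            for i in range(0, len(video_frame), max_payload)]


def reassemble(slices: List[bytes]) -> bytes:
    """Rebuild the original frame from slices that are already in order."""
    return b"".join(slices)


frame = bytes(3000)                # stand-in for an encoded key frame
slices = split_into_slices(frame)
print([len(s) for s in slices])    # [1200, 1200, 600]
```

A real receiver would also have to cope with lost and reordered slices, which
is what the sequence number and flags in the proposed header are meant to
support.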
For reference, I am including the current structure of a video meta-frame, as
described in the latest IAX2 draft [1]:
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|         Meta Indicator      |V|      Source Call Number     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|?|          time-stamp         |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                            Data                               |
:                                                               :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2. Proposed Media Frame Structure
The new media frame format includes a generic header followed by format
specific headers and payloads. This document describes the generic header.
Since interoperability with RTP streams is desirable, the new media frames
should encapsulate information that is semantically similar to RTP fields, in
order to facilitate translation between the two transports. We do not attempt
to provide a one-to-one mapping of RTP fields to IAX fields, but rather to
provide enough information to recreate one from the other while maintaining
the spirit of IAX.
2.1 Generic Header
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|         Meta Indicator      |V|      Source Call Number     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Time-Stamp          |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Stream ID           |   Format ID   |     Flags     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
:                             Data                              :
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Field description:
The first 32 bits (F, Meta Indicator, V, Source Call Number) have semantics
similar to those in section 8.1.3.1 of the current IAX2 draft [1]. The V flag
is extended to mean not only video, but any media frame using the new header.
Time Stamp
The lower 16 bits of the peer's full 32-bit timestamp. The
timestamp is expressed in ms and is defined as the time of digitization of
the first data byte, relative to the beginning of the call.
Stream ID
16-bit stream identifier, as negotiated during call setup.
Sequence Number
16-bit sequence number. Starts at 0 when a stream is
initialized and is incremented for each packet. Each media stream will have
its own set of sequence numbers.
Format ID
Negotiated during NEW or RENEW transactions. During the
negotiation process, the endpoints dynamically assign Format ID numbers to
sets of media formats/codecs and associated parameters (sample rate, bitrate,
resolution, etc). Each media stream has its own independent set of
negotiated Format IDs.
Flags
Each negotiated media format will have its own flags in this field.
If a media format does not require flags, it MUST set all bits in Flags to
zero. Similar types of media formats, such as video, SHOULD use similar flag
bits as much as possible. For example, video codecs should attempt to use the
same bit with the same semantics for signaling a key frame.
Data
Media format specific payload that MAY include media format specific
headers. Implementations should attempt to use the same payload format as
for RTP streams.
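To make the layout concrete, here is a minimal Python sketch that packs and
parses the generic header. It is only an illustration under stated assumptions:
network byte order, F = 0 and V = 1 with an all-zero 15-bit Meta Indicator and
a 15-bit Source Call Number (my reading of the IAX2 draft), and 8 bits each for
Format ID and Flags (inferred from the diagram and the 8-bit flag pattern in
section 2.2; the text does not state these widths). All names are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple

# Big-endian layout: two 16-bit words for the first row, then Time-Stamp,
# Sequence Number and Stream ID (16 bits each), then Format ID and Flags
# (assumed to be 8 bits each).
HEADER = struct.Struct("!HHHHHBB")


class MediaHeader(NamedTuple):
    source_call: int   # 15 bits
    timestamp: int     # lower 16 bits of the 32-bit ms timestamp
    sequence: int      # 16 bits, counted per stream
    stream_id: int     # 16 bits
    format_id: int     # assumed 8 bits
    flags: int         # assumed 8 bits


def pack_header(source_call: int, full_timestamp_ms: int, sequence: int,
                stream_id: int, format_id: int, flags: int) -> bytes:
    meta = 0x0000                                  # F = 0, Meta Indicator = 0
    v_and_call = 0x8000 | (source_call & 0x7FFF)   # V = 1 plus call number
    return HEADER.pack(meta, v_and_call,
                       full_timestamp_ms & 0xFFFF,  # keep only the low 16 bits
                       sequence & 0xFFFF,
                       stream_id & 0xFFFF,
                       format_id & 0xFF,
                       flags & 0xFF)


def unpack_header(packet: bytes) -> Tuple[MediaHeader, bytes]:
    """Return the parsed header and the remaining payload bytes."""
    meta, v_and_call, ts, seq, stream_id, format_id, flags = \
        HEADER.unpack_from(packet)
    if meta != 0 or not (v_and_call & 0x8000):
        raise ValueError("not a media meta-frame")
    header = MediaHeader(v_and_call & 0x7FFF, ts, seq,
                         stream_id, format_id, flags)
    return header, packet[HEADER.size:]


packet = pack_header(5, 70000, 0, 1, 3, 0) + b"payload"
header, data = unpack_header(packet)
print(header.timestamp)   # 4464, i.e. 70000 & 0xFFFF
```

With these assumed widths the generic header occupies 12 bytes in front of the
media format specific data.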
2.2 Video Specific Comments
Video codecs using the above described media packet structure SHOULD attempt to
use similar flags. One possible set of flags that should cover many video
applications would be:
xxxx xxKM
K
1 bit: set to 1 if the data in the IAX frame belongs to a video key frame,
0 otherwise
M
1 bit: Similar semantics as in RTP [2]: defined by the format type, usually
marks the end of a set of slices.
Applications using video should adopt the same payload format as for RTP
(Theora [3], H.264 [4]).
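The following Python sketch shows one way a receiver might use the K and M
bits together with the sequence number. It assumes the flag pattern is written
most significant bit first, so M is bit 0 and K is bit 1, and it assumes M
marks the last slice of a frame. The function names are hypothetical, and
16-bit sequence number wraparound is deliberately ignored to keep the example
short.

```python
from typing import Iterable, Optional, Tuple

FLAG_M = 0x01   # marker: assumed here to flag the last slice of a frame
FLAG_K = 0x02   # set when the slice belongs to a key frame


def make_flags(key_frame: bool, marker: bool) -> int:
    """Build the Flags byte for one slice."""
    return (FLAG_K if key_frame else 0) | (FLAG_M if marker else 0)


def assemble_frame(
        slices: Iterable[Tuple[int, int, bytes]]
) -> Optional[Tuple[bool, bytes]]:
    """Join the slices of one video frame.

    slices holds (sequence, flags, payload) tuples in arrival order.
    Returns (is_key_frame, frame_bytes), or None if the marker slice is
    missing or there is a gap in the sequence numbers. A lost first slice
    can only be detected by comparing against the previous frame's last
    sequence number, which this sketch does not track.
    """
    ordered = sorted(slices, key=lambda s: s[0])
    if not ordered or not (ordered[-1][1] & FLAG_M):
        return None
    seqs = [s[0] for s in ordered]
    if seqs != list(range(seqs[0], seqs[0] + len(seqs))):
        return None
    is_key = bool(ordered[0][1] & FLAG_K)
    return is_key, b"".join(payload for _, _, payload in ordered)


received = [
    (11, make_flags(True, False), b"b"),   # arrived out of order
    (10, make_flags(True, False), b"a"),
    (12, make_flags(True, True), b"c"),    # marker: last slice
]
print(assemble_frame(received))            # (True, b'abc')
```

After a video source change, a receiver could use the returned key frame
indication to hold off display until the next key frame arrives.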
2.3 Issues still TBD:
- Should we expand the K flag to multiple bits so we can differentiate between
p-frames and b-frames?
3. References
[1] Guy, E. et al., IAX2: Inter-Asterisk eXchange Version 2
draft-guy-iax-03
[2] Schulzrinne, H. et al., RTP: A Transport Protocol for Real-Time Applications
RFC 1889
[3] Barbato, L., RTP Payload Format for Theora Encoded Video
draft-barbato-avt-rtp-theora-01
[4] Wenger, S. et al., RTP Payload Format for H.264 Video
RFC 3984