[asterisk-dev] Video packetization proposal

Peter Grayson jpgrayson at gmail.com
Mon Jun 4 09:33:31 MST 2007

On 6/2/07, Sergio Garcia Murillo <sergio.garcia at fontventa.com> wrote:
> > > I, P, B frame indication is not needed, as the decoder should be
> > > the one in charge of that.
> > Some applications benefit from knowing the type of frame without
> > actually decoding it.  One example would be video conferencing
> > applications, and more specifically app_conference.  When you want to
> > switch the video stream from one person to another, you want to do it
> > on a key frame, otherwise you get garbage.
> Yes, it will make life easier for app_conference, but again it will make
> the RTP->IAX translation very difficult, as you'll have to dig into the
> encoded bitstream to get that info.

I'm no RTP expert, but doesn't the 7-bit payload type (PT) field in
the RTP header serve the same purpose as the 8-bit payload type that
Mihai proposes? RFC 3551 describes some well-known mappings for the
A/V profile.
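For reference, pulling the payload type out of an RTP header is a
one-liner; here is a minimal sketch in C, assuming a raw header buffer
(the helper names are mine, not from the Asterisk source):

```c
#include <stdint.h>

/* RFC 3550: byte 1 of the RTP header carries the marker bit (high bit)
 * and the 7-bit payload type (low 7 bits). */
static uint8_t rtp_payload_type(const uint8_t *hdr)
{
    return hdr[1] & 0x7f;        /* PT: bits 0-6 of byte 1 */
}

static int rtp_marker_bit(const uint8_t *hdr)
{
    return (hdr[1] & 0x80) != 0; /* M: bit 7 of byte 1 */
}
```

The point being that the receiver learns the codec from two header
bytes, without touching the codec bitstream at all.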

You seem to be proposing that instead of identifying the video format
in this header field, the receiver should pick apart the data payload
to figure out what kind of data is present. This seems fraught with
peril for two reasons.

First, I don't know of any guarantee that various video codecs will
have identifiable payloads. Without the payload type header, we would
be demanding that the data payload have sufficient information for
both identifying the codec _and_ determining codec parameters.

Secondly, even if there were a way for receivers to identify the
payload type, they would then have the burden of knowing the magic
algorithms for determining the payload types of all the
codecs/formats they might run into. This seems like an undue burden
for receivers.

> > > The timestamp conundrum: this is quite a hard problem.
> > > As you have defined it in your document, the IAX field would be a timer
> > > reference or frame duration instead of a timestamp. If you do it that way,
> > > you'll have a big problem if you want to translate from RTP to IAX. The
> > > only way you can do it is by storing all the packets of a frame until you've
> > > got the first packet of the next video frame; then (and only then) you
> > > could calculate your "timestamp" field.
> > It is not the frame duration, it is the time difference (in ms)
> > between the time of transmission of this frame and the time of
> > transmission of the first frame in the call.
> OK, I misunderstood the header, but then there should be no problem
> converting from one value to the other at all.

You are right that the obvious conversion would be trivial. However,
there is a semantic difference between RTP and IAX. The RTP timestamp
is the presentation time for the media. The IAX timestamp is the
transmission time. Presentation time and transmission time may or may
not be related. I really don't know. For Asterisk, the transmission
time is important for dejittering the packet stream. There seems to be
an implicit assumption that presentation time is "upon receipt", which
is different from "when the packet says".

Also, the RTP timestamp uses a 90 kHz clock, while the IAX timestamp is
measured in milliseconds, which is effectively a 1 kHz clock. There is a
rather large difference in precision between the two, so information
would be lost in the RTP-to-IAX mapping, and IAX obviously does not
carry sufficient information to match RTP's precision in the IAX-to-RTP
case. Does this matter? It seems worth considering.
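A back-of-the-envelope sketch of that conversion (function names are
illustrative, not from either codebase): 90 ticks = 1 ms, so the
RTP-to-IAX direction discards the sub-millisecond remainder, and the
reverse direction can only ever produce multiples of 90 ticks.

```c
#include <stdint.h>

/* Illustrative sketch of the clock-rate mismatch: an RTP video
 * timestamp ticks at 90 kHz, an IAX timestamp at 1 kHz (ms). */
static uint32_t rtp_ticks_to_iax_ms(uint32_t ticks)
{
    return ticks / 90;           /* loses the remainder */
}

static uint32_t iax_ms_to_rtp_ticks(uint32_t ms)
{
    return ms * 90;              /* always a multiple of 90 */
}
```

So a round trip through IAX quantizes every RTP timestamp to the
nearest lower millisecond boundary.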

