[Asterisk-video] Re: [asterisk-dev] Video packetization proposal

Mihai Balea mihai at hates.ms
Mon Jun 4 08:20:44 MST 2007

On Jun 2, 2007, at 5:53 AM, Sergio Garcia Murillo wrote:
> Then, create another document to address the Theora packetization
> independently of IAX, so we could use it for RTP as well.

People have already come up with proposals for Theora packetization
over RTP. Here's a draft from Xiph:


I borrowed liberally from that draft when I wrote my proposal, but
there are things in there that I believe are not necessary. For
example, it allows multiple video frames per RTP packet. I believe
it is far more likely to have video frames that are too big to fit
in one network frame than video frames small enough to warrant
coalescing them.
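To make the fragmentation case concrete, here's a minimal sketch of splitting one oversized encoded video frame across several network packets. MAX_PAYLOAD, the callback type, and the function name are all illustrative assumptions, not part of any actual IAX2 or RTP API:

```c
#include <stddef.h>

/* Illustrative payload limit; real MTU budgets vary. */
#define MAX_PAYLOAD 1400

/* Hypothetical send callback: one network packet per call.
 * 'last' marks the final fragment of the video frame. */
typedef void (*send_fn)(const unsigned char *data, size_t len, int last);

/* Fragment one encoded video frame into MAX_PAYLOAD-sized chunks.
 * Returns the number of packets emitted. */
static size_t fragment_video_frame(const unsigned char *frame, size_t len,
                                   send_fn send)
{
    size_t sent = 0, npackets = 0;
    while (sent < len) {
        size_t chunk = len - sent;
        if (chunk > MAX_PAYLOAD)
            chunk = MAX_PAYLOAD;
        send(frame + sent, chunk, sent + chunk == len);
        sent += chunk;
        npackets++;
    }
    return npackets;
}
```

A 3000-byte frame would go out as three packets (1400 + 1400 + 200) under these assumptions.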

As far as I know, that draft hasn't gotten much circulation, and I'm
not aware of any implementations. As such, I don't think it should
be considered gospel.

>> Yes, but if one extra bit makes my life as a programmer easier, I
>> would go for that.
> Avoiding "if"s is going to make your life as a programmer easier?
> And you would need those same "if"s when dealing with RTP>IAX
> translation. I don't think duplicating information in the header
> is a good idea.
Actually, it will make my life easier, especially when we have to
deal with lost or misordered frames.
Why exactly do you think it is a bad idea? You could easily
encapsulate the specific video/Theora bits into an RTP frame, making
the IAX2-RTP translation almost trivial.

>> Some application benefit from knowing the type of frame without
>> actually decoding it.  One example would be video conferencing
>> applications, and more specifically app_conference.  When you want to
>> switch the video stream from one person to another, you want to do it
>> on a key frame, otherwise you get garbage.
> Yes, it will make life easier for app_conference, but again it
> will make the RTP>IAX translation very difficult, as you'll have
> to dig into the encoded bitstream to get that info.
What I am trying to avoid is digging into the encoded bitstream in
applications like app_conference. Again, if we make these bits part
of the video/Theora payload, then moving the payload between IAX and
RTP streams should be easy.
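To illustrate the difference being argued here: with a keyframe bit in the payload header, a conference bridge could decide when to switch video sources with a single mask, never touching the codec bitstream. The header layout below is purely hypothetical (the keyframe flag is assumed to sit in the top bit of the first payload byte), not any actual IAX2 or RTP/Theora wire format:

```c
#include <stddef.h>

/* Hypothetical: assume the keyframe flag is the top bit of the
 * first payload byte.  Illustrative layout only. */
#define KEYFRAME_BIT 0x80

static int is_keyframe(const unsigned char *payload, size_t len)
{
    return len > 0 && (payload[0] & KEYFRAME_BIT) != 0;
}

/* A bridge like app_conference would only cut the video stream
 * over to a new speaker on a keyframe, to avoid showing garbage. */
static int can_switch_to(const unsigned char *payload, size_t len)
{
    return is_keyframe(payload, len);
}
```

Without the flag, the same decision requires codec-specific parsing of the Theora (or H.264) bitstream in every application that needs it.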

>>> The timestamp conundrum, this is quite a hard problem.
>>> As you have defined it in your document, the IAX field would be a
>>> timer reference or frame duration instead of a timestamp. If you
>>> do it that way, you'll have a big problem if you want to translate
>>> from RTP to IAX. The only way you can do it is by storing all the
>>> packets of a frame until you've got the first packet of the next
>>> video frame; then (and only then) you could calculate your
>>> "timestamp" field.
>> It is not the frame duration, it is the time difference (in ms)
>> between the time of transmission of this frame and the time of
>> transmission of the first frame in the call.
> Ok, I misunderstood the header, but then there should be no problem
> converting from one value to another at all.
Well, there are slight differences in the semantics of the two values.
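For reference, a minimal sketch of the conversion being discussed, assuming the RTP side uses the standard 90 kHz video clock with a random initial offset, and that the IAX-style value is milliseconds since the first frame of the call (as described above). The function and parameter names are illustrative:

```c
#include <stdint.h>

/* Standard RTP clock rate for video payloads. */
#define RTP_VIDEO_CLOCK 90000

/* Convert an RTP timestamp to milliseconds elapsed since the first
 * video frame of the call.  Unsigned subtraction handles 32-bit
 * RTP timestamp wraparound correctly. */
static uint32_t rtp_ts_to_iax_ms(uint32_t rtp_ts, uint32_t first_rtp_ts)
{
    uint32_t delta = rtp_ts - first_rtp_ts;
    return (uint32_t)(((uint64_t)delta * 1000) / RTP_VIDEO_CLOCK);
}
```

The slight semantic mismatch is that the RTP timestamp reflects the sampling instant of the frame, while the value above reflects time relative to call start, so a translator still has to record the first timestamp it sees.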

> I wasn't saying to allow multiple video frames in a packet, just
> asking how the ts would be handled for all the packets of a frame.
> But this raises another question: what are you going to do with
> H.264? I still haven't had much time to study the H.264
> packetization, but I've seen that it allows many NALs to be sent
> in one RTP packet. How would you set the I/P bit then? Are you
> proposing to split that frame into multiple ones?

If the RTP frame comprises one video frame, then there would be no
problem having multiple NALs in the IAX frame as well. If the NALs
span multiple video frames, then I believe they should be split into
several IAX frames, each with its own timestamp.
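The splitting step above can be sketched as walking an aggregate and handing each NAL out individually. The layout assumed here is a simple 16-bit length-prefixed packing (similar in spirit to RTP STAP-A aggregation from RFC 3984, minus the aggregation header); the callback and function names are hypothetical:

```c
#include <stddef.h>

/* Hypothetical sink: receives one NAL unit, e.g. to wrap it in its
 * own IAX frame with its own timestamp. */
typedef void (*nal_fn)(const unsigned char *nal, size_t len);

/* Walk a buffer of 16-bit big-endian length-prefixed NAL units and
 * emit each one separately.  Returns 0 on success, -1 if the
 * aggregate is truncated or malformed. */
static int split_aggregated_nals(const unsigned char *buf, size_t len,
                                 nal_fn emit, size_t *count)
{
    size_t pos = 0, n = 0;
    while (pos + 2 <= len) {
        size_t nal_len = ((size_t)buf[pos] << 8) | buf[pos + 1];
        pos += 2;
        if (pos + nal_len > len)
            return -1; /* length prefix runs past the buffer */
        emit(buf + pos, nal_len);
        pos += nal_len;
        n++;
    }
    *count = n;
    return pos == len ? 0 : -1;
}
```

Deciding which NALs begin a new coded picture (and therefore need a fresh timestamp) still requires looking at the NAL headers, which is exactly the kind of bitstream digging the explicit I/P bit is meant to spare applications from.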

