[asterisk-dev] AST_FRAME_DIGITAL

Matthew Fredrickson creslin at digium.com
Fri Sep 14 10:51:01 CDT 2007


Russell Bryant wrote:
> Matthew Fredrickson wrote:
>>> So, if I get it right - there is no need to introduce AST_FRAME_DIGITAL 
>>> as it is already there (but named AST_FRAME_MODEM)?
>> Yes, basically.  Look in include/frame.h in asterisk 1.4 sources.  There 
>> are already subclasses defined for T38 and V150.  I'm thinking that an 
>> extension to this frametype would give us what we want.  Then an 
>> extension to the translator architecture so that we can make translators 
>> for frames other than AST_FRAME_VOICE.
> 
> I realize that this exists, but the question is whether it makes sense to do
> so, when the stream itself is actually voice and video.  There is _a lot_ of code
> in Asterisk that expects audio and video to be handled in a certain way, and
> this is an extremely different way to approach it.  That is why I was trying to
> push it that direction.
> 
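For reference, since frame.h came up above, the relevant bits of
include/asterisk/frame.h in 1.4 look roughly like this (quoted from
memory, so check the tree for the exact values and comments):

#define AST_FRAME_MODEM  11  /* modem/fax data stream */

/* subclasses for AST_FRAME_MODEM */
#define AST_MODEM_T38    1   /* T.38 Fax-over-IP */
#define AST_MODEM_V150   2   /* V.150 Modem-over-IP */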

<snip>

> AST_FRAME_MODEM or DIGITAL or whatever is not going to work without a lot of
> extra effort.  However, as has been suggested, creating an AST_FORMAT_H223 would
> do it.  It's a hack, but you'd have to put the data in an AST_FRAME_VOICE with a
> subclass of AST_FORMAT_H223.  In that case, Asterisk would happily pass it
> through without transcoding it, since it has no codec module to handle it.

I agree that it is going to take some effort to do this the right way. 
But that is why we are having this discussion, I think: to decide the 
"right" way to do it.  There are already a lot of hackish ways this can 
be done.
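To make the hack you describe concrete: the channel driver's read path
would stuff the raw mux data into a voice frame with a made-up format
bit, something like this (AST_FORMAT_H223 does not exist in the tree
today, and the bit value below is purely illustrative):

#include <string.h>
#include "asterisk/frame.h"

#define AST_FORMAT_H223 (1 << 25)  /* hypothetical, pick an unclaimed format bit */

/* Wrap a chunk of raw H.223 mux data in a "voice" frame so the core
 * passes it through untouched, since no codec module claims this format. */
static void wrap_h223_as_voice(struct ast_frame *f, void *buf, int len)
{
	memset(f, 0, sizeof(*f));
	f->frametype = AST_FRAME_VOICE;  /* pretend it is audio... */
	f->subclass = AST_FORMAT_H223;   /* ...in a format nothing can transcode */
	f->data = buf;
	f->datalen = len;
	f->samples = 0;                  /* no real audio samples in here */
	f->src = "chan_zap_h223";
}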

Just to clarify: this would not be another voice frame type.  It would 
live under a separate frame type, AST_FRAME_MODEM (or AST_FRAME_DIGITAL), 
and it would require some genuinely new support in the translation core 
to make it work.
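On the channel-driver side the difference from the hack above is just
the frame type; the real work is teaching the core to translate such
frames.  Roughly (AST_MODEM_H223 is hypothetical, only T38 and V150
exist today):

#define AST_MODEM_H223 3  /* hypothetical new subclass of AST_FRAME_MODEM */

static void build_h223_frame(struct ast_frame *f, void *buf, int len)
{
	memset(f, 0, sizeof(*f));
	f->frametype = AST_FRAME_MODEM;  /* explicitly not a voice frame */
	f->subclass = AST_MODEM_H223;
	f->data = buf;
	f->datalen = len;
	f->src = "chan_zap";
}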

<snip>

> Humor me for a bit longer and help me understand why one way requires a lot more
> code in the channel drivers than the other.  In the proposed method, you are
> reading the data and stuffing them in H223 frames and passing them into
> Asterisk.  Now, if the code that the application is using to decode and encode
> the H223 data is put into a library, why is it really any more invasive to the
> channel drivers to do the decoding there?

The reason I suggest that we not do the encode/decode unconditionally 
for every frame is to avoid that work when it is unnecessary.  I haven't 
looked at the H223 encapsulation in a while, but thinking of it like RTP 
is probably not quite the right way to look at it.  It's a single stream 
that carries audio and video, along with some metadata.
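Very loosely, and glossing over the adaptation layers, multiplex table
entries, and framing details, the demux side amounts to pulling MUX-PDUs
off the single bit stream and fanning the payload out per logical
channel:

/* Grossly simplified sketch of why H.223 is not RTP-like: one bit
 * stream carries interleaved control, audio, and video, and you only
 * know which is which after demultiplexing.  The channel numbers
 * (other than 0) and the handler names are illustrative only. */
#define AUDIO_CHANNEL 1
#define VIDEO_CHANNEL 2

struct mux_pdu {
	int logical_channel;           /* H.245 logical channel number */
	const unsigned char *payload;
	int len;
};

void handle_control(const unsigned char *buf, int len);  /* H.245 */
void handle_audio(const unsigned char *buf, int len);    /* e.g. AMR or G.723.1 */
void handle_video(const unsigned char *buf, int len);    /* e.g. H.263 */

static void dispatch_pdu(const struct mux_pdu *pdu)
{
	switch (pdu->logical_channel) {
	case 0:                  /* channel 0 is reserved for H.245 control */
		handle_control(pdu->payload, pdu->len);
		break;
	case AUDIO_CHANNEL:
		handle_audio(pdu->payload, pdu->len);
		break;
	case VIDEO_CHANNEL:
		handle_video(pdu->payload, pdu->len);
		break;
	default:
		break;           /* channel we never opened: drop it */
	}
}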

> ISDN H223 data -> chan_zap, put it into H223 frames -> Asterisk core
> 
> versus
> 
> ISDN H223 data -> chan_zap -> H223decoder to VOICE/VIDEO frames -> Asterisk core


I think perhaps we are not looking at this from a scalability point of 
view.  The big reason to do it using the translator architecture and a 
separate frame type is that it is one less thing to not have to do when 
you do not have to do it.  It is the same reason that we have moved from 
doing RTP reencapsulation to just being able to pass RTP between two 
endpoints without reencapsulation.  Why should we implement this less 
efficiently and elegantly then it should be implemented?

In any case, one of the advantages of doing it with a translation-core 
style infrastructure is that we get a lot of neat things for free, such 
as T38->ulaw (if that is actually possible).  We are going to have to 
revamp the translator core anyway for wideband audio support, as well as 
for all the different things being done on the videocaps branch, so this 
fits nicely with that project.  We also get "H223" pass-through when it 
is not necessary to reencapsulate the audio and video.
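The pass-through win works just like audio codec pass-through does now:
only build a translation path when the two legs actually disagree.
Roughly, where can_take_h223() and build_h223_translation_path() are
made-up names and not existing API:

struct ast_channel;  /* from asterisk/channel.h */

int can_take_h223(struct ast_channel *c);                 /* hypothetical */
int build_h223_translation_path(struct ast_channel *c0,
                                struct ast_channel *c1);  /* hypothetical */

static int setup_media_path(struct ast_channel *c0, struct ast_channel *c1)
{
	if (can_take_h223(c0) && can_take_h223(c1)) {
		/* ISDN<->ISDN, or anything else that speaks H.223:
		 * pass the MUX-PDUs through untouched.  No decoder,
		 * no library, no license worries. */
		return 0;
	}
	/* Otherwise decode to VOICE/VIDEO frames on the leg that needs
	 * it, exactly like ulaw<->alaw translation today. */
	return build_h223_translation_path(c0, c1);
}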

Beyond that, if there is some reason we cannot encapsulate/decapsulate 
the stream (say the library we would use has an incompatible license, or 
there are patent issues, whatever the reason might be), we can at the 
very least bridge ISDN->ISDN H223 calls without having the library, much 
like we can pass G.729 or G.723 through without a transcoder.

> Also, is the stream really encoded in such a way that is very much
> computationally expensive to do the encoding and decoding of the stream?  That
> is a much better argument for avoiding the decoding/encoding when possible, in
> my opinion, if that is the case.  Would you hit a CPU bottleneck from this
> decode/encode process before you would hit a limit on how much ISDN hardware you
> can put in a box?

I cannot give numbers on the computational cost.  I do think it will be 
expensive enough to justify not doing it when you don't have to, though.

> If it is not that computationally expensive, then the code would actually end up
> being a lot simpler and easier to maintain if you can avoid having to create the
> local channel back into asterisk that has the decoded stream when you want to
> use it.

I think using a local channel to do this is not a robust architecture 
for covering all the cases I mentioned above.  IMHO, the "right" way to 
do this is the same way we do codec translation.
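By "like we do codec translation" I mean something registered with the
translator core, not a shim channel.  If the core were extended beyond
AST_FRAME_VOICE, a decoder module might register roughly like this; the
src_frametype/dst_frametype fields do not exist in today's struct
ast_translator, which is exactly the extension being proposed:

int h223_framein(struct ast_trans_pvt *pvt, struct ast_frame *f);
struct ast_frame *h223_frameout(struct ast_trans_pvt *pvt);

static struct ast_translator h223_to_ulaw = {
	.name = "h223toulaw",
	.src_frametype = AST_FRAME_MODEM,   /* hypothetical field */
	.srcfmt = AST_MODEM_H223,           /* hypothetical subclass, see above */
	.dst_frametype = AST_FRAME_VOICE,   /* hypothetical field */
	.dstfmt = AST_FORMAT_ULAW,
	.framein = h223_framein,            /* feed MUX-PDUs in */
	.frameout = h223_frameout,          /* pull decoded audio frames out */
};

/* ...and from the module's load routine: */
ast_register_translator(&h223_to_ulaw);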

-- 
Matthew Fredrickson
Software/Firmware Engineer
Digium, Inc.


