[Asterisk-Dev] meetme enhancements to improve efficiency
Steve Kann
stevek at stevek.com
Fri Dec 2 09:36:14 MST 2005
Kevin P. Fleming wrote:
> Steve Kann wrote:
>
>> Actually, the other three aren't getting any mixing in this case;
>> they just get the frames directly from the speaker (they're not
>> decoded and re-encoded at all, just sent straight along). This means:
>>
>> (1) that they get exactly the same audio quality as if they were
>> directly bridged, with no generational loss.
>> (2) There is zero transcoding happening.
>
>
> OK, my example was a poor choice then. However, I see what you are
> doing now... in the normal case where only one person is speaking, you
> don't 'mix' at all, since there is nothing to mix. Granted, this is
> dependent on the quality of your 'talk detection' algorithm, but we
> are already doing that successfully :-)
Right. app_conference can presently do VAD on incoming channels itself,
although that functionality was added there just because it was easy --
it probably ought to be pushed as far out to the source as possible
(i.e. in chan_zap or on hardware for TDM, into the remote side for VoIP,
etc).
> It also works primarily only for VOIP channels; bringing Zap channels
> into the conference will necessitate mixing again.
Well, like I said, it can do VAD itself (using libspeex's VAD). It's
slightly less expensive to do VAD than to do encoding, but it is still
expensive with libspeex's VAD (the VAD algorithm could be improved
and/or replaced).
Ideally, you'd have a really good (and fast) VAD implementation that's
generic inside of asterisk, and have ZAP channels use that all the time,
which would help here, as well as allow * and zap to save more than 50%
of bandwidth when zap channels eventually go over IP.
>> Right. app_conference breaks the rule that the encoded audio we
>> transmit to a participant must always come from the same encoder. In
>> practice, with all of the codecs we've done this with (really just
>> GSM and Speex, which I've tested extensively, but it should be fine
>> with others), this doesn't lead to any noticeable negative effects.
>> This is because in practice:
>>
>> 1) This generally happens when both encoder states are starting from
>> a "silence" state, so they're pretty much the same, or
>> 2) It's happening when there are multiple speakers talking at the
>> same time, in which case, with all the crosstalk, you don't really
>> notice anything.
>
>
> Understood. It would be interesting to try this with G.729 as well,
> although I doubt it will be any more state-sensitive than Speex.
Right; I don't see why it would be.
>
> In any case, this is a very different architecture than app_meetme,
> and I don't think there's a reasonable way to merge the two together.
> It could certainly be possible to optimize the mixed audio -> encoded
> audio path by using a single encoder for all non-speakers (and 'break
> the rules' as you say), but the other optimizations would be harder to
> work into app_meetme, since it is very much based around mixing all
> the streams through Zaptel.
Yes, it is very different. That's why I thought it would be simpler
either to implement the "features" of meetme as an adjunct to
app_conference, or to take the algorithm from app_conference and make
it a "service" inside asterisk (just like bridging two channels), and
then add the "features" as a simpler app_meetme.
The talk (probably more than a year ago) was that you could extend the
"bridging" features inside of asterisk, so that you could, through the
same API, add more than two channels to a bridge. Once you added a
third, it would use an algorithm like that in app_conference.
Originally, people threw out the idea to always use this algorithm, even
for the 2 party case, but it isn't ideal, because it adds latency by
trying to "synchronize" the frames, which isn't necessary when you're
bridging two VoIP channels.
-SteveK