[Asterisk-Dev] meetme enhancements to improve efficiency

Steve Kann stevek at stevek.com
Fri Dec 2 08:48:38 MST 2005


Kevin P. Fleming wrote:

> Steve Kann wrote:
>
>> Sure.  If you want to talk about it, I'm happy to do so.
>
>
> OK... so that's basically the point I was trying to make:
>
> If you have four participants using GSM (for example), and one of them 
> is speaking, then the other three can receive the same GSM frames of 
> the conference's mixed audio. That seems pretty obvious.


Actually, the other three aren't getting any mixing in this case; they
just get the frames from the speaker directly.  (They're not decoded and
re-encoded at all, just sent straight along.)   This means:

(1) They get exactly the same audio quality as if they were directly
bridged, with no generational loss.
(2) There is zero transcoding happening.
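
To make that concrete, here's a rough sketch of the single-speaker case
(the names and structures are made up for illustration; this isn't the
actual app_conference code):

    struct encoded_frame {
        int codec;                   /* e.g. GSM */
        unsigned char data[160];
        int datalen;
    };

    struct member {
        int codec;                   /* codec this participant is using */
        void (*send)(struct member *m, const struct encoded_frame *f);
    };

    /* With exactly one speaker there's nothing to mix: every listener on
     * the same codec just gets the speaker's encoded frames verbatim --
     * no decode, no re-encode, no generational loss, zero transcoding. */
    static void forward_single_speaker(struct member *members, int nmembers,
                                       struct member *speaker,
                                       const struct encoded_frame *f)
    {
        for (int i = 0; i < nmembers; i++) {
            struct member *m = &members[i];
            if (m == speaker)
                continue;            /* don't send the speaker's audio back */
            if (m->codec == f->codec)
                m->send(m, f);       /* pass-through: the same bits the speaker sent */
            /* a listener on a different codec would need one shared
             * transcode per codec, not one per listener */
        }
    }

So with one talker it's just n-1 sends and zero codec work.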


> However, when one of the other participants begins speaking, they now 
> need to receive the mixed audio _minus_ their own contribution. 
> Switching source streams means switching to a separate translator 
> (codec) path, and that new path won't have the history built up from 
> the previous frames (that were sent through the common path), so the 
> first few frames encoded by that translator will produce less than 
> optimal results (at best).


Right.   app_conference breaks the rule that all of the encoded audio we
transmit to a given participant must come from the same encoder.   In
practice, with all of the codecs we've done this with (really just GSM
and Speex are the ones I've exercised extensively, but it should be fine
with others), this doesn't lead to any noticeable negative effects
(there's a rough sketch of the encoder switch below).  This is because,
in practice:

1) This generally happens when both encoder states are starting from a 
"silence" state, so they're pretty much the same, or
2) It's happening when there are multiple speakers talking at the same 
time, in which case, with all the crosstalk, you don't really notice 
anything.
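
Here's roughly what that encoder switch looks like (a sketch only -- the
member/conference structures are invented; it just assumes libgsm's
<gsm.h> encode interface):

    #include <gsm.h>

    struct member {
        int speaking;
        gsm own_enc;         /* private encoder: conference mix minus this member */
    };

    struct conference {
        gsm shared_enc;      /* one encoder shared by all pure listeners */
    };

    /* A pure listener gets frames from the shared encoder.  The moment a
     * member speaks, they need "mix minus self", which means frames from
     * a different encoder whose history has diverged from the shared one.
     * The member's decoder just rides through the mismatch, the same way
     * it rides through lost packets or joining mid-stream. */
    static void encode_for_member(struct conference *c, struct member *m,
                                  gsm_signal *mix, gsm_signal *mix_minus_self,
                                  gsm_byte out[33])
    {
        if (m->speaking)
            gsm_encode(m->own_enc, mix_minus_self, out);  /* "cold"/diverged state */
        else
            gsm_encode(c->shared_enc, mix, out);          /* shared, warm state */
    }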

All VoIP codecs need the ability to deal with loss of state 
synchronization, because they all need to deal with packet loss, 
starting in the middle of a session, etc.   So codec authors kinda 
cringe and say "you can't do that", but in practice it works really 
well.  The codec authors might say you have two options to do this:

1) Send all the speakers' encoded streams to each client, and let them 
decode and mix.  (This uses a lot of bandwidth.)
2) Decode all the speakers, mix, re-encode, and send that out to all the 
users (the way meetme works).  This uses a lot of CPU (it's O(n^2)), and 
also means that you always suffer from generational loss by 
decoding/re-encoding everything.  (The difference is noticeable.)  
There's a rough sketch of this below.
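
To show where the O(n^2) work and the generational loss in option 2 come
from, here's a rough sketch of a meetme-style mixer (the structures are
invented; only the gsm_encode()/gsm_decode() calls are libgsm's real
interface):

    #include <gsm.h>

    #define SAMPLES 160              /* one 20ms GSM frame at 8kHz */

    struct member {
        gsm dec, enc;                /* per-member codec state */
        gsm_byte in[33], out[33];    /* one encoded GSM frame each way */
        gsm_signal pcm[SAMPLES];     /* this member's decoded audio */
    };

    static void mix_one_frame(struct member *m, int n)
    {
        /* Decode every member's incoming frame: n decodes. */
        for (int i = 0; i < n; i++)
            gsm_decode(m[i].dec, m[i].in, m[i].pcm);

        /* For every listener, sum everyone else's audio (that's the
         * O(n^2) part) and re-encode it: n more encodes.  The re-encode
         * is a second lossy pass over already-lossy audio -- that's the
         * generational loss. */
        for (int i = 0; i < n; i++) {
            gsm_signal mix[SAMPLES] = {0};
            for (int j = 0; j < n; j++) {
                if (j == i)
                    continue;        /* mix minus the listener's own voice */
                for (int s = 0; s < SAMPLES; s++)
                    mix[s] += m[j].pcm[s];   /* real code would saturate here */
            }
            gsm_encode(m[i].enc, mix, m[i].out);
        }
    }

Compare that with the single-speaker pass-through above, which does none
of this work.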


-SteveK



