[Asterisk-Dev] meetme enhancements to improve efficiency
Steve Kann
stevek at stevek.com
Fri Dec 2 08:48:38 MST 2005
Kevin P. Fleming wrote:
> Steve Kann wrote:
>
>> Sure. If you want to talk about it, I'm happy to do so.
>
> OK... so that's basically the point I was trying to make:
>
> If you have four participants using GSM (for example), and one of them
> is speaking, then the other three can receive the same GSM frames of
> the conference's mixed audio. That seems pretty obvious.
Actually, the other three aren't getting any mixing in this case; they
just receive the speaker's frames directly (the frames are never
decoded and re-encoded, just sent straight along). This means:
(1) that they get exactly the same audio quality as if they were
directly bridged, with no generational loss.
(2) There is zero transcoding happening.
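
To make (1) and (2) concrete, the fast path looks roughly like this.
(Illustrative C only -- the structures and function names here are
made up for the sketch, not app_conference's actual API.)

struct frame  { int codec; /* encoded payload lives here */ };
struct member { int codec; struct member *next; };
struct conf   { struct member *members; };

/* assumed helpers, not real app_conference functions */
void queue_frame(struct member *m, struct frame *f);
struct frame *transcode(struct frame *f, int codec);

/* With exactly one speaker, everyone else on the same codec gets the
 * speaker's encoded frame verbatim: no decode, no re-encode, no
 * generational loss. */
void distribute(struct conf *conf, struct member *speaker,
                struct frame *encoded)
{
    struct member *m;
    for (m = conf->members; m; m = m->next) {
        if (m == speaker)
            continue;                 /* don't echo the speaker back */
        if (m->codec == encoded->codec)
            queue_frame(m, encoded);  /* straight pass-through */
        else
            queue_frame(m, transcode(encoded, m->codec));
    }
}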
> However, when one of the other participants begins speaking, they now
> need to receive the mixed audio _minus_ their own contribution.
> Switching source streams means switching to a separate translator
> (codec) path, and that new path won't have the history built up from
> the previous frames (that were sent through the common path), so the
> first few frames encoded by that translator will produce less than
> optimal results (at best).
Right. app_conference breaks the rule that the encoded audio you send
to a participant must always come from the same encoder (there's a
sketch of the mismatch after the list below). In practice, with all of
the codecs we've done this with (really just GSM and Speex
extensively, but it should be fine with others), this doesn't lead to
any noticeable negative effects. That's because, in practice:
1) This generally happens when both encoder states are starting from a
"silence" state, so they're pretty much the same, or
2) It's happening when there are multiple speakers talking at the same
time, in which case, with all the crosstalk, you don't really notice
anything.
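
Here's roughly where the encoder-state mismatch comes from. (Again,
illustrative C with made-up names, not app_conference's real code.)

struct member {
    int talking;
    short contrib[160];    /* this member's own decoded audio */
    void *encoder;         /* this member's private encoder state */
};

/* assumed helper, not a real API: stateful encode of one frame */
struct frame *encode(void *encoder_state, short *pcm, int samples);

/* Listeners who aren't talking all share one frame produced by a
 * common encoder.  The moment member m starts talking, m instead gets
 * (mix minus m's own audio), encoded by m's private encoder -- whose
 * state doesn't match the common encoder that the decoder on m's
 * phone has been tracking.  That discontinuity is the rule being
 * broken. */
struct frame *frame_for(struct member *m, struct frame *shared,
                        short *mix, int samples)
{
    short priv[160];       /* assumes samples <= 160 */
    int i;

    if (!m->talking)
        return shared;     /* one encode serves every listener */

    for (i = 0; i < samples; i++)
        priv[i] = mix[i] - m->contrib[i];   /* (ignoring clipping) */
    return encode(m->encoder, priv, samples);
}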
All VoIP codecs need to be able to cope with losing state
synchronization, because they all have to deal with packet loss,
listeners joining in the middle of a session, etc. So codec authors
kinda cringe and say "you can't do that", but in practice it works
really well. They'd tell you there are two "proper" ways to do this:
1) Send every speaker's encoded stream to each client, and let the
clients decode and mix. (This uses a lot of bandwidth.)
2) Decode all the speakers, mix, re-encode, and send that out to all
the users (the way meetme works; sketched below). This uses a lot of
CPU (it's O(n^2)), and it also means you always suffer generational
loss from decoding/re-encoding everything. (The difference is
noticeable.)
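
For contrast, option 2 looks roughly like this in outline. (Again
illustrative C, not the real meetme code.)

#define MAX_MEMBERS 32
#define SAMPLES 160

struct member { void *codec_state; /* per-member codec state, etc. */ };

/* assumed helpers, not the real meetme API */
void decode(struct member *m, short *pcm, int samples);
void send_encoded(struct member *m, short *pcm, int samples);

/* Decode every member, build a per-listener mix that excludes the
 * listener's own audio, then re-encode for each listener.  The naive
 * mixing loop is O(n^2) in sample additions, and every listener
 * always hears a decode->re-encode generation, even when only one
 * person is speaking. */
void mix_and_send(struct member *members, int n)
{
    short decoded[MAX_MEMBERS][SAMPLES];
    short mix[SAMPLES];
    int i, j, s;

    for (i = 0; i < n; i++)              /* n decodes */
        decode(&members[i], decoded[i], SAMPLES);

    for (i = 0; i < n; i++) {            /* for each listener ... */
        for (s = 0; s < SAMPLES; s++)
            mix[s] = 0;
        for (j = 0; j < n; j++)          /* ... sum the other n-1 */
            if (j != i)
                for (s = 0; s < SAMPLES; s++)
                    mix[s] += decoded[j][s];  /* (ignoring clipping) */
        send_encoded(&members[i], mix, SAMPLES);  /* n re-encodes */
    }
}

(The sketch mixes naively; you can cut the mixing itself to O(n) by
summing everyone once and subtracting each member's own audio, but the
per-member re-encode and the generational loss remain either way.)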
-SteveK