[asterisk-dev] Wish: adding intelligent codec negotiation to asterisk / pjsip

Matthew Jordan mjordan at digium.com
Mon Jan 30 17:19:23 CST 2017


On Mon, Jan 30, 2017 at 4:44 PM, George Joseph <gjoseph at digium.com> wrote:

>
>
> On Mon, Jan 30, 2017 at 3:32 PM, Matthew Jordan <mjordan at digium.com>
> wrote:
>
>>
>>
>> On Mon, Jan 30, 2017 at 3:22 PM, Matt Fredrickson <creslin at digium.com>
>> wrote:
>>
>>> Hey Michael,
>>>
>>> First off, thanks for taking the time to express some of your thoughts
>>> and concerns to the asterisk-dev list.  I'll keep my reply to your
>>> email inline below.
>>>
>>> On Mon, Jan 30, 2017 at 4:13 AM, Michael Maier <m1278468 at allmail.net>
>>> wrote:
>>> > Dear developers,
>>> >
>>> > I've been redirected to this mailing list by Joshua Colp during fixing a
>>> > one-way audio bug[1] to discuss an alternative to the solution provided
>>> > in the fix.
>>> >
>>> > Background:
>>> > - A lot of people complain about bad VoIP call quality compared to the
>>> > old POTS / ISDN devices. What they mean from a technical point of view:
>>> > high latencies (resulting in echo), digital-sounding audio because of
>>> > "bad" codecs, general quality loss during transcoding, and so on.
>>> > - In Europe, HD audio is being adopted slowly. This means more and more
>>> > UAs can natively handle HD codecs like g722, but at the same time they
>>> > must remain backward compatible with older UAs that only speak alaw
>>> > (e.g. old POTS devices, or UAs that are not yet HD capable). Therefore,
>>> > they advertise at least two codecs: g722 and alaw (usually plus a few
>>> > more, like ulaw or other codecs).
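
As an aside, for anyone following along: getting an Asterisk PJSIP endpoint
to offer exactly that pair is just a matter of its codec list in pjsip.conf.
A minimal sketch - the endpoint name is made up, and the usual
transport/auth/aor options are omitted:

[hd-phone]
type=endpoint
; offer HD audio first, then fall back to alaw for older UAs
disallow=all
allow=g722
allow=alaw
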
>>>
>>> So there are multiple reasons why you could be seeing reportedly bad
>>> call quality that come to my mind:
>>>
>>> 1. Transcoding - changes the audio, but typically doesn't make things
>>> sound *too* bad.  Obviously it is codec dependent as to how bad it
>>> sounds afterwards, but most modern codecs aren't terrible for speech
>>> replication and encoding.  Usually this is not where call quality
>>> problems are noticed.
>>>
>>> 2. Packet loss and jitter related problems.  In an ISDN network, there
>>> is a guaranteed real time audio channel for transporting media.  As
>>> long as the data pumps on the transmit and receive side are working
>>> properly, you should hear almost no audio quality issues.  VoIP tries
>>> to transport real time audio over a non-guaranteed transport channel.
>>> This sometimes causes bad audio quality issues due to packet loss,
>>> packet reordering, or extreme packet delays.  Enabling Asterisk's
>>> jitter buffers typically improves many problems that arise due to
>>> this.  They are typically *not* enabled by default and so must be
>>> explicitly enabled.
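
For reference, since this trips people up: the jitter buffer is applied from
the dialplan via the JITTERBUFFER function. A minimal sketch - the context
and endpoint names here are made up:

[from-provider]
; put an adaptive jitter buffer (with default settings) on the incoming
; channel before dialing onward
exten => _X.,1,Set(JITTERBUFFER(adaptive)=default)
 same => n,Dial(PJSIP/${EXTEN}@office-phone)
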
>>>
>>> I'm hoping you have already dug into your problem to look at both of
>>> the above elements, and have confirmed that you are not dealing with
>>> the second problem rather than voice mutation due to the first.
>>> Usually you can track down the second problem by doing packet captures
>>> of the voice conversations in question as well as looking at RTCP
>>> statistics.
>>>
>>> > What does this mean for Asterisk?
>>> > My conviction is that Asterisk shouldn't make things even worse when
>>> > handling calls / codecs by forcing unnecessary transcoding, which
>>> > needlessly harms call quality. Another downside of unnecessary
>>> > transcoding: it needlessly steals system resources from the machine
>>> > Asterisk is running on.
>>> > Asterisk should impact each call it handles, and the underlying
>>> > machine, as little as possible.
>>> >
>>> > Therefore I would like to see a (switchable) feature where Asterisk /
>>> > pjsip always tries to primarily advertise the codecs that are supported
>>> > by both UAs and to remove the codecs that are not supported by one of
>>> > the UAs. This prevents unnecessary transcoding.
>>>
>>> This actually would be a really neat thing for Asterisk to be able to
>>> do.  The last time I looked at it, there were quite a few challenges in
>>> making it happen.  Asterisk is designed to be a back-to-back user
>>> agent, and it is inherently designed to terminate media and codecs
>>> individually with each leg in question, but not necessarily together.
>>> It "makes things work" on each leg separately, based on the allowed
>>> codecs for each endpoint.
>>>
>>> This is necessary behavior, since many times an Answer() has already
>>> occurred and negotiated the codec capabilities for a call, and most
>>> dial plan applications assume a call has its media fully negotiated
>>> before they interact with the channel.
>>>
>>> For the simple case where your dial plan doesn't do any intense media
>>> interaction with a channel and simply Dial()'s out, a significant
>>> piece that doesn't work right now is that the codec information from
>>> the 200 OK received from the outbound channel is not passed back
>>> through to the inbound channel - I'm assuming that's what you're
>>> referring to.  Hopefully Josh or Mark will correct me if my memory is
>>> off.
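
For the record, that "simple case" is essentially the dialplan below, where
the incoming call goes straight to Dial() with no Answer() or media
application in front of it (context and endpoint names are made up):

[incoming]
; nothing here answers or touches media on the inbound leg before the
; outbound leg has negotiated its codecs
exten => _X.,1,Dial(PJSIP/${EXTEN}@upstream)
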
>>>
>>>
>> We've looked at doing this several times.
>>
>> The last time I gave that a shot I more or less had it working, and then
>> broke myself (and my patch) on the rocks of Local channels. Trying to get
>> Local channel chains to respect the codecs down a chain (generally 3+ is
>> enough to trip errors) is a non-trivial operation. When setting up the call
>> you can more or less control the situation - the codec preferences are
>> passed down the chain via ast_request. However, when things are coming back
>> in the other direction, you don't have an easy reference to the channel
>> that created you. It can involve reaching across bridges and Local channel
>> bridges, which is dangerous: both are prone to errors and deadlocks.
>> Holding a reference to your creator isn't safe, as the creator can be
>> masqueraded away (in really weird situations). The other option is to pass
>> the information down in a frame, but then you can't guarantee much, as
>> that's an asynchronous operation.
>>
>> In transcoding situations, you also have to pick where in that chain you
>> choose to do transcoding - which is not always easy to figure out.
>>
>> Things get challenging as well when you have a multi-party bridge
>> involved at all - either when a dialed party has to be placed into a
>> multi-party bridge or when you have a chain of Local channels dialing
>> someone and the far end of the Local channel chain is already in a
>> multi-party bridge.
>>
>> Generally, this is an architectural limitation of Asterisk that - while
>> not impossible to craft a solution for - is far harder than it may seem
>> when you're only thinking of two-party calling with non-Local channels.
>>
>
> How hard do you think it would be to limit the attempt to the 90% use
> case of a call coming in on one channel and causing an originate going out
> on another channel?  Simple PBX scenarios?  No answer in the middle, no
> early media, etc.
>
>
A lot of the code in the core or in specific channel drivers - which is
where this would need to be located - tries hard not to make assumptions
about the number of participants in a bridge or the types of channels it is
working with.

I think you'd end up with a lot of junk lying around doing things like:

if (number of participants in bridge is 2) {
   // do cool things
} else {
   // behave normally
}

Or worse:

if (I am not bridged with a Local channel and I'm not in a bridge with more
than one other channel) {
  // do cool things
}

Which just gets ugly and hard(er) to maintain.

I could be wrong about the scope of this, but that's more or less what I
found when I played around with a similar notion.

-- 
Matthew Jordan
Digium, Inc. | CTO
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org