[asterisk-dev] Wish: adding intelligent codec negotiation to asterisk / pjsip

Tue Jan 31 02:22:42 CST 2017

Hello Matthew!

On 01/31/2017 at 12:19 AM Matthew Jordan wrote:
> On Mon, Jan 30, 2017 at 4:44 PM, George Joseph <gjoseph at digium.com> wrote:
> 
>>
>>
>> On Mon, Jan 30, 2017 at 3:32 PM, Matthew Jordan <mjordan at digium.com>
>> wrote:
>>
>>>
>>>
>>> On Mon, Jan 30, 2017 at 3:22 PM, Matt Fredrickson <creslin at digium.com>
>>> wrote:
>>>
>>>> Hey Michael,
>>>>
>>>> First off, thanks for taking the time to express some of your thoughts
>>>> and concerns to the asterisk-dev list.  I'll keep my reply to your
>>>> email inline below.
>>>>
>>>> On Mon, Jan 30, 2017 at 4:13 AM, Michael Maier <m1278468 at allmail.net>
>>>> wrote:
>>>>> Dear developers,
>>>>>
>>>>> I've been redirected to this mailing list by Joshua Colp during fixing a
>>>>> one way audio bug[1] to discuss another solution as provided in the fix.
>>>>>
>>>>> Background:
>>>>> - A lot of people complain about bad VoIP call quality compared to the
>>>>> old POTS / ISDN devices. What do they mean from a technical point of
>>>>> view: High latencies (resulting in echo), digital sound because of "bad"
>>>>> codecs, general quality loss during transcoding and many other reasons more.
>>>>> - In Europe, HD audio is being adopted slowly. This means, more and more
>>>>> UAs can natively handle HD codecs like g722. But they must be downward
>>>>> compatible at the same time for older UAs, which just speak alaw (like
>>>>> the old POTS devices e.g. or UAs which are not yet HD capable).
>>>>> Therefore, they advertise at least two codecs: g722 and alaw (mostly
>>>>> plus some more like ulaw or some other codecs).
>>>>
>>>> So there are multiple reasons why you could be seeing reportedly bad
>>>> call quality that come to my mind:
>>>>
>>>> 1. Transcoding - changes the audio, but typically doesn't make things
>>>> sound *too* bad.  Obviously it is codec dependent as to how bad it
>>>> sounds afterwards, but most modern codecs aren't terrible for speech
>>>> replication and encoding.  Usually this is not where call quality
>>>> problems are noticed.
>>>>
>>>> 2. Packet loss and jitter related problems.  In an ISDN network, there
>>>> is a guaranteed real time audio channel for transporting media.  As
>>>> long as the data pumps on the transmit and receive side are working
>>>> properly, you should hear almost no audio quality issues.  VoIP tries
>>>> to transport real time audio over a non-guaranteed transport channel.
>>>> This sometimes causes bad audio quality issues due to packet loss,
>>>> packet reordering, or extreme packet delays.  Enabling Asterisk's
>>>> jitter buffers typically improves many problems that arise due to
>>>> this.  They are typically *not* enabled by default and so must be
>>>> explicitly enabled.
>>>>
>>>> I'm hoping you already have dived into your problem to look at both
>>>> the above elements, and have confirmed that you are not dealing with
>>>> the second problem instead of voice mutation due to the first problem.
>>>> Usually you can track the second problem by doing packet captures of
>>>> the voice conversations in question as well as look at RTCP
>>>> statistics.
>>>>
>>>>> What does this mean to Asterisk?
>>>>> My conviction is, that Asterisk shouldn't make things even worse when
>>>>> handling calls / codecs by forcing unnecessary transcoding, which
>>>>> unnecessarily harms call quality. Next point of unnecessary transcoding:
>>>>> it unnecessarily steals system resources from the machine asterisk is
>>>>> running on.
>>>>> Asterisk should harm each call it handles and the underlying machine as
>>>>> little as possible.
>>>>>
>>>>> Therefore I would like to see a (switchable) feature, that asterisk /
>>>>> pjsip always tries to primarily advertise codecs, which are supported by
>>>>> both UAs and remove those codecs, which are not supported by one of the
>>>>> UAs. This prevents unnecessary transcoding.
>>>>
>>>> This actually would be a really neat thing for Asterisk to be able to
>>>> do.  Last time I looked at it, there are quite a few challenges in
>>>> making it happen.  Asterisk is designed to be a back to back user
>>>> agent, and it inherently is designed to terminate media and codecs
>>>> individually with each leg in question, but not necessarily together.
>>>> It "makes things work" on each leg separately, based on the allowed
>>>> codecs for each endpoint.
>>>>
>>>> This is a needful behavior since many times an Answer() has already
>>>> occurred and negotiated the codec capabilities for a call and most
>>>> dial plan applications assume a call needs to have media fully
>>>> negotiated in order to interact on the channel.
>>>>
>>>> For the simple case where your dial plan doesn't do any intense media
>>>> interaction with a channel and simply Dial()'s out, a significant
>>>> portion that doesn't work right now is that the codec information from
>>>> the 200 OK received from the outbound channel is not passed back
>>>> through to the inbound channel - I'm assuming that's what you're
>>>> referring to.  Hopefully Josh or Mark will correct me if my memory is
>>>> off.
>>>>
>>>>
>>> We've looked at doing this several times.
>>>
>>> The last time I gave that a shot I more or less had it working, and then
>>> broke myself (and my patch) on the rocks of Local channels. Trying to get
>>> Local channel chains to respect the codecs down a chain (generally 3+ is
>>> enough to trip errors) is a non-trivial operation. When setting up the call
>>> you more or less can control the situation - the codec preferences are
>>> passed down the chain via ast_request. However, when things are coming back
>>> in the other direction, you don't have an easy reference to the channel
>>> that created you. It can involve reaching across bridges and local channel
>>> bridges, which is dangerous. Both are prone to error and deadlocks. Holding
>>> a reference to your creator isn't safe, as the creator can be masqueraded
>>> away (in really weird situations). The other option is to pass the
>>> information down in a frame, but now you can't guarantee much, as that's an
>>> asynchronous operation.
>>>
>>> In transcoding situations, you also have to pick where in that chain you
>>> choose to do transcoding - which is not always easy to figure out.
>>>
>>> Things get challenging as well when you have a multi-party bridge
>>> involved at all - either when a dialed party has to be placed into a
>>> multi-party bridge or when you have a chain of Local channels dialing
>>> someone and the far end of the Local channel chain is already in a
>>> multi-party bridge.

As I'm a FreePBX user, I fear things are not that easy in my case, and
in the end the 90% solution (see below) wouldn't work for me at all :-(.

>>>
>>> Generally, this is an architectural limitation of Asterisk that - while
>>> not impossible to craft a solution for - is far harder than it may seem
>>> when you're only thinking of two-party calling with non-Local channels.
>>>
>>
>> How hard do you think it would be to limit the attempt to the 90% use
>> case of a call coming in on one channel and causing an originate going out
>> on another channel?  Simple PBX scenarios?  No answer in the middle, no
>> early media, etc.

This would have been my idea, too. Do it where it is possible (meaning:
safe and with little effort), and leave things as they are where it is
expensive or uncertain. Most calls, I think, are just two parties
connected together.
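
To make the simple two-party case concrete: the core of the idea is just
an ordered intersection of the codecs the caller offers and the codecs
the callee's endpoint allows. The following is only an illustrative
sketch in Python, not Asterisk code; the function name and the plain
string codec names are assumptions made for the example.

```python
def common_codecs(caller_offer, callee_allowed):
    """Return the codecs both sides support, in the caller's preference order.

    Hypothetical helper for illustration only; Asterisk has its own
    format capability structures for this.
    """
    allowed = set(callee_allowed)
    return [codec for codec in caller_offer if codec in allowed]


# Caller offers g722 first, callee only speaks alaw/ulaw:
# only the shared codecs would be advertised on the outbound leg.
shared = common_codecs(["g722", "alaw", "ulaw"], ["alaw", "ulaw"])
print(shared)
```

If the resulting list is empty, transcoding cannot be avoided and the
current behaviour would have to stay in place.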

>>
>>
> A lot of the code in the core or in specific channel drivers - which is
> where this would need to be located - tries hard to not make assumptions
> about the number of participants in a bridge or the types of channels it is
> working with.
> 
> I think you'd end up with a lot of junk lying around doing things like:
> 
> if (number of participants in bridge is 2) {
>    // do cool things
> } else {
>    // behave normally
> }
> 
> Or worse:
> 
> if (I am not bridged with a Local channel and I'm not in a bridge with more
> than one other channel) {
>   // do cool things
> }
> 
> Which just gets ugly and hard(er) to maintain.
> 
> I could be wrong about the scope of this, but that's more or less what I
> found when I played around with a similar notion.

My first idea was to prevent transcoding during call setup already. But
now I understand that this is not a trivial thing to do.

Would it be easier to do it *after* the call has been completely
established (peer has answered and both legs have been finally established)?

Start an additional job (if a new per-extension option, e.g.
prevent_transcoding, is set to true) which first checks whether
transcoding is active for this extension. If it is, check whether the
target codec is among the codecs defined for this extension. If yes,
send a re-INVITE to the extension to switch it to the target codec.
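
The check described above could look roughly like this. It is a sketch
in Python to show the decision logic only, not how Asterisk would
implement it: the `Leg` class, its fields, the `prevent_transcoding`
flag and the function name are all hypothetical, and actually sending
the re-INVITE is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leg:
    """Hypothetical view of one extension's call leg."""
    allowed: List[str]                  # codecs configured for this extension
    active: str                         # codec currently negotiated on this leg
    prevent_transcoding: bool = False   # assumed new per-extension option


def codec_for_reinvite(ext: Leg, peer_codec: str) -> Optional[str]:
    """Return the codec to re-INVITE the extension to, or None to do nothing."""
    if not ext.prevent_transcoding:
        return None   # feature switched off for this extension
    if ext.active == peer_codec:
        return None   # both legs already use the same codec, no transcoding
    if peer_codec in ext.allowed:
        return peer_codec   # extension may use the peer's codec: switch to it
    return None       # peer's codec not allowed here, keep transcoding


# Extension negotiated g722, peer answered with alaw, alaw is allowed:
ext = Leg(allowed=["g722", "alaw"], active="g722", prevent_transcoding=True)
print(codec_for_reinvite(ext, "alaw"))
```

Returning None in every doubtful case keeps the current behaviour as the
fallback, so the job only acts when the switch is clearly possible.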

I would restrict this feature to extensions. If the call is between two
extensions, always do it for the caller's extension.

This sounds practicable to *me* (but I'm not an Asterisk specialist :-)),
as Asterisk already has to be able to change codecs in response to a
re-INVITE during a running call.

Thanks,
Michael