[asterisk-dev] Suspected deadlocks in Asterisk 1.8 under heavy load
Kevin P. Fleming
kpfleming at digium.com
Wed Aug 17 06:40:40 CDT 2011
On 08/17/2011 04:03 AM, Tony Mountifield wrote:
> In article <4E4B2634.6090602 at digium.com>,
> Kevin P. Fleming <kpfleming at digium.com> wrote:
>> On 08/16/2011 06:42 PM, Alistair Cunningham wrote:
>>> Just to let everyone know, we strongly suspect that Asterisk 1.8.4.4 and
>>> 1.8.5.0 suffer from occasional deadlocks under heavy load, possibly
>>> related to the local channel. The symptoms seem to vary, but include:
>>>
>>> 1. Asterisk stops responding to SIP packets, but the Asterisk console
>>> and manager interface remain responsive. A "sipsak -s sip:127.0.0.1:5060
>>> -d" times out.
>>
>> Matt Nicholson committed a change to the 1.8, 10 and trunk branches
>> today to solve a significant performance issue caused by the change to
>> chan_sip to return the SIP hangup cause to the 'master' channel. His
>> change made that behavior optional, even though it was already released
>> in 1.8, because of the performance impact it has. We had another
>> customer report a similar set of symptoms.
>
> Surely something that just consumes a significant amount of CPU shouldn't
> cause a deadlock directly? A deadlock indicates a logic error. Of course,
> the extra load might make the window longer for a potential deadlock to occur,
> but this might even aid in reproducing it and tracking it down...

As Alistair already replied, it's not a true 'deadlock' except in
extreme cases. The performance issue arises because each outbound SIP
channel, when it is hung up, has to find its 'master' channel (using
the MASTER_CHANNEL dialplan function) in order to update the hash
stored on that channel. I have not personally reviewed the
implementation of this function, and it's entirely conceivable that it
could be done more efficiently, but as it stands today, using this
function requires inspecting the global channel list. That in turn
requires locking and unlocking the global channel list, so on a busy
system where channels are constantly being created and destroyed, this
can lead to significant delays in processing SIP traffic.
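
To make the cost described above concrete, here is a minimal toy model in
Python. It is not Asterisk code and does not reflect how MASTER_CHANNEL is
actually implemented; it only assumes what the paragraph above states: a
single global channel list, one lock around it, and a lookup that inspects
the list on every outbound hangup. The names ChannelList, record_cause and
simulate are hypothetical, as are the channel names and the cause value.

```python
import threading


class ChannelList:
    """Toy stand-in for a global channel list guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._channels = []  # list of (name, hangup-cause dict) pairs
        self.inspected = 0   # entries examined while holding the lock

    def add(self, name):
        with self._lock:
            self._channels.append((name, {}))

    def remove(self, name):
        with self._lock:
            self._channels = [c for c in self._channels if c[0] != name]

    def record_cause(self, master_name, outbound_name, cause):
        """Scan the list for the master and store the cause on it."""
        with self._lock:
            for chan_name, causes in self._channels:
                self.inspected += 1
                if chan_name == master_name:
                    causes[outbound_name] = cause
                    return True
        return False


def simulate(n_calls):
    """Create n_calls master/outbound pairs, then hang up every outbound leg.

    Returns the total number of list entries inspected under the lock.
    """
    channels = ChannelList()
    for i in range(n_calls):
        channels.add(f"SIP/outbound-{i}")
    for i in range(n_calls):
        channels.add(f"SIP/master-{i}")
    for i in range(n_calls):
        # Each hangup must first locate its master in the shared list.
        channels.record_cause(f"SIP/master-{i}", f"SIP/outbound-{i}", "16")
        channels.remove(f"SIP/outbound-{i}")
    return channels.inspected


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, simulate(n))
```

In this toy model every hangup inspects n + 1 entries, so the total work done
under the lock is n * (n + 1): ten times the calls means roughly a hundred
times the work. Any other thread that needs the same lock (for example, one
creating or destroying channels) has to wait while each scan runs, which is
the kind of delay described above rather than a true deadlock.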
--
Kevin P. Fleming
Digium, Inc. | Director of Software Technologies
Jabber: kfleming at digium.com | SIP: kpfleming at digium.com | Skype: kpfleming
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at www.digium.com & www.asterisk.org