[asterisk-bugs] [Asterisk 0014639]: Asterisk CPU usage 100% (deadlock?) when using Queue() with Local channels

Asterisk Bug Tracker noreply at bugs.digium.com
Fri Jul 24 09:08:29 CDT 2009


A NOTE has been added to this issue. 
====================================================================== 
https://issues.asterisk.org/view.php?id=14639 
====================================================================== 
Reported By:                coolmig
Assigned To:                
====================================================================== 
Project:                    Asterisk
Issue ID:                   14639
Category:                   Channels/chan_local
Reproducibility:            always
Severity:                   minor
Priority:                   normal
Status:                     new
Target Version:             1.4.28
Asterisk Version:           1.4.22 
Regression:                 No 
SVN Branch (only for SVN checkouts, not tarball releases): N/A 
SVN Revision (number only!):  
Request Review:              
====================================================================== 
Date Submitted:             2009-03-10 12:26 CDT
Last Modified:              2009-07-24 09:08 CDT
====================================================================== 
Summary:                    Asterisk CPU usage 100% (deadlock?) when using
Queue() with Local channels
Description: 
I was testing a dialer I'm making, and found this situation: when I dial
more than one call to an agent using Local channels (busy with one call and
one or more in the queue), Asterisk goes 100% CPU all the time until the
queued calls get answered or hangup.

Core show locks shows this:

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 3077847952 (pbx_thread           started at [ 2645] pbx.c
ast_pbx_start())
=== ---> Waiting for Lock https://issues.asterisk.org/view.php?id=0
(chan_agent.c): MUTEX 2252 __login_exec
&p->app_lock 0x93d59d4 (1)
=== --- ---> Locked Here: chan_agent.c line 1080 (agent_new)
=== -------------------------------------------------------------------
===
=== Thread ID: 3073665936 (pbx_thread           started at [ 2645] pbx.c
ast_pbx_start())
=== ---> Lock https://issues.asterisk.org/view.php?id=0 (chan_agent.c): MUTEX
1080 agent_new &p->app_lock
0x93d59d4 (1)
=== -------------------------------------------------------------------
===
=== Thread ID: 3075632016 (pbx_thread           started at [ 2645] pbx.c
ast_pbx_start())
=== ---> Tried and failed to get Lock https://issues.asterisk.org/view.php?id=0
(chan_local.c): MUTEX 186
local_queue_frame (channel lock) 0x9464470 (0)
=== -------------------------------------------------------------------
===
=======================================================================

If I hangup the initial agent call, if takes a long time for the Queue to
deliver the other call to the agent, while the agent is with no call, core
show locks shows this:

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 3077847952 (pbx_thread           started at [ 2645] pbx.c
ast_pbx_start())
=== ---> Lock https://issues.asterisk.org/view.php?id=0 (chan_agent.c): MUTEX
2252 __login_exec &p->app_lock
0x93d59d4 (1)
=== -------------------------------------------------------------------
===
=== Thread ID: 3076860816 (pbx_thread           started at [ 2645] pbx.c
ast_pbx_start())
=== ---> Tried and failed to get Lock https://issues.asterisk.org/view.php?id=0
(chan_local.c): MUTEX 186
local_queue_frame (channel lock) 0x945d820 (0)
=== -------------------------------------------------------------------
===
=======================================================================

When the queued call finally is delivered to the agent, core show locks
shows this, and the 100% usage stops:

=======================================================================
=== Currently Held Locks ==============================================
=======================================================================
===
=== <file> <line num> <function> <lock name> <lock addr> (times locked)
===
=== Thread ID: 3077847952 (pbx_thread           started at [ 2645] pbx.c
ast_pbx_start())
=== ---> Waiting for Lock https://issues.asterisk.org/view.php?id=0
(chan_agent.c): MUTEX 2252 __login_exec
&p->app_lock 0x93d59d4 (1)
=== --- ---> Locked Here: chan_agent.c line 1080 (agent_new)
=== -------------------------------------------------------------------
===
=== Thread ID: 3075632016 (pbx_thread           started at [ 2645] pbx.c
ast_pbx_start())
=== ---> Lock https://issues.asterisk.org/view.php?id=0 (chan_agent.c): MUTEX
1080 agent_new &p->app_lock
0x93d59d4 (1)
=== -------------------------------------------------------------------
===
=======================================================================
====================================================================== 

---------------------------------------------------------------------- 
 (0108169) pscs (reporter) - 2009-07-24 09:08
 https://issues.asterisk.org/view.php?id=14639#c108169 
---------------------------------------------------------------------- 
I have a similar problem with Asterisk 1.6.0.9-samy-r27
(This setup is on a Intel quad core 2.33GHz, 4GB RAM, with about 12
extensions defined)

In my case, Queues simply don't work if there is more than one static
member with the 'ringall' type set. If 'roundrobin' is set, then they work
fine (but not as we want).

After investigation it seems as if the dialparties.agi script is failing
because the 'ExtensionState' API is taking too long to return.

I hacked dialparties.agi to put some debug logging around the
ExtensionState call in the 'is_ext_available' function. It logs 'iea: 0
<extnum>' just before it calls ExtensionState and 'iea: 1 <extnum>' just
after it.

Then, see the log extract below. You can see 'iea: 0 807' at 14:29:58 just
before the state of extension 807 is checked. Then, just after that you see
'iea: 0 808' as the state of extension 808 is checked.
Then, 13 seconds later, you see 'iea: 1 807' as the state of 807 is
returned (idle). But, you don't see the state of extension 808 being
returned.
Shortly after that you see 'Nobody picked up in 15000 ms' then the
dialparties.agi scripts abort/are aborted, and return -1. Neither extension
807 or 808 are called.

If you have a queue with just one member, then the ExtensionState still
takes a long time, but since it is only called once, it finishes just in
time (after a long delay), so the queue works, but with only one member.

The queue 'timeout' here is set to 15 seconds. If I increase that to 60
seconds, then, eventually, the phones do ring, but only after about 20+
seconds of being in the queue.

Also, if the queue is on roundrobin, then each agent's phone actually only
rings for about 2 or 3 seconds if the timeout is set to 15 seconds, as most
of the timeout is being spent in the ExtensionState API.

[Jul 24 14:29:58] VERBOSE[4226] logger.c:  dialparties.agi: Caller ID name
is '111Paul Smith' number is '804'
[Jul 24 14:29:58] VERBOSE[4225] logger.c:  dialparties.agi: Methodology of
ring is  'none'
[Jul 24 14:29:58] VERBOSE[4225] logger.c:     -- dialparties.agi: Added
extension 807 to extension map
[Jul 24 14:29:58] VERBOSE[4225] logger.c:     -- dialparties.agi:
Extension 807 cf is disabled
[Jul 24 14:29:58] VERBOSE[4225] logger.c:     -- dialparties.agi:
Extension 807 do not disturb is disabled
[Jul 24 14:29:58] VERBOSE[4225] logger.c:        > dialparties.agi: extnum
807 has:  cw: 1; hascfb: 0 [] hascfu: 0 []
[Jul 24 14:29:58] VERBOSE[4225] logger.c:        > dialparties.agi: MM: 0
[Jul 24 14:29:58] VERBOSE[4225] logger.c:        > dialparties.agi: iea: 0
807
[Jul 24 14:29:58] VERBOSE[4226] logger.c:        > dialparties.agi:
USE_CONFIRMATION:  'FALSE'
[Jul 24 14:29:58] VERBOSE[4226] logger.c:        > dialparties.agi:
RINGGROUP_INDEX:   ''
[Jul 24 14:29:58] VERBOSE[4226] logger.c:  dialparties.agi: Methodology of
ring is  'none'
[Jul 24 14:29:58] VERBOSE[4226] logger.c:     -- dialparties.agi: Added
extension 808 to extension map
[Jul 24 14:29:58] VERBOSE[4226] logger.c:     -- dialparties.agi:
Extension 808 cf is disabled
[Jul 24 14:29:58] VERBOSE[4226] logger.c:     -- dialparties.agi:
Extension 808 do not disturb is disabled
[Jul 24 14:29:58] VERBOSE[4226] logger.c:        > dialparties.agi: extnum
808 has:  cw: 1; hascfb: 0 [] hascfu: 0 []
[Jul 24 14:29:58] VERBOSE[4226] logger.c:        > dialparties.agi: MM: 0
[Jul 24 14:29:58] VERBOSE[4226] logger.c:        > dialparties.agi: iea: 0
808
[Jul 24 14:30:11] VERBOSE[4225] logger.c:        > dialparties.agi: iea: 1
807
[Jul 24 14:30:11] VERBOSE[4225] logger.c:        > dialparties.agi:
ExtensionState: 0
[Jul 24 14:30:11] VERBOSE[4225] logger.c:     -- dialparties.agi: dbset
CALLTRACE/807 to 804
[Jul 24 14:30:11] VERBOSE[4225] logger.c:     -- dialparties.agi: Filtered
ARG3: 807
[Jul 24 14:30:11] VERBOSE[4225] logger.c:        > dialparties.agi:
NODEST: 200 blkvm enabled macro already in dialopts: trM(auto-blkvm)
[Jul 24 14:30:11] VERBOSE[4225] logger.c:        > dialparties.agi:
NODEST: 200 blkvm enabled macro already in dialopts: trM(auto-blkvm)
[Jul 24 14:30:11] VERBOSE[4225] logger.c:        > dialparties.agi: DS:
SIP/807
[Jul 24 14:30:11] VERBOSE[4231] logger.c:   == Manager 'admin' logged off
from 127.0.0.1
[Jul 24 14:30:13] VERBOSE[4224] logger.c:     -- Nobody picked up in 15000
ms
[Jul 24 14:30:13] VERBOSE[4224] logger.c:     -- Nobody picked up in 15000
ms
[Jul 24 14:30:13] VERBOSE[4225] logger.c:     --
<Local/807 at from-internal-1e89;2>AGI Script dialparties.agi completed,
returning -1
[Jul 24 14:30:13] VERBOSE[4232] logger.c:   == Manager 'admin' logged off
from 127.0.0.1
[Jul 24 14:30:13] VERBOSE[4226] logger.c:     --
<Local/808 at from-internal-2f12;2>AGI Script dialparties.agi completed,
returning -1

Ring Groups don't seem to have the problem because they don't seem to call
the ExtensionState API.

This would also be why the SIP/num members don't have a problem, as I
presume that ExtensionState isn't called for those either. 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2009-07-24 09:08 pscs           Note Added: 0108169                          
======================================================================




More information about the asterisk-bugs mailing list