[asterisk-bugs] [Asterisk 0012269]: Deadlock after Originate from AMI to Agent

noreply at bugs.digium.com noreply at bugs.digium.com
Fri Jul 11 09:32:31 CDT 2008


A NOTE has been added to this issue. 
====================================================================== 
http://bugs.digium.com/view.php?id=12269 
====================================================================== 
Reported By:                IgorG
Assigned To:                
====================================================================== 
Project:                    Asterisk
Issue ID:                   12269
Category:                   Channels/chan_agent
Reproducibility:            always
Severity:                   major
Priority:                   normal
Status:                     new
Asterisk Version:           SVN 
SVN Branch (only for SVN checkouts, not tarball releases):  trunk 
SVN Revision (number only!): 110444 
Disclaimer on File?:        N/A 
Request Review:              
====================================================================== 
Date Submitted:             03-21-2008 03:15 CDT
Last Modified:              07-11-2008 09:32 CDT
====================================================================== 
Summary:                    Deadlock after Originate from AMI to Agent
Description: 
I have discovered a deadlock while using chan_agent and originating a call via
AMI. To reproduce it you need one agent and one registered phone. 

1) Log in via AMI and originate a call to the agent.
2) Answer the call on both sides, talk, and hang up the phone.
3) After the hangup, MoH for the agent does not start and the CLI shows these ERRORs:

    -- Started music on hold, class 'default', on SIP/104-08362618
[Mar 21 13:35:21] ERROR[8145]:
/usr/src/voip/asterisk-trunk/asterisk-trunk.patched-cng/include/aster:461
__ast_pthread_mutex_unlock: chan_agent.c line 843 (agent_hangup): attempted
unlock mutex '&p->app_lock' without owning it!
[Mar 21 13:35:21] ERROR[8145]:
/usr/src/voip/asterisk-trunk/asterisk-trunk.patched-cng/include/aster:463
__ast_pthread_mutex_unlock: chan_agent.c line 979 (agent_new):
'&p->app_lock' was locked here.
[Mar 21 13:35:21] ERROR[8145]:
/usr/src/voip/asterisk-trunk/asterisk-trunk.patched-cng/include/aster:486
__ast_pthread_mutex_unlock: chan_agent.c line 843 (agent_hangup): Error
releasing mutex: Operation not permitted

4) 'agent show online' still shows the called side in the list:

server-voip*CLI> agent show online
1001         (Vasya Pupkin) logged in on SIP/104-08362618 is idle
(musiconhold is 'default')
1 agents online>

5) On a second Originate attempt, the originate fails, and running the CLI
command 'agent show online' fails and freezes the CLI.
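Step 1 can be driven by writing raw AMI actions to the manager TCP port (5038 by default). The sketch below only formats the Login and Originate actions as "Key: Value" lines ending in CRLF, with an empty line closing each action. The helper names ami_login() and ami_originate(), the user name, the secret, the context and the extension are hypothetical placeholders; only Agent/1001 comes from the CLI output in this report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Checks snprintf's result: returns the message length, or -1 if the
 * message did not fit in a buffer of `size` bytes. */
static int fit(int n, size_t size)
{
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Formats an AMI Login action into buf (hypothetical helper). */
int ami_login(char *buf, size_t size, const char *user, const char *secret)
{
    assert(buf != NULL && size > 0);
    return fit(snprintf(buf, size,
                        "Action: Login\r\n"
                        "Username: %s\r\n"
                        "Secret: %s\r\n"
                        "\r\n",               /* empty line ends the action */
                        user, secret), size);
}

/* Formats an AMI Originate action: `channel` is called and, once it
 * answers, it is sent to exten@context at `priority` (hypothetical helper). */
int ami_originate(char *buf, size_t size, const char *channel,
                  const char *context, const char *exten, int priority)
{
    assert(buf != NULL && size > 0);
    return fit(snprintf(buf, size,
                        "Action: Originate\r\n"
                        "Channel: %s\r\n"
                        "Context: %s\r\n"
                        "Exten: %s\r\n"
                        "Priority: %d\r\n"
                        "\r\n",
                        channel, context, exten, priority), size);
}
```

For example, ami_originate(buf, sizeof buf, "Agent/1001", "default", "100", 1) produces the action to send after a successful Login; "default" and "100" stand for whatever context and extension reach the registered phone.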
====================================================================== 

---------------------------------------------------------------------- 
 gknispel_proformatique - 07-11-08 09:32  
---------------------------------------------------------------------- 
We have hit this problem too and traced it some time ago.

The problem is that the Originate command launches an incredible number of
threads just to start the new call (maybe 3 or 4, I don't remember
clearly). In one of these threads a callback into chan_agent is made. IIRC
another callback into chan_agent is made later, in another thread.

Now, the thing is that Asterisk uses reentrant locks. The other thing is that
synchronisation in chan_agent is performed in a very "imaginative" way:
locks are taken and released in a loop in a separate chan_agent thread that
provides something like MoH. That thread is temporarily stopped when a call
is propagated, so that the thread handling the call can send its audio to
the bridged agent instead of the MoH audio.

The result of all of the above is that, when using the AMI Originate
command, a lock is taken in one thread and a different thread later tries to
release it. I'm not sure whether this could work with non-reentrant locks,
but I am sure that it cannot work with reentrant ones :p
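The failure mode can be reproduced outside Asterisk with plain POSIX threads. This is a minimal sketch, not chan_agent code: app_lock is a hypothetical stand-in for p->app_lock, and the two threads stand in for the two Originate callbacks. POSIX specifies that unlocking a recursive mutex from a thread that does not own it fails with EPERM, which glibc reports as "Operation not permitted", the same text as the last ERROR line in the report.

```c
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_RECURSIVE on glibc */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for chan_agent's p->app_lock. */
static pthread_mutex_t app_lock;

/* Runs in a second thread that never locked app_lock (the "later callback"
 * in the Originate path) and stores pthread_mutex_unlock()'s result. */
static void *late_callback(void *arg)
{
    int *rc = arg;
    *rc = pthread_mutex_unlock(&app_lock);
    return NULL;
}

/* Locks app_lock in the calling thread, tries to unlock it from another
 * thread, then releases it from the owner. Returns the other thread's result. */
int unlock_from_other_thread(void)
{
    pthread_mutexattr_t attr;
    pthread_t tid;
    int rc = -1;
    int err;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&app_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    err = pthread_mutex_lock(&app_lock);      /* this thread is now the owner */
    assert(err == 0);
    err = pthread_create(&tid, NULL, late_callback, &rc);
    assert(err == 0);
    pthread_join(tid, NULL);

    printf("unlock from non-owner: %s\n", strerror(rc));

    pthread_mutex_unlock(&app_lock);          /* the owner can release it */
    pthread_mutex_destroy(&app_lock);
    return rc;
}
```

The failed unlock leaves the mutex locked by its original owner; only that thread can release it.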

This situation cannot happen with a Dial, because calls placed by Dial do
not follow the same path as calls placed by Originate: fewer threads are
involved, and the lock that is taken and later released is correctly managed
within the same thread.
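For contrast, a minimal sketch of the Dial-style pattern, in which one thread both takes and releases the lock. With a recursive mutex the owner may even nest acquisitions, because each lock only raises a count that the same thread's unlocks lower again. lock_unlock_same_thread() is a hypothetical helper, not Asterisk code.

```c
#define _XOPEN_SOURCE 700   /* exposes PTHREAD_MUTEX_RECURSIVE on glibc */
#include <assert.h>
#include <pthread.h>

/* Takes a recursive mutex `depth` times and releases it as many times, all in
 * the calling thread. Returns 0 if every lock and unlock succeeded, otherwise
 * the first pthread error code seen. */
int lock_unlock_same_thread(int depth)
{
    pthread_mutex_t lock;          /* hypothetical stand-in for p->app_lock */
    pthread_mutexattr_t attr;
    int rc = 0;
    int held = 0;

    assert(depth >= 0);
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);

    /* Nested acquisitions by the owner only raise the lock count. */
    while (held < depth) {
        rc = pthread_mutex_lock(&lock);
        if (rc != 0)
            break;
        held++;
    }

    /* The same thread drops each level; the last unlock frees the mutex. */
    while (held > 0) {
        int urc = pthread_mutex_unlock(&lock);
        if (urc != 0 && rc == 0)
            rc = urc;
        held--;
    }

    pthread_mutex_destroy(&lock);
    return rc;
}
```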

Now, about fixing this issue:
In our case we did not know why so many threads are involved at the
beginning of an Originate command, nor why synchronisation in chan_agent is
written like that. There might be good reasons. We also needed to solve the
issue quickly, with a guarantee that we would not introduce any regression,
as a client was involved. So we did not take the risk of modifying the
chan_agent implementation or the Originate implementation; instead we wrote
our own simplified version of Originate that does not start so many threads,
and that is used only for letting agents dial out with their agent identity.

I don't know all the Asterisk code paths well enough to say how this could
be fixed correctly upstream. Maybe the number of threads involved in
Originate can be reduced, but this might cause other problems with other
channels. Maybe somebody who knows chan_agent well enough can rewrite it
correctly so that it does not abuse locks so much.
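One common way to avoid handing a lock between threads, offered only as an illustrative sketch and not as a proposed chan_agent patch: keep the "MoH paused" state in a flag protected by a mutex, hold the mutex only inside each function, and let the MoH thread wait on a condition variable. Every lock is then released by the thread that took it, so pause and resume may be called from different threads. All names here (struct moh_gate, moh_pause(), moh_resume(), moh_demo()) are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* The "MoH paused" state lives in a flag; the mutex only guards the flag
 * and is never held between calls. */
struct moh_gate {
    pthread_mutex_t lock;
    pthread_cond_t resumed;
    bool paused;
};

void moh_gate_init(struct moh_gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->resumed, NULL);
    g->paused = false;
}

void moh_gate_destroy(struct moh_gate *g)
{
    pthread_cond_destroy(&g->resumed);
    pthread_mutex_destroy(&g->lock);
}

/* Call thread: stop MoH while a call is bridged. Lock and unlock happen here. */
int moh_pause(struct moh_gate *g)
{
    int rc = pthread_mutex_lock(&g->lock);
    if (rc != 0)
        return rc;
    g->paused = true;
    return pthread_mutex_unlock(&g->lock);
}

/* Any thread (e.g. the hangup path): restart MoH and wake the MoH thread. */
int moh_resume(struct moh_gate *g)
{
    int rc = pthread_mutex_lock(&g->lock);
    if (rc != 0)
        return rc;
    g->paused = false;
    pthread_cond_broadcast(&g->resumed);
    return pthread_mutex_unlock(&g->lock);
}

/* MoH thread: block while paused instead of waiting on a lock that some
 * other thread is expected to release. */
static void *moh_thread(void *arg)
{
    struct moh_gate *g = arg;
    pthread_mutex_lock(&g->lock);
    while (g->paused)                /* loop also covers spurious wakeups */
        pthread_cond_wait(&g->resumed, &g->lock);
    pthread_mutex_unlock(&g->lock);
    return NULL;                     /* a real thread would now play MoH */
}

struct resume_job { struct moh_gate *gate; int rc; };

static void *hangup_thread(void *arg)
{
    struct resume_job *job = arg;
    job->rc = moh_resume(job->gate);
    return NULL;
}

/* Pauses in this thread, resumes from another one, and waits for the MoH
 * thread to wake up. Returns 0 on success. */
int moh_demo(void)
{
    struct moh_gate g;
    struct resume_job job = { &g, -1 };
    pthread_t moh, hangup;
    int rc;

    moh_gate_init(&g);
    rc = moh_pause(&g);
    assert(rc == 0);
    rc = pthread_create(&moh, NULL, moh_thread, &g);
    assert(rc == 0);
    rc = pthread_create(&hangup, NULL, hangup_thread, &job);
    assert(rc == 0);
    pthread_join(hangup, NULL);
    pthread_join(moh, NULL);         /* returns only after the resume */
    moh_gate_destroy(&g);
    return job.rc;
}
```

The design point is that no lock is held across the boundary between two callbacks; only the flag carries state from one thread to the next.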

FYI, you can take a look at our alternate Originate implementation. The
problem is that it was written for Asterisk 1.2. It also does not support all
the options that Originate does, and it abuses Asterisk 1.2 internals :/

http://www.proformatique.com/asterisk/ami_aoriginate.c 

Issue History 
Date Modified   Username       Field                    Change               
====================================================================== 
07-11-08 09:32  gknispel_proformatique Note Added: 0090080                       
  
======================================================================



