[asterisk-dev] Possible Deadlock?

Mark Michelson mmichelson at digium.com
Fri Aug 22 11:31:17 CDT 2008


Norman Franke wrote:
> I had a deadlock today using 1.4.21.1. I had asterisk dump core to  
> explore in GDB. Thread 32 is waiting for the "channels" lock held by  
> 21 while holding a channel lock. Thread 21 is then waiting for a lock  
> on a channel held by 32 while holding a lock on "channels".
> 
> However, I can't figure out how thread 32 got the lock in the first  
> place. Perhaps from this in the log?
> 
> [Aug 19 10:55:26] VERBOSE[25324] logger.c:     -- Got SIP response 480  
> "Temporarily Unavailable" back from 172.16.22.30
> 
> And it never released the lock? I see in chan_sip.c:12936
> 
>                                  /* XXX Locking issues?? XXX */
> 
> This has only happened once recently.
> 
> Thread 32 (process 25324):
> #0  0xffffe410 in __kernel_vsyscall ()
> #1  0xb7edd56e in __lll_mutex_lock_wait () from /lib/tls/i686/cmov/ 
> libpthread.so.0
> #2  0xb7eda179 in _L_mutex_lock_141 () from /lib/tls/i686/cmov/ 
> libpthread.so.0
> #3  0xb786a918 in ?? ()
> #4  0x0807d514 in ast_cdr_start (cdr=0x815a148) at cdr.c:689
> #5  0x0807fd3f in ast_mutex_lock (pmutex=0x815a148) at /usr/src/ 
> asterisk-1.4.21.1/include/asterisk/lock.h:755
> #6  0x08081795 in ast_channel_alloc (needqueue=0, state=0,  
> cid_num=0xb7358a4d "6106421787", cid_name=0x8150810 "",
>      acctcode=0x8150810 "", exten=0xb7358a05 "8827",  
> context=0xb7358996 "cisco", amaflag=0, name_fmt=0xb78c43d4 "SIP/%s- 
> %08x")
>      at channel.c:858
> #7  0xb787dc90 in sip_new (i=0xb7366560, state=0, title=0x0) at  
> chan_sip.c:4012
> #8  0xb78ae300 in handle_request_invite (p=0xb7366560, req=0xb786b11c,  
> debug=0, seqno=101, sin=0xb786b10c, recount=0xb786b100,
>      e=0xb786b33f "sip:8827 at 172.16.8.14:5060", nounlock=0xb786b104) at  
> chan_sip.c:14052
> #9  0xb78b422b in handle_request (p=0xb7366560, req=0xb786b11c,  
> sin=0xb786b10c, recount=0xb786b100, nounlock=0xb786b104)
>      at chan_sip.c:15413
> #10 0xb78b4cae in sipsock_read (id=0x8285778, fd=11, events=1,  
> ignore=0x0) at chan_sip.c:15566
> #11 0x080ae93d in ast_io_wait (ioc=0x82777e0, howlong=241) at io.c:279
> #12 0xb78b5660 in do_monitor (data=0x0) at chan_sip.c:15779
> #13 0x08101bc9 in dummy_start (data=0x82828b0) at utils.c:895
> #14 0xb7ed8240 in start_thread () from /lib/tls/i686/cmov/ 
> libpthread.so.0
> #15 0xb7df349e in clone () from /lib/tls/i686/cmov/libc.so.6
> 
> Thread 21 (process 31233):
> #0  0xffffe410 in __kernel_vsyscall ()
> #1  0xb7edd56e in __lll_mutex_lock_wait () from /lib/tls/i686/cmov/ 
> libpthread.so.0
> #2  0xb7eda179 in _L_mutex_lock_141 () from /lib/tls/i686/cmov/ 
> libpthread.so.0
> #3  0xb6c29f78 in ?? ()
> #4  0x0807275e in ast_mutex_unlock (pmutex=0x82e4ea8) at /usr/src/ 
> asterisk-1.4.21.1/include/asterisk/lock.h:745
> #5  0x0807fd3f in ast_mutex_lock (pmutex=0x82e4ea8) at /usr/src/ 
> asterisk-1.4.21.1/include/asterisk/lock.h:755
> #6  0x080824a2 in ast_channel_free (chan=0x82e4e30) at channel.c:1198
> #7  0x080831db in ast_hangup (chan=0x82e4e30) at channel.c:1502
> #8  0xb71d3c25 in dial_exec_full (chan=0xb6f2b860, data=0xb6c2d008,  
> peerflags=0xb6c2aed4, continue_exec=0x0) at app_dial.c:1719
> #9  0xb71d3e50 in dial_exec (chan=0xb6f2b860, data=0xb6c2d008) at  
> app_dial.c:1753
> #10 0x080bfeab in pbx_exec (c=0xb6f2b860, app=0x82a78a8,  
> data=0xb6c2d008) at pbx.c:537
> #11 0x080c32f3 in pbx_extension_helper (c=0xb6f2b860, con=0x0,  
> context=0xb6f2b9e0 "smcx", exten=0xb6f2ba30 "3911", priority=11,
>      label=0x0, callerid=0xb6f3e1b8 "7325890000", action=E_SPAWN) at  
> pbx.c:1862
> #12 0x080c43ed in ast_spawn_extension (c=0xb6f2b860,  
> context=0xb6f2b9e0 "smcx", exten=0xb6f2ba30 "3911", priority=11,
>      callerid=0xb6f3e1b8 "7325890000") at pbx.c:2317
> #13 0x080c4909 in __ast_pbx_run (c=0xb6f2b860) at pbx.c:2419
> #14 0x080c56d8 in pbx_thread (data=0xb6f2b860) at pbx.c:2636
> #15 0x08101bc9 in dummy_start (data=0xb6f00ac0) at utils.c:895
> #16 0xb7ed8240 in start_thread () from /lib/tls/i686/cmov/ 
> libpthread.so.0
> #17 0xb7df349e in clone () from /lib/tls/i686/cmov/libc.so.6
> 
> Norman Franke
> Answering Service for Directors, Inc.
> www.myasd.com
> 
> 
> 
> _______________________________________________
> --Bandwidth and Colocation Provided by http://www.api-digital.com--
> 
> AstriCon 2008 - September 22 - 25 Phoenix, Arizona
> Register Now: http://www.astricon.net
> 
> asterisk-dev mailing list
> To UNSUBSCRIBE or update options visit:
>    http://lists.digium.com/mailman/listinfo/asterisk-dev

Thanks for the report. There are a few peculiarities with the backtrace 
provided. For instance, the trace of thread 21 is not possible since 
ast_mutex_lock does not call ast_mutex_unlock.But I think that's just a fluke 
here because the problem seems legitimate to me.

  To answer your question about how and where thread 32 locks a channel, that's 
done inside of sipsock_read in chan_sip, line 15537.

The 480 response being received does not appear to have anything to do with the 
deadlock, but rather an unsafe locking order used in chan_sip. The proper order, 
if a channel and the channel list lock are needed is to lock the list and then 
the channel, not the other way around.

I would suggest opening a bug report for this on bugs.digium.com so that this 
doesn't end up getting lost in the mail, so to speak. I don't suspect that this 
issue is going to be a particularly easy or quick one to fix, but it is important.

Mark Michelson



More information about the asterisk-dev mailing list