[Asterisk-Dev] CVS-HEAD 08/13/2005 SIP DEADLOCKING

Sherwood McGowan madprofzero at yahoo.com
Wed Aug 24 01:31:46 MST 2005


Brian describes the problem almost exactly as we see it occurring on our
servers.
Sometimes it just locks up, partially stopping traffic on SIP, but still
taking *SOME* commands...

Here are backtraces of the processes in question....

Starting with highest CPU usage
Process Num: 32069
(running "info threads" and "thread apply all bt"; if no output, just "bt")
(gdb) bt
#0  0x0011a094 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x00119a28 in __pthread_wait_for_restart_signal ()
   from /lib/i686/libpthread.so.0
#2  0x0011b468 in __pthread_lock () from /lib/i686/libpthread.so.0
#3  0x001182d6 in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4  0x003942c2 in find_peer (peer=0xb4c2a0 "17048372334", sin=0x0, realtime=1)
    at chan_sip.c:1546
#5  0x003945c3 in create_addr (r=0xb745c7a0, opeer=0xb4dd30 "17048372334")
    at chan_sip.c:1655
#6  0x003a635b in sip_send_mwi_to_peer (peer=0xb7422ad8) at chan_sip.c:9720
#7  0x0039e1f7 in do_monitor (data=0x0) at chan_sip.c:9851
#8  0x00117e21 in pthread_start_thread () from /lib/i686/libpthread.so.0
#9  0x004d8b1a in clone () from /lib/i686/libc.so.6

Then moving to second highest CPU usage:
Process Num: 24639
(running "info threads" and "thread apply all bt"; if no output, just "bt")
(gdb) bt
#0  0x004cfb7a in poll () from /lib/i686/libc.so.6
#1  0x0805ecde in ast_waitfor_nandfds (c=0x3582ea0, n=2, fds=0x0, nfds=0,
    exception=0x0, outfd=0x0, ms=0x3582e94) at channel.c:1259
#2  0x080662d5 in ast_generic_bridge (playitagain=0x3582f30, playit=0x3582f34,
    start_time=0x3582f58, c0=0xb74dd688, c1=0x8566b80, config=0x3583600,
    fo=0x3582fe0, rc=0x3582fe4) at channel.c:1334
#3  0x0806378c in ast_channel_bridge (c0=0xb74dd688, c1=0x8566b80,
    config=0x3583600, fo=0x3582fe0, rc=0x3582fe4) at channel.c:3195
#4  0x007d1969 in ast_bridge_call (chan=0xb74dd688, peer=0x8566b80,
    config=0x3583600) at res_features.c:1161
#5  0x0079cac9 in dial_exec_full (chan=0xb74dd688, data=0x3583600,
    peerflags=0x3583958) at app_dial.c:1335
#6  0x0079aa57 in dial_exec (chan=0xfffffffc, data=0xfffffffc)
    at app_dial.c:1375
#7  0x08088d4f in pbx_extension_helper (c=0xb74dd688, con=0xfffffffc,
    context=0xb74dd7d8 "incoming", exten=0xb74dd8cc "_+1NXXNXXXXXX",
    priority=30, label=0x0, callerid=0x83e3f30 "Dial", action=0)
    at pbx.c:547
#8  0x08089954 in __ast_pbx_run (c=0xb74dd688) at pbx.c:2144
#9  0x0808a579 in pbx_thread (data=0xb74dd688) at pbx.c:2431
#10 0x00117e21 in pthread_start_thread () from /lib/i686/libpthread.so.0
#11 0x004d8b1a in clone () from /lib/i686/libc.so.6

Now just connecting to the process ID that Asterisk reports in the CLI
as the main process num:
(running "info threads" and "thread apply all bt"; if no output, just "bt")
Process Num: 15377
#0  0x0011d30b in read () from /lib/i686/libpthread.so.0
#1  0x00000000 in ?? ()

Now just connecting to each process, hoping for more information:
(The process ID is the number just before each block)

32062	No Output
32065
#0  0x0011d508 in accept () from /lib/i686/libpthread.so.0
#1  0x080aabfa in accept_thread (ignore=0x0) at manager.c:1345
#2  0x00117e21 in pthread_start_thread () from /lib/i686/libpthread.so.0
#3  0x004d8b1a in clone () from /lib/i686/libc.so.6

32066
#0  0x0011a094 in __pthread_sigsuspend () from /lib/i686/libpthread.so.0
#1  0x00119a28 in __pthread_wait_for_restart_signal ()
   from /lib/i686/libpthread.so.0
#2  0x0011b468 in __pthread_lock () from /lib/i686/libpthread.so.0
#3  0x001182d6 in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4  0x0039433b in find_peer (peer=0x574d00 "17048372334", sin=0x0, realtime=1)
    at chan_sip.c:1551
#5  0x0038f0de in sip_devicestate (data=0x574e30) at chan_sip.c:10003
#6  0x080c0793 in ast_device_state (device=0xb741965c "SIP/17048372334")
    at devicestate.c:93
#7  0x080c0cd1 in do_changes (data=0x0) at devicestate.c:146
#8  0x00117e21 in pthread_start_thread () from /lib/i686/libpthread.so.0
#9  0x004d8b1a in clone () from /lib/i686/libc.so.6

32067
#0  0x004d1ef1 in select () from /lib/i686/libc.so.6
#1  0x002ff59c in ?? () from /usr/lib/asterisk/modules/chan_modem.so
#2  0x006cce50 in ?? ()
#3  0x00000000 in ?? ()

32068
#0  0x004d1ef1 in select () from /lib/i686/libc.so.6
#1  0x007d8bf4 in ?? () from /usr/lib/asterisk/modules/res_features.so
#2  0x00be6c28 in ?? ()
#3  0x00be6d30 in ?? ()
#4  0x00000000 in ?? ()

32070
#0  0x004cfb7a in poll () from /lib/i686/libc.so.6
#1  0x080544c4 in ast_io_wait (ioc=0x83cd5f8, howlong=-4) at io.c:259
#2  0x00f5246a in do_monitor (data=0x0) at chan_mgcp.c:3445
#3  0x00117e21 in pthread_start_thread () from /lib/i686/libpthread.so.0
#4  0x004d8b1a in clone () from /lib/i686/libc.so.6

32073
#0  0x004cfb7a in poll () from /lib/i686/libc.so.6
#1  0x080544c4 in ast_io_wait (ioc=0x83d4330, howlong=-4) at io.c:259
#2  0x00bfe89a in network_thread (ignore=0x0) at chan_iax2.c:7794
#3  0x00117e21 in pthread_start_thread () from /lib/i686/libpthread.so.0
#4  0x004d8b1a in clone () from /lib/i686/libc.so.6

32074
#0  0x004cfb7a in poll () from /lib/i686/libc.so.6
#1  0x080544c4 in ast_io_wait (ioc=0x83dff08, howlong=-4) at io.c:259
#2  0x0031d3ff in do_monitor (data=0x0) at chan_skinny.c:3022
#3  0x00117e21 in pthread_start_thread () from /lib/i686/libpthread.so.0
#4  0x004d8b1a in clone () from /lib/i686/libc.so.6
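
For anyone sifting through these, here is a rough Python sketch (not
anything from Asterisk itself) that picks out which traces are parked in
pthread_mutex_lock and which function asked for the lock. The lock_waiters
helper, the FRAME_RE pattern, and the trimmed SAMPLE text are made up for
illustration; it only assumes gdb's usual "#N  0xADDR in func (args)" frame
layout. Going by the traces above, both 32069 (chan_sip.c:1546) and 32066
(chan_sip.c:1551) are waiting on a lock requested from find_peer, which may
or may not turn out to be the actual culprit.

```python
import re

# Matches gdb frame lines such as
#   "#4  0x003942c2 in find_peer (peer=..."
# Continuation lines ("    at chan_sip.c:1546") do not start with '#'
# and are skipped.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def lock_waiters(traces):
    """Return {label: caller} for traces blocked in pthread_mutex_lock.

    `traces` maps a label (here, the process number) to raw `bt` text.
    `caller` is the frame directly above pthread_mutex_lock, or None if
    the trace was cut off before that frame.
    """
    waiters = {}
    for label, text in traces.items():
        funcs = []
        for line in text.splitlines():
            match = FRAME_RE.match(line)
            if match:
                funcs.append(match.group(2))
        if "pthread_mutex_lock" in funcs:
            idx = funcs.index("pthread_mutex_lock")
            waiters[label] = funcs[idx + 1] if idx + 1 < len(funcs) else None
    return waiters


# Trimmed excerpts of the traces in this mail (illustration only).
SAMPLE = {
    "32069": """\
#2  0x0011b468 in __pthread_lock () from /lib/i686/libpthread.so.0
#3  0x001182d6 in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4  0x003942c2 in find_peer (peer=0xb4c2a0 "17048372334", sin=0x0, realtime=1)
    at chan_sip.c:1546
#5  0x003945c3 in create_addr (r=0xb745c7a0, opeer=0xb4dd30 "17048372334")
""",
    "32066": """\
#3  0x001182d6 in pthread_mutex_lock () from /lib/i686/libpthread.so.0
#4  0x0039433b in find_peer (peer=0x574d00 "17048372334", sin=0x0, realtime=1)
    at chan_sip.c:1551
#5  0x0038f0de in sip_devicestate (data=0x574e30) at chan_sip.c:10003
""",
    "32070": """\
#0  0x004cfb7a in poll () from /lib/i686/libc.so.6
#1  0x080544c4 in ast_io_wait (ioc=0x83cd5f8, howlong=-4) at io.c:259
""",
}

if __name__ == "__main__":
    for label, caller in lock_waiters(SAMPLE).items():
        print(f"{label} is waiting on a mutex requested by {caller}")
```

This only says who is waiting, not who is holding the lock, so it narrows
the search rather than answering it.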

I'm stopping the debugging here: each time one of the child processes (not
the high-CPU ones or the one the CLI reports) is debugged, it dies off and
a new one is spawned, so I'd be here forever chasing them.
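
Since attaching by hand seems to disturb the smaller processes, one thing
that might help is letting gdb attach, dump, and detach in a single
non-interactive pass. Below is a minimal Python sketch, assuming a gdb build
that accepts -batch, -ex and -p; the helper names gdb_backtrace_cmd and
dump_backtraces are hypothetical, and I have not verified that this avoids
the respawning described above.

```python
import subprocess
import sys


def gdb_backtrace_cmd(pid, gdb="gdb"):
    """Build a one-shot gdb command line: attach, dump all threads, exit."""
    return [
        gdb, "-batch",
        "-ex", "info threads",
        "-ex", "thread apply all bt",
        "-p", str(pid),
    ]


def dump_backtraces(pid):
    """Run gdb against `pid` and return its combined stdout/stderr.

    Needs permission to ptrace the target (typically root or same user).
    """
    result = subprocess.run(
        gdb_backtrace_cmd(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Usage: python dump_bt.py 32069 32066 ...
    for arg in sys.argv[1:]:
        print(f"==== {arg} ====")
        print(dump_backtraces(int(arg)))
```

On a threading library that shows every thread as its own process, each PID
may still need its own pass, so the PIDs are taken from the command line.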

Thanks for any information, especially since other users seem to be having
this problem. I'm not much of a developer on the backend; I mainly stay on
this list for information about what's going on with development and the
occasional question.

Sherwood McGowan

->-----Original Message-----
->From: asterisk-dev-bounces at lists.digium.com 
->[mailto:asterisk-dev-bounces at lists.digium.com] On Behalf Of 
->Brian Capouch
->Sent: Wednesday, August 24, 2005 3:13 AM
->To: Asterisk Developers Mailing List
->Subject: Re: [Asterisk-Dev] CVS-HEAD 08/13/2005 SIP DEADLOCKING
->
->Olle E. Johansson wrote:
->> Sherwood McGowan wrote:
->> 
->>>I can supply debug output if needed, but mainly need to know if 
->>>there's been updates since 8/13 that are fixing the Sip Deadlock problem.
->>> 
->> 
->> Which SIP deadlock problem? You have to tell us more.
->> 
->
->I don't know for sure that it's a *SIP* deadlock problem, but 
->there's definitely a deadlock problem in the last two CVS 
->versions I built, both fetched as full sourcetree downloads 
->in the last week.  It dies a random, silent death, and all 
->operations seem to stop cold once it occurs.
->
->My system is in production, so I had no choice but to revert.  
->I'm going to try to get a test machine up if the problem 
->continues, and I'll see what I can do to tickle the system to 
->try to find out what's making it stall.
->
->The only thing I can say so far is that a random amount of 
->time after startup it just ceases operation, and any commands 
->entered at the CLI then have no effect.  "stop now" doesn't 
->work, either, and the only way to get going again is to exit 
->from the CLI (which *does* work) and then use "kill" to stop it.
->
->I've seen 2-3 reports on the users list that smell of the 
->same syndrome.
->
->B.
