[asterisk-users] chan_sip deadlocks after some time

Henning Holtschneider henning at loca.net
Tue Jan 22 12:37:09 CST 2008


Hello everybody,

I'm running Asterisk 1.2.24 on three servers which are configured
almost identically. The servers use IAX to communicate with each other
and SIP to communicate with the outside world through a Patton
Smartnode 4960 gateway. One server has about 30 SIP phones registered,
the other two servers have about 100 phones registered each.

The "small" server runs fine without any problem whatsoever. On the two
larger servers, however, chan_sip stops processing calls and CLI
commands after some time. "Some time" can mean two hours on one day and
four hours on another. On some days, everything works flawlessly ...

Whenever chan_sip stops responding, I fire up gdb and I see something
like this:

(gdb) info thread
  25 Thread -1211487312 (LWP 12519)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  24 Thread -1211749456 (LWP 12520)  0xb7fb58ae in accept () from /lib/tls/libpthread.so.0
  23 Thread -1212011600 (LWP 12521)  0xb7fb3295 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
  22 Thread -1215665232 (LWP 12522)  0xb7ea4a27 in select () from /lib/tls/libc.so.6
  21 Thread -1217684560 (LWP 12523)  0xb7e7c99c in nanosleep () from /lib/tls/libc.so.6
  20 Thread -1218057296 (LWP 12524)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  19 Thread -1218319440 (LWP 12525)  0xb7e7c99c in nanosleep () from /lib/tls/libc.so.6
  18 Thread -1218937936 (LWP 12526)  0xb7fb5436 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
  17 Thread -1220633680 (LWP 12527)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  16 Thread -1221022800 (LWP 12528)  0xb7ea4a27 in select () from /lib/tls/libc.so.6
  15 Thread -1221940304 (LWP 12529)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  14 Thread -1227236432 (LWP 12540)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  13 Thread -1250468944 (LWP 23446)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  12 Thread -1250731088 (LWP 23712)  0xb7e7c99c in nanosleep () from /lib/tls/libc.so.6
  11 Thread -1246721104 (LWP 23717)  0xb7e7c99c in nanosleep () from /lib/tls/libc.so.6
  10 Thread -1245934672 (LWP 23734)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  9 Thread -1247245392 (LWP 23741)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  8 Thread -1246983248 (LWP 23772)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  7 Thread -1249682512 (LWP 23788)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  6 Thread -1246458960 (LWP 23817)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  5 Thread -1247507536 (LWP 23824)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  4 Thread -1245672528 (LWP 23827)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  3 Thread -1246196816 (LWP 23838)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  2 Thread -1250206800 (LWP 23848)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
  1 Thread -1211293568 (LWP 12517)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
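
For anyone who wants to pick the blocked thread out of a listing like this without reading it by eye, here is a minimal sketch in Python. It assumes the "info thread" output has been saved as text in exactly the row format shown above; the names THREAD_ROW and find_lock_waiters, and the "lock_wait" substring used as the filter, are made up for this illustration and are not part of gdb or Asterisk.

```python
import re

# One row of gdb's "info thread" output in the format shown above, e.g.
#   18 Thread -1218937936 (LWP 12526)  0xb7fb5436 in __lll_mutex_lock_wait () from ...
# The optional "*" covers the marker gdb puts in front of the current thread.
THREAD_ROW = re.compile(
    r"^\s*\*?\s*(?P<num>\d+)\s+Thread\s+\S+\s+\(LWP\s+(?P<lwp>\d+)\)\s+"
    r"0x[0-9a-fA-F]+\s+in\s+(?P<func>\S+)\s+\("
)


def find_lock_waiters(listing, needle="lock_wait"):
    """Return (gdb thread number, LWP) for threads whose top frame contains `needle`.

    `needle` is a plain substring; "lock_wait" is an assumption that happens
    to match __lll_mutex_lock_wait in the listing above.
    """
    waiters = []
    for line in listing.splitlines():
        match = THREAD_ROW.match(line)
        if match and needle in match.group("func"):
            waiters.append((int(match.group("num")), int(match.group("lwp"))))
    return waiters


# Three rows copied from the listing above.
sample = """\
  19 Thread -1218319440 (LWP 12525)  0xb7e7c99c in nanosleep () from /lib/tls/libc.so.6
  18 Thread -1218937936 (LWP 12526)  0xb7fb5436 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
  17 Thread -1220633680 (LWP 12527)  0xb7ea2523 in poll () from /lib/tls/libc.so.6
"""

print(find_lock_waiters(sample))  # [(18, 12526)]
```

A thread that shows up here is only a candidate: it is waiting for a mutex, but the listing alone does not say which thread holds that mutex.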

I think the interesting thread is #18, so here is its part of the "thread apply all bt" output:

Thread 18 (Thread -1218937936 (LWP 12526)):
#0  0xb7fb5436 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#1  0xb7fb289f in _L_mutex_lock_73 () from /lib/tls/libpthread.so.0
#2  0x00000000 in ?? ()
#3  0xffffffff in ?? ()
#4  0xb7e4ce84 in strncasecmp () from /lib/tls/libc.so.6
#5  0x0806162d in ast_deactivate_generator (chan=0x0) at lock.h:601
#6  0xb7bdf259 in local_ast_moh_stop (chan=0x8177838) at res_musiconhold.c:939
#7  0x080676fc in ast_moh_stop (chan=0xfffffffc) at channel.c:3935
#8  0xb7595499 in process_sdp (p=0xb5963000, req=0xb75867b0) at chan_sip.c:3805
#9  0xb75af211 in handle_request_invite (p=0xb5963000, req=0xb75867b0, debug=0, ignore=0, seqno=4, sin=0xfffffffc, recount=0xfffffffc, 
    e=0xfffffffc <Address 0xfffffffc out of bounds>) at chan_sip.c:10671
#10 0xb75b0df1 in handle_request (p=0xb5963000, req=0xb75867b0, sin=0xb75867a0, recount=0xfffffffc, nounlock=0xfffffffc) at chan_sip.c:11457
#11 0xb75b1806 in sipsock_read (id=0x818c4d0, fd=13, events=1, ignore=0x0) at chan_sip.c:11603
#12 0x08055f87 in ast_io_wait (ioc=0x818c7f8, howlong=-4) at io.c:284
#13 0xb75b1f20 in do_monitor (data=0x0) at chan_sip.c:11774
#14 0xb7fb0b63 in start_thread () from /lib/tls/libpthread.so.0
#15 0xb7eab18a in clone () from /lib/tls/libc.so.6

The unknown symbols in frames #2 and #3 come from app_queue, which is
stripped on the machines.
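
Frames that gdb could not resolve can also be listed mechanically. The following is a rough sketch, assuming the backtrace has been saved as text in the format shown above; FRAME and unresolved_frames are hypothetical names chosen for this example.

```python
import re

# A backtrace frame line such as
#   #2  0x00000000 in ?? ()
# The address part is optional because gdb omits it for some frames.
FRAME = re.compile(
    r"^#(?P<idx>\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(?P<func>\S+)\s+\("
)


def unresolved_frames(backtrace):
    """Return the frame numbers that gdb printed as "??" (no symbol found)."""
    missing = []
    for line in backtrace.splitlines():
        match = FRAME.match(line)
        if match and match.group("func") == "??":
            missing.append(int(match.group("idx")))
    return missing


# The first frames of thread 18, copied from the backtrace above.
sample_bt = """\
#0  0xb7fb5436 in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#1  0xb7fb289f in _L_mutex_lock_73 () from /lib/tls/libpthread.so.0
#2  0x00000000 in ?? ()
#3  0xffffffff in ?? ()
#4  0xb7e4ce84 in strncasecmp () from /lib/tls/libc.so.6
"""

print(unresolved_frames(sample_bt))  # [2, 3]
```

The script only reports which frames lack symbols; it cannot tell which module they belong to.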

I've tried every possible configuration change I can think of (ranging
from turning on/off canreinvite in sip.conf to removing all SIP Hints
in extensions.conf) without any visible success :-(

I would appreciate it if someone with in-depth knowledge of chan_sip
could have a look at the issue. I can provide a full backtrace (except
information from app_queue.so, see above) if necessary. I would like to
file a bug in Digium's bugtracker, but I think it will be rejected
because I'm using Asterisk 1.2 (I cannot upgrade due to dialplan
incompatibilities). Since this is a commercial project, I'm also
willing to pay for support if there is a prospect of success.

Thanks for your help,
Henning Holtschneider
--
LocaNet oHG - http://www.loca.net
Lindemannstrasse 81, D-44137 Dortmund
tel +49 231 91596-25, fax +49 231 91596-55
sip 25 at voip.loca.net

Registergericht Amtsgericht Dortmund HRA 14208
Geschäftsführer Sven Haufe, Henning Holtschneider