[asterisk-dev] How does a call to ast_read end up in ast_write

Stephen Davies stephen.l.davies at gmail.com
Mon Dec 24 00:35:47 CST 2007


Hi,
I'm trying to track some sort of deadlock in 1.2 involving Local channels
and calls being placed on/off hold.
I think the problem bites when both ends of a call have placed the other on
hold at the same time.

(I've been working in 1.2.24, but the code isn't changed in the most recent.
 I'll investigate 1.4/trunk once I've fixed the clients' immediate problem)

I have three threads involved in the deadlock:


The first is the original caller bridging to the Local/...,1 channel:

Thread 24 (process 10764):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x0089eb86 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2  0x008d530a in usleep () from /lib/tls/libc.so.6
#3  0xb7b940bc in local_queue_frame (p=0xb65006b8, isoutbound=Variable
"isoutbound" is not available.
) at chan_local.c:152
#4  0xb7b944e6 in local_write (ast=0xb6508300, f=0xb710a908) at
chan_local.c:261
#5  0x080655ab in ast_write (chan=0xb6508300, fr=0xb7105434) at channel.c
:2358
#6  0xb7e2eda2 in moh_files_generator (chan=0xb6508300, data=0xb7101ac0,
len=160, samples=160) at res_musiconhold.c:271
#7  0x0806639f in ast_read (chan=0xb6508300) at channel.c:2050
#8  0x08069778 in ast_channel_bridge (c0=0xb6513e90, c1=0xb6508300,
config=0xb70f8850, fo=0xb70f7fd0, rc=0xb70f7fd4) at channel.c:3399
#9  0xb7e02d45 in ast_bridge_call (chan=0xb6513e90, peer=0xb6508300,
config=0xb70f8850) at res_features.c:1341
#10 0xb73f8c2b in try_calling (qe=0xb70f8b50, options=Variable "options" is
not available.
) at app_queue.c:2428
#11 0xb73fd28e in queue_exec (chan=0xb6513e90, data=0xb70fd0a0) at
app_queue.c:3163
#12 0x0809058d in pbx_extension_helper (c=0xb6513e90, con=Variable "con" is
not available.
) at pbx.c:574
#13 0x08091866 in __ast_pbx_run (c=0xb6513e90) at pbx.c:2250
#14 0x0809340c in pbx_thread (data=0xfffffdfc) at pbx.c:2537
#15 0x009db371 in start_thread () from /lib/tls/libpthread.so.0
#16 0x008dbffe in clone () from /lib/tls/libc.so.6


The second is the bridge from the Local/.,2 channel to the end point:

Thread 23 (process 11544):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x0089eb86 in __nanosleep_nocancel () from /lib/tls/libc.so.6
#2  0x008d530a in usleep () from /lib/tls/libc.so.6
#3  0xb7b941fc in local_queue_frame (p=0xb65006b8, isoutbound=Variable
"isoutbound" is not available.
) at chan_local.c:152
#4  0xb7b944e6 in local_write (ast=0xb65078f8, f=0x823e018) at
chan_local.c:261
#5  0x080655ab in ast_write (chan=0xb65078f8, fr=0x82b9cdc) at channel.c
:2358
#6  0xb7e2eda2 in moh_files_generator (chan=0xb65078f8, data=0xb65087d8,
len=160, samples=160) at res_musiconhold.c:271
#7  0x0806639f in ast_read (chan=0xb65078f8) at channel.c:2050
#8  0x08069778 in ast_channel_bridge (c0=0xb65078f8, c1=0x8256d60,
config=0xb6fc7d50, fo=0xb6fc7200, rc=0xb6fc7204) at channel.c:3399
#9  0xb7e02d45 in ast_bridge_call (chan=0xb65078f8, peer=0x8256d60,
config=0xb6fc7d50) at res_features.c:1341
#10 0xb74a30ba in dial_exec_full (chan=0xb65078f8, data=Variable "data" is
not available.
) at app_dial.c:1633
#11 0xb74a5a33 in dial_exec (chan=0xfffffdfc, data=0xb7b96cfc) at
app_dial.c:1680
#12 0x0809058d in pbx_extension_helper (c=0xb65078f8, con=Variable "con" is
not available.
) at pbx.c:574
#13 0x08090b4c in ast_spawn_extension (c=0xfffffdfc, context=0xb7b96cfc "",
exten=0xb7b96cfc "", priority=-1212584708,
   callerid=0xb7b96cfc "") at pbx.c:2250
#14 0xb7278ed0 in macro_exec (chan=0xb65078f8, data=0xb6fd3b30) at
app_macro.c:297
#15 0x0809058d in pbx_extension_helper (c=0xb65078f8, con=Variable "con" is
not available.
) at pbx.c:574
#16 0x08090b4c in ast_spawn_extension (c=0xfffffdfc, context=0xb7b96cfc "",
exten=0xb7b96cfc "", priority=-1212584708,
   callerid=0xb7b96cfc "") at pbx.c:2250
#17 0xb7278ed0 in macro_exec (chan=0xb65078f8, data=0xb6fdb0a0) at
app_macro.c:297
#18 0x0809058d in pbx_extension_helper (c=0xb65078f8, con=Variable "con" is
not available.
) at pbx.c:574
#19 0x08091866 in __ast_pbx_run (c=0xb65078f8) at pbx.c:2250
#20 0x0809340c in pbx_thread (data=0xfffffdfc) at pbx.c:2537
#21 0x009db371 in start_thread () from /lib/tls/libpthread.so.0
#22 0x008dbffe in clone () from /lib/tls/libc.so.6

These two threads are in a sort of soft-deadlock: they are running some
deadlock-avoiding logic in local_queue_frame, but its not working
effectively due to the fact that the channel that each is holding was locked
twice before local_queue_frame was called: once in ast_read and once in
ast_write.  So the deadlock-avoiding logic unlocks it once, but the
recursive lock still has one lock active.

Lastly, the SIP signalling thread is trying to get the one call off hold,
but is now stuck because it is waiting also to get a lock on one of the
Local channels.  Given that the SIP signalling thread is stuck, all SIP is
dead:

Thread 27 (process 8458):
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x009e01de in __lll_mutex_lock_wait () from /lib/tls/libpthread.so.0
#2  0x009dce47 in _L_mutex_lock_179 () from /lib/tls/libpthread.so.0
#3  0xb781d24a in ?? ()
#4  0xb6508300 in ?? ()
#5  0xb65083cc in ?? ()
#6  0xb7815f90 in ?? ()
#7  0xb78164f8 in ?? ()
#8  0x08062cbd in ast_deactivate_generator (chan=0xb6508300) at
include/asterisk/lock.h:601
#9  0xb7e31029 in local_ast_moh_stop (chan=0xb6508300) at
res_musiconhold.c:939
#10 0xb7829372 in process_sdp (p=0x0, req=0x0) at chan_sip.c:3801


To find the root cause, I'm trying to understand why the call to ast_read
ended up in ast_write.  I've not had to delve into channel.c gory details
before, but ast_read is to return a frame - not sure why the moh_generator
doesn't just return a frame for ast_read to return, and why instead ast_read
calls moh_generator which then calls ast_write!?

Thanks for your pointers!

Steve
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.digium.com/pipermail/asterisk-dev/attachments/20071224/df8afe0b/attachment.htm 


More information about the asterisk-dev mailing list