[asterisk-bugs] [JIRA] (ASTERISK-23013) [patch] Deadlock between 'sip show channels' command and attended transfer handling
Matt Jordan (JIRA)
noreply at issues.asterisk.org
Tue Dec 17 09:43:04 CST 2013
[ https://issues.asterisk.org/jira/browse/ASTERISK-23013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=213047#comment-213047 ]
Matt Jordan commented on ASTERISK-23013:
----------------------------------------
The analysis should be fine Ben - thanks. The fact that the order is swapped between threads is a clear indication that someone isn't following the correct locking order.
> [patch] Deadlock between 'sip show channels' command and attended transfer handling
> -----------------------------------------------------------------------------------
>
> Key: ASTERISK-23013
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-23013
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Channels/chan_sip/General
> Affects Versions: 11.6.0
> Reporter: Ben Smithurst
> Attachments: deadlock-fix.diff, sip_show_channels.pl
>
>
> We have been seeing an occasional freeze of Asterisk which we have tracked down to an apparent deadlock between code handling 'sip show channels' and code handling attended transfers.
> We've been seeing this maybe once every couple of days on servers handling ~200-300 concurrent calls at peak time. 'sip show channels' is being executed every 5 minutes by a munin plugin.
> I'll dig out the full trace in a second - if I still have a copy - but some important details are below.
> {code}
> (gdb) t 6
> [Switching to thread 6 (Thread 0x7fc1fa7ee700 (LWP 4006))]
> #0 0x00007fc205f9bcec in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
> (gdb) bt
> #0 0x00007fc205f9bcec in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00007fc205f97354 in _L_lock_997 () from /lib/x86_64-linux-gnu/libpthread.so.0
> #2 0x00007fc205f971b7 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
> #3 0x00000000005012a1 in __ast_pthread_mutex_lock (filename=0x5b0c14 "astobj2.c", lineno=1087, func=0x5b0f50 "internal_ao2_callback", mutex_name=0x5b0dfe "c", t=0x7fc2005aac20) at lock.c:273
> #4 0x000000000044f5a8 in __ao2_lock (user_data=0x7fc2005aac78, lock_how=AO2_LOCK_REQ_RDLOCK, file=0x5b0c14 "astobj2.c", func=0x5b0f50 "internal_ao2_callback", line=1087, var=0x5b0dfe "c") at astobj2.c:192
> #5 0x0000000000450d3e in internal_ao2_callback (c=0x7fc2005aac78, flags=OBJ_POINTER, cb_fn=0x7fc1fd1ca63d, arg=0x7fc1fa7e55e0, data=0x0, type=DEFAULT, tag=0x0, file=0x0, line=0, func=0x0) at astobj2.c:1087
> #6 0x0000000000451281 in __ao2_callback (c=0x7fc2005aac78, flags=OBJ_POINTER, cb_fn=0x7fc1fd1ca63d <dialog_cmp_cb>, arg=0x7fc1fa7e55e0) at astobj2.c:1206
> #7 0x00000000004513da in __ao2_find (c=0x7fc2005aac78, arg=0x7fc1fa7e55e0, flags=OBJ_POINTER) at astobj2.c:1237
> #8 0x00007fc1fd178b05 in get_sip_pvt_byid_locked (callid=0x7fc1d019972a "5718d144-f8d09c30 at 192.168.0.4", totag=0x7fc1d019974a "as012f4e00", fromtag=0x7fc1d0199758 "329b6e344ad2400o0") at chan_sip.c:17616
> #9 0x00007fc1fd1a256d in local_attended_transfer (transferer=0x7fc1cc593fb8, current=0x7fc1fa7ec490, req=0x7fc1fa7ed330, seqno=104, nounlock=0x7fc1fa7ed248) at chan_sip.c:25924
> #10 0x00007fc1fd1a446c in handle_request_refer (p=0x7fc1cc593fb8, req=0x7fc1fa7ed330, seqno=104, nounlock=0x7fc1fa7ed248) at chan_sip.c:26393
> #11 0x00007fc1fd1ad03a in handle_incoming (p=0x7fc1cc593fb8, req=0x7fc1fa7ed330, addr=0x7fc1fa7ed2a0, recount=0x7fc1fa7ed24c, nounlock=0x7fc1fa7ed248) at chan_sip.c:28298
> #12 0x00007fc1fd1ada7b in handle_request_do (req=0x7fc1fa7ed330, addr=0x7fc1fa7ed2a0) at chan_sip.c:28484
> #13 0x00007fc1fd1ad678 in sipsock_read (id=0x1e429a0, fd=14, events=1, ignore=0x0) at chan_sip.c:28415
> #14 0x00000000004fba8e in ast_io_wait (ioc=0x1e304f0, howlong=145) at io.c:292
> #15 0x00007fc1fd1af4d4 in do_monitor (data=0x0) at chan_sip.c:29013
> #16 0x000000000058de4c in dummy_start (data=0x7fc2005a4f90) at utils.c:1162
> #17 0x00007fc205f94b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> #18 0x00007fc206d76a7d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #19 0x0000000000000000 in ?? ()
> {code}
> Thread 6 above is handling an attended transfer, the key pointer is transferer=0x7fc1cc593fb8 on frame 9, this is locked in handle_request_refer (frame 10).
> {code}
> /* both p and p->owner _MUST_ be locked while calling local_attended_transfer */
> if ((res = local_attended_transfer(p, ¤t, req, seqno, nounlock))) {
> goto handle_refer_cleanup; /* We're done with the transfer */
> }
> {code}
> Frame 9 then calls get_sip_pvt_byid_locked, which does an ao2_t_find call over the dialogs list. Subsequently frame 5 tries to lock the dialogs list (in AO2 terms, it is the ‘c’ parameter, the container).
> Next...
> {code}
> (gdb) t 22
> [Switching to thread 22 (Thread 0x7fc1e1842700 (LWP 45850))]
> #0 0x00007fc205f9bcec in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
> (gdb) bt
> #0 0x00007fc205f9bcec in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00007fc205f97354 in _L_lock_997 () from /lib/x86_64-linux-gnu/libpthread.so.0
> #2 0x00007fc205f971b7 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
> #3 0x00000000005012a1 in __ast_pthread_mutex_lock (filename=0x7fc1fd1da774 "chan_sip.c", lineno=20871, func=0x7fc1fd1f4280 "show_channels_cb", mutex_name=0x7fc1fd1e723f "cur", t=0x7fc1cc593f60) at lock.c:273
> #4 0x000000000044f5a8 in __ao2_lock (user_data=0x7fc1cc593fb8, lock_how=AO2_LOCK_REQ_MUTEX, file=0x7fc1fd1da774 "chan_sip.c", func=0x7fc1fd1f4280 "show_channels_cb", line=20871, var=0x7fc1fd1e723f "cur") at astobj2.c:192
> #5 0x00007fc1fd18b778 in show_channels_cb (__cur=0x7fc1cc593fb8, __arg=0x7fc1e1841430, flags=2) at chan_sip.c:20871
> #6 0x0000000000450e23 in internal_ao2_callback (c=0x7fc2005aac78, flags=OBJ_NODATA, cb_fn=0x7fc1fd18b70e, arg=0x7fc1e1841430, data=0x0, type=DEFAULT, tag=0x0, file=0x0, line=0, func=0x0) at astobj2.c:1101
> #7 0x0000000000451281 in __ao2_callback (c=0x7fc2005aac78, flags=OBJ_NODATA, cb_fn=0x7fc1fd18b70e <show_channels_cb>, arg=0x7fc1e1841430) at astobj2.c:1206
> #8 0x00007fc1fd18be28 in sip_show_channels (e=0x7fc1fd403d60, cmd=-4, a=0x7fc1e1841470) at chan_sip.c:20942
> #9 0x00000000004a1912 in ast_cli_command_full (uid=0, gid=109, fd=215, s=0x7fc1e1841750 "sip show channels") at cli.c:2588
> #10 0x00000000004a1a60 in ast_cli_command_multiple_full (uid=0, gid=109, fd=215, size=18, s=0x7fc1e1841ba0 "sip show channels") at cli.c:2611
> #11 0x0000000000446951 in netconsole (vconsole=0x8622c0) at asterisk.c:1341
> #12 0x000000000058de4c in dummy_start (data=0x7fc1e403d2b0) at utils.c:1162
> #13 0x00007fc205f94b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> #14 0x00007fc206d76a7d in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #15 0x0000000000000000 in ?? ()
> {code}
> This is handling the ‘sip show channels’ command (see args in frames 9/10). This iterates over ‘dialogs’ in a similar way, locking the container, and for each entry, show_channels_cb is called (frame 5, note that the __cur pointer is the same as the important transferer pointer from thread 6), subsequently this pointer is locked at frame 4.
> So in summary my assessment is as follows
> Thread 6: attended transfer, locks a sip_pvt, then the dialogs list
> Thread 22: sip show channels, locks the dialogs list, then a sip_pvt
> I've a patch which solves the problem for us, but it's really a workaround at best. For our purposes a bit of missing information in 'sip show channels' output is far preferable to a deadlock though.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.asterisk.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
More information about the asterisk-bugs
mailing list