[asterisk-bugs] [JIRA] (ASTERISK-26203) res_fax: Deadlock when using FAXOPT(gateway)=yes with Local channels
Richard Mudgett (JIRA)
noreply at issues.asterisk.org
Wed Aug 17 17:08:56 CDT 2016
[ https://issues.asterisk.org/jira/browse/ASTERISK-26203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Mudgett reassigned ASTERISK-26203:
------------------------------------------
Assignee: Richard Mudgett
> res_fax: Deadlock when using FAXOPT(gateway)=yes with Local channels
> --------------------------------------------------------------------
>
> Key: ASTERISK-26203
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-26203
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Resources/res_fax
> Affects Versions: 13.7.2, 13.9.1, 13.10.0-rc1
> Environment: Debian 8 amd64
> Reporter: Etienne Lessard
> Assignee: Richard Mudgett
> Attachments: bt-threads.txt, core-show-locks.txt, graph-core-show-locks.svg
>
>
> Asterisk sometimes deadlocks when you set "FAXOPT(gateway)=yes" on a channel and this channel dials a Local channel. Example:
> Given I have two lines, SIP/alice and SIP/bob
> Given I have the following dialplan:
> {noformat}
> [foo]
> exten = 31,1,NoOp()
> same = n,Set(FAXOPT(gateway)=yes,10)
> same = n,Dial(Local/32 at foo)
> same = n,Hangup()
> exten = 32,1,NoOp()
> same = n,Set(FAXOPT(gateway)=yes,10)
> same = n,Dial(SIP/bob)
> same = n,Hangup()
> {noformat}
> When Alice dials 31
> Then a deadlock might occurs
> This does not happens systematically, but depending on the system and the exact setup, I got between 1 occurrences every 500 tries to 1 occurrence at almost every try.
> I'm attaching a backtrace and the output of core show locks. There's actually 3 independent groups of deadlocked threads in these files:
> * thread 24074 is deadlocked with thread 24078
> * thread 28226 is deadlocked with thread 28230
> * thread 32143 is deadlocked with thread 32149
> From the analysis I've done, the problem is that there's a locking inversion issue occuring in situation like this one:
> {noformat}
> Thread 63 (Thread 0x7fd2cbcf8700 (LWP 24074)):
> #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
> #1 0x00007fd2ce2404d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
> #2 0x00007fd2ce240336 in __GI___pthread_mutex_lock (mutex=0x7fd2b450cc80) at ../nptl/pthread_mutex_lock.c:114
> #3 0x000000000053f3aa in __ast_pthread_mutex_lock (filename=0x6229cb "bridge.c", lineno=4005, func=0x624e10 <__PRETTY_FUNCTION__.18063> "ast_bridge_peer", mutex_name=0x622b1b "bridge", t=0x7fd2b450cc80) at lock.c:315
> #4 0x000000000045e0ce in __ao2_lock (user_data=0x7fd2b450ccd8, lock_how=AO2_LOCK_REQ_MUTEX, file=0x6229cb "bridge.c", func=0x624e10 <__PRETTY_FUNCTION__.18063> "ast_bridge_peer", line=4005, var=0x622b1b "bridge") at astobj2.c:167
> #5 0x000000000046937d in _ast_bridge_lock (bridge=0x7fd2b450ccd8, file=0x6229cb "bridge.c", function=0x624e10 <__PRETTY_FUNCTION__.18063> "ast_bridge_peer", line=4005, var=0x622b1b "bridge") at /root/ast-rebuild/asterisk-13.10.0-rc3/include/asterisk/bridge.h:377
> #6 0x00000000004739ed in ast_bridge_peer (bridge=0x7fd2b450ccd8, chan=0x7fd2b44df1b8) at bridge.c:4005
> #7 0x00000000004ccbbf in ast_channel_bridge_peer (chan=0x7fd2b44df1b8) at channel.c:10419
> #8 0x00000000004f23e2 in ast_unreal_queryoption (ast=0x7fd2b4543bb8, option=10, data=0x7fd2cbcf4090, datalen=0x7fd2cbcf4094) at core_unreal.c:194
> #9 0x00000000004c4395 in ast_channel_queryoption (chan=0x7fd2b4543bb8, option=10, data=0x7fd2cbcf4090, datalen=0x7fd2cbcf4094, block=0) at channel.c:7131
> #10 0x00007fd2c9150e2b in ast_channel_get_t38_state (chan=0x7fd2b4543bb8) at /root/ast-rebuild/asterisk-13.10.0-rc3/include/asterisk/channel.h:2578
> #11 0x00007fd2c9166f41 in fax_gateway_framehook (chan=0x7fd2b4543bb8, f=0x1444060, event=AST_FRAMEHOOK_EVENT_WRITE, data=0x7fd2b4187a48) at res_fax.c:3329
> #12 0x0000000000528554 in framehook_list_push_event (framehooks=0x7fd2b4719f30, frame=0x1444060, event=AST_FRAMEHOOK_EVENT_WRITE) at framehook.c:118
> #13 0x0000000000528e20 in ast_framehook_list_write_event (framehooks=0x7fd2b4719f30, frame=0x1444060) at framehook.c:317
> #14 0x00000000004bba96 in ast_indicate_data (chan=0x7fd2b4543bb8, _condition=26, data=0x0, datalen=0) at channel.c:4685
> #15 0x00000000004baedd in ast_indicate (chan=0x7fd2b4543bb8, condition=26) at channel.c:4288
> #16 0x0000000000487f3a in bridge_channel_internal_join (bridge_channel=0x7fd2b415ec48) at bridge_channel.c:2713
> #17 0x000000000046e018 in ast_bridge_join (bridge=0x7fd2b418bba8, chan=0x7fd2b4543bb8, swap=0x0, features=0x7fd2cbcf4610, tech_args=0x0, flags=(AST_BRIDGE_JOIN_PASS_REFERENCE | AST_BRIDGE_JOIN_INHIBIT_JOIN_COLP)) at bridge.c:1700
> #18 0x0000000000511223 in ast_bridge_call_with_flags (chan=0x7fd2b4543bb8, peer=0x7fd2b44c5c78, config=0x7fd2cbcf4b50, flags=0) at features.c:659
> #19 0x0000000000511309 in ast_bridge_call (chan=0x7fd2b4543bb8, peer=0x7fd2b44c5c78, config=0x7fd2cbcf4b50) at features.c:698
> #20 0x00007fd29ce03d14 in dial_exec_full (chan=0x7fd2b4543bb8, data=0x7fd2cbcf5af0 "SIP/machine02", peerflags=0x7fd2cbcf5580, continue_exec=0x0) at app_dial.c:3089
> #21 0x00007fd29ce040d7 in dial_exec (chan=0x7fd2b4543bb8, data=0x7fd2cbcf5af0 "SIP/machine02") at app_dial.c:3141
> #22 0x000000000058f144 in pbx_exec (c=0x7fd2b4543bb8, app=0x11d2680, data=0x7fd2cbcf5af0 "SIP/machine02") at pbx_app.c:485
> #23 0x000000000057bb8c in pbx_extension_helper (c=0x7fd2b4543bb8, con=0x0, context=0x7fd2b4544570 "xivo-extrafeatures", exten=0x7fd2b45445c0 "32", priority=3, label=0x0, callerid=0x7fd2b42a2280 "1101", action=E_SPAWN, found=0x7fd2cbcf7bcc, combined_find_spawn=1) at pbx.c:2834
> #24 0x000000000057f2fb in ast_spawn_extension (c=0x7fd2b4543bb8, context=0x7fd2b4544570 "xivo-extrafeatures", exten=0x7fd2b45445c0 "32", priority=3, callerid=0x7fd2b42a2280 "1101", found=0x7fd2cbcf7bcc, combined_find_spawn=1) at pbx.c:4060
> #25 0x000000000057fff7 in __ast_pbx_run (c=0x7fd2b4543bb8, args=0x0) at pbx.c:4235
> #26 0x0000000000581824 in pbx_thread (data=0x7fd2b4543bb8) at pbx.c:4555
> #27 0x000000000060c2dd in dummy_start (data=0x7fd2b45e58e0) at utils.c:1235
> #28 0x00007fd2ce23e0a4 in start_thread (arg=0x7fd2cbcf8700) at pthread_create.c:309
> #29 0x00007fd2cd2ed87d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> {noformat}
> i.e.:
> * bridge B1 is locked in bridge_channel_internal_join (#16)
> * channel C1 is locked in ast_indicate_data (#14)
> * channel C1 is unlocked and locked in fax_gateway_framehook (#11)
> * channel C1 is locked (+1) in ast_channel_queryoption (#9)
> * channel C2 is unlocked (-1) in ast_unreal_queryoption (#8) -- but it is in fact still locked at this stage
> * bridge B2 (a different one than the one locked in #16 -- it's actually the bridge that the other local channel is part of) is locked in ast_bridge_peer (#6)
> So you then have a situation where a bridge is locked while a channel lock is held, which from what I understand is not the correct locking order, even if that channel is not part of the bridge.
> I've done my latest tests on asterisk 13.10.0-rc3. I've also seen it occured on 13.9.1 and 13.7.2.
> This bug is somewhat related to ASTERISK-22732.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
More information about the asterisk-bugs
mailing list