[asterisk-bugs] [JIRA] (ASTERISK-26203) Deadlock when using FAXOPT(gateway)=yes with Local channels

Etienne Lessard (JIRA) noreply at issues.asterisk.org
Fri Jul 15 15:56:56 CDT 2016


Etienne Lessard created ASTERISK-26203:
------------------------------------------

             Summary: Deadlock when using FAXOPT(gateway)=yes with Local channels
                 Key: ASTERISK-26203
                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-26203
             Project: Asterisk
          Issue Type: Bug
      Security Level: None
          Components: Resources/res_fax
    Affects Versions: 13.9.1, 13.7.2, 13.10.0-rc1
         Environment: Debian 8 amd64
            Reporter: Etienne Lessard
         Attachments: bt-threads.txt, core-show-locks.txt, graph-core-show-locks.svg

Asterisk sometimes deadlocks when you set "FAXOPT(gateway)=yes" on a channel and this channel dials a Local channel. Example:

Given I have two lines, SIP/alice and SIP/bob
Given I have the following dialplan:
{noformat}
[foo]
exten = 31,1,NoOp()
same  =   n,Set(FAXOPT(gateway)=yes,10)
same  =   n,Dial(Local/32 at foo)
same  =   n,Hangup()

exten = 32,1,NoOp()
same  =   n,Set(FAXOPT(gateway)=yes,10)
same  =   n,Dial(SIP/bob)
same  =   n,Hangup()
{noformat}
When Alice dials 31
Then a deadlock might occurs

This does not happens systematically, but depending on the system and the exact setup, I got between 1 occurrences every 500 tries to 1 occurrence at almost every try.

I'm attaching a backtrace and the output of core show locks. There's actually 3 independent groups of deadlocked threads in these files:
* thread 24074 is deadlocked with thread 24078
* thread 28226 is deadlocked with thread 28230
* thread 32143 is deadlocked with thread 32149

>From the analysis I've done, the problem is that there's a locking inversion issue occuring in situation like this one:
{noformat}
Thread 63 (Thread 0x7fd2cbcf8700 (LWP 24074)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fd2ce2404d4 in _L_lock_952 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007fd2ce240336 in __GI___pthread_mutex_lock (mutex=0x7fd2b450cc80) at ../nptl/pthread_mutex_lock.c:114
#3  0x000000000053f3aa in __ast_pthread_mutex_lock (filename=0x6229cb "bridge.c", lineno=4005, func=0x624e10 <__PRETTY_FUNCTION__.18063> "ast_bridge_peer", mutex_name=0x622b1b "bridge", t=0x7fd2b450cc80) at lock.c:315
#4  0x000000000045e0ce in __ao2_lock (user_data=0x7fd2b450ccd8, lock_how=AO2_LOCK_REQ_MUTEX, file=0x6229cb "bridge.c", func=0x624e10 <__PRETTY_FUNCTION__.18063> "ast_bridge_peer", line=4005, var=0x622b1b "bridge") at astobj2.c:167
#5  0x000000000046937d in _ast_bridge_lock (bridge=0x7fd2b450ccd8, file=0x6229cb "bridge.c", function=0x624e10 <__PRETTY_FUNCTION__.18063> "ast_bridge_peer", line=4005, var=0x622b1b "bridge") at /root/ast-rebuild/asterisk-13.10.0-rc3/include/asterisk/bridge.h:377
#6  0x00000000004739ed in ast_bridge_peer (bridge=0x7fd2b450ccd8, chan=0x7fd2b44df1b8) at bridge.c:4005
#7  0x00000000004ccbbf in ast_channel_bridge_peer (chan=0x7fd2b44df1b8) at channel.c:10419
#8  0x00000000004f23e2 in ast_unreal_queryoption (ast=0x7fd2b4543bb8, option=10, data=0x7fd2cbcf4090, datalen=0x7fd2cbcf4094) at core_unreal.c:194
#9  0x00000000004c4395 in ast_channel_queryoption (chan=0x7fd2b4543bb8, option=10, data=0x7fd2cbcf4090, datalen=0x7fd2cbcf4094, block=0) at channel.c:7131
#10 0x00007fd2c9150e2b in ast_channel_get_t38_state (chan=0x7fd2b4543bb8) at /root/ast-rebuild/asterisk-13.10.0-rc3/include/asterisk/channel.h:2578
#11 0x00007fd2c9166f41 in fax_gateway_framehook (chan=0x7fd2b4543bb8, f=0x1444060, event=AST_FRAMEHOOK_EVENT_WRITE, data=0x7fd2b4187a48) at res_fax.c:3329
#12 0x0000000000528554 in framehook_list_push_event (framehooks=0x7fd2b4719f30, frame=0x1444060, event=AST_FRAMEHOOK_EVENT_WRITE) at framehook.c:118
#13 0x0000000000528e20 in ast_framehook_list_write_event (framehooks=0x7fd2b4719f30, frame=0x1444060) at framehook.c:317
#14 0x00000000004bba96 in ast_indicate_data (chan=0x7fd2b4543bb8, _condition=26, data=0x0, datalen=0) at channel.c:4685
#15 0x00000000004baedd in ast_indicate (chan=0x7fd2b4543bb8, condition=26) at channel.c:4288
#16 0x0000000000487f3a in bridge_channel_internal_join (bridge_channel=0x7fd2b415ec48) at bridge_channel.c:2713
#17 0x000000000046e018 in ast_bridge_join (bridge=0x7fd2b418bba8, chan=0x7fd2b4543bb8, swap=0x0, features=0x7fd2cbcf4610, tech_args=0x0, flags=(AST_BRIDGE_JOIN_PASS_REFERENCE | AST_BRIDGE_JOIN_INHIBIT_JOIN_COLP)) at bridge.c:1700
#18 0x0000000000511223 in ast_bridge_call_with_flags (chan=0x7fd2b4543bb8, peer=0x7fd2b44c5c78, config=0x7fd2cbcf4b50, flags=0) at features.c:659
#19 0x0000000000511309 in ast_bridge_call (chan=0x7fd2b4543bb8, peer=0x7fd2b44c5c78, config=0x7fd2cbcf4b50) at features.c:698
#20 0x00007fd29ce03d14 in dial_exec_full (chan=0x7fd2b4543bb8, data=0x7fd2cbcf5af0 "SIP/machine02", peerflags=0x7fd2cbcf5580, continue_exec=0x0) at app_dial.c:3089
#21 0x00007fd29ce040d7 in dial_exec (chan=0x7fd2b4543bb8, data=0x7fd2cbcf5af0 "SIP/machine02") at app_dial.c:3141
#22 0x000000000058f144 in pbx_exec (c=0x7fd2b4543bb8, app=0x11d2680, data=0x7fd2cbcf5af0 "SIP/machine02") at pbx_app.c:485
#23 0x000000000057bb8c in pbx_extension_helper (c=0x7fd2b4543bb8, con=0x0, context=0x7fd2b4544570 "xivo-extrafeatures", exten=0x7fd2b45445c0 "32", priority=3, label=0x0, callerid=0x7fd2b42a2280 "1101", action=E_SPAWN, found=0x7fd2cbcf7bcc, combined_find_spawn=1) at pbx.c:2834
#24 0x000000000057f2fb in ast_spawn_extension (c=0x7fd2b4543bb8, context=0x7fd2b4544570 "xivo-extrafeatures", exten=0x7fd2b45445c0 "32", priority=3, callerid=0x7fd2b42a2280 "1101", found=0x7fd2cbcf7bcc, combined_find_spawn=1) at pbx.c:4060
#25 0x000000000057fff7 in __ast_pbx_run (c=0x7fd2b4543bb8, args=0x0) at pbx.c:4235
#26 0x0000000000581824 in pbx_thread (data=0x7fd2b4543bb8) at pbx.c:4555
#27 0x000000000060c2dd in dummy_start (data=0x7fd2b45e58e0) at utils.c:1235
#28 0x00007fd2ce23e0a4 in start_thread (arg=0x7fd2cbcf8700) at pthread_create.c:309
#29 0x00007fd2cd2ed87d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
{noformat}

i.e.:
* bridge B1 is locked in bridge_channel_internal_join (#16)
* channel C1 is locked in ast_indicate_data (#14)
* channel C1 is unlocked and locked in fax_gateway_framehook (#11)
* channel C1 is locked (+1) in ast_channel_queryoption (#9)
* channel C2 is unlocked (-1) in ast_unreal_queryoption (#8) -- but it is in fact still locked at this stage
* bridge B2 (a different one than the one locked in #16 -- it's actually the bridge that the other local channel is part of) is locked in ast_bridge_peer (#6)

So you then have a situation where a bridge is locked while a channel lock is held, which from what I understand is not the correct locking order, even if that channel is not part of the bridge.

I've done my latest tests on asterisk 13.10.0-rc3. I've also seen it occured on 13.9.1 and 13.7.2.

This bug is somewhat related to ASTERISK-22732.



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list