[asterisk-bugs] [JIRA] (ASTERISK-24838) chan_sip: Locking inversion occurs when building a peer causes a peer poke during request handling
Richard Mudgett (JIRA)
noreply at issues.asterisk.org
Fri Feb 27 17:33:34 CST 2015
[ https://issues.asterisk.org/jira/browse/ASTERISK-24838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on ASTERISK-24838 started by Richard Mudgett.
> chan_sip: Locking inversion occurs when building a peer causes a peer poke during request handling
> --------------------------------------------------------------------------------------------------
>
> Key: ASTERISK-24838
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-24838
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Channels/chan_sip/General
> Affects Versions: 11.15.1
> Reporter: Richard Mudgett
> Assignee: Richard Mudgett
>
> A deadlock can happen when an incoming call for a realtime peer happens at the same time as a periodic CLI "sip show channels" is executing.
> From [~mjordan]'s analysis:
> {quote}
> Looking at the output provided, we can see that Thread 2 is currently pending on a {{sip_pvt}} lock during an iteration over the SIP dialogs container:
> {noformat}
> Thread 2 (Thread 0x2b9b2d17f700 (LWP 5337)):
> #0 0x00002b9adeb28d5c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00002b9adeb243c4 in _L_lock_997 () from /lib/x86_64-linux-gnu/libpthread.so.0
> #2 0x00002b9adeb24227 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
> #3 0x00002b9ae385988c in show_channels_cb (__cur=0x2b9b487e6688, __arg=0x2b9b2d17e3c0, flags=<optimized out>) at chan_sip.c:20862
> #4 0x000000000045008c in internal_ao2_callback (c=0x2b9ae4011c48, flags=flags at entry=OBJ_NODATA, cb_fn=cb_fn at entry=0x2b9ae3859850, arg=arg at entry=0x2b9b2d17e3c0, tag=tag at entry=0x0, file=file at entry=0x0, line=line at entry=0, func=func at entry=0x0, type=DEFAULT, data=0x0) at astobj2.c:1107
> #5 0x00000000004521ff in __ao2_callback (c=<optimized out>, flags=flags at entry=OBJ_NODATA, cb_fn=cb_fn at entry=0x2b9ae3859850 <show_channels_cb>, arg=arg at entry=0x2b9b2d17e3c0) at astobj2.c:1212
> #6 0x00002b9ae3844109 in sip_show_channels (e=<optimized out>, cmd=<optimized out>, a=<optimized out>) at chan_sip.c:20933
> #7 0x00000000004a1d3a in ast_cli_command_full (uid=uid at entry=0, gid=gid at entry=0, fd=fd at entry=137, s=s at entry=0x2b9b2d17e740 "sip show channels") at cli.c:2691
> #8 0x00000000004a1ee3 in ast_cli_command_multiple_full (uid=0, gid=0, fd=137, size=size at entry=18, s=s at entry=0x2b9b2d17ea1f "sip show channels") at cli.c:2714
> #9 0x000000000044b7b5 in netconsole (vconsole=vconsole at entry=0x8662f0) at asterisk.c:1378
> #10 0x0000000000591b8a in dummy_start (data=<optimized out>) at utils.c:1223
> #11 0x00002b9adeb21b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> #12 0x00002b9adda027bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #13 0x0000000000000000 in ?? ()
> {noformat}
> In Thread 77, we can see that {{chan_sip}} is blocked attempting to link a new SIP pvt into the same {{dialogs}} container:
> {noformat}
> Thread 77 (Thread 0x2b9b00886700 (LWP 26092)):
> #0 0x00002b9adeb28d5c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
> #1 0x00002b9adeb243c4 in _L_lock_997 () from /lib/x86_64-linux-gnu/libpthread.so.0
> #2 0x00002b9adeb24227 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
> #3 0x000000000044f56b in internal_ao2_link (c=0x2b9ae4011c48, user_data=user_data at entry=0x2b9b48345d88, flags=flags at entry=0, tag=tag at entry=0x0, file=file at entry=0x5b6286 "astobj2.c", line=line at entry=937, func=func at entry=0x5b6442 "__ao2_link") at astobj2.c:905
> #4 0x000000000045201f in __ao2_link (c=<optimized out>, obj_new=obj_new at entry=0x2b9b48345d88, flags=flags at entry=0) at astobj2.c:937
> #5 0x00002b9ae3891b43 in sip_alloc (callid=callid at entry=0x0, addr=addr at entry=0x0, useglobal_nat=useglobal_nat at entry=0, intended_method=intended_method at entry=3, req=req at entry=0x0, logger_callid=logger_callid at entry=0x0) at chan_sip.c:8634
> #6 0x00002b9ae389347c in sip_poke_peer (peer=peer at entry=0x2319558, force=0, force at entry=xxxxxxxxxxx) at chan_sip.c:29585
> #7 0x00002b9ae3897575 in build_peer (name=name at entry=0x2b9b486ce1e5 "xxxxxxxxxxx", v=<optimized out>, v at entry=0x2b9b4846f280, alt=<optimized out>, alt at entry=0x0, devstate_only=devstate_only at entry=0, realtime=1) at chan_sip.c:31253
> #8 0x00002b9ae389c05e in realtime_peer (newpeername=0x2b9b486ce1e5 "xxxxxxxxxx", newpeername at entry=0x0, addr=addr at entry=0x2b9b487eb030, callbackexten=callbackexten at entry=0x2b9b008837b4 "xxxxxxxxxx", devstate_only=devstate_only at entry=0, which_objects=which_objects at entry=2) at chan_sip.c:5435
> #9 0x00002b9ae389cc95 in sip_find_peer_full (addr=addr at entry=0x2b9b487eb030, callbackexten=0x2b9b008837b4 "xxxxxxxxxx", transport=<optimized out>, devstate_only=0, which_objects=2, realtime=1, peer=0x0) at chan_sip.c:5530
> #10 0x00002b9ae38c558e in sip_find_peer_by_ip_and_exten (transport=<optimized out>, callbackexten=<optimized out>, addr=0x2b9b487eb030) at chan_sip.c:5574
> #11 check_peer_ok (p=p at entry=0x2b9b487e6688, of=0x2b9b4851dda5 "xxxxxxxxxx", req=req at entry=0x2b9b00885300, sipmethod=sipmethod at entry=5, addr=addr at entry=0x2b9b00885270, authpeer=authpeer at entry=0x2b9b00883b80, reliable=reliable at entry=XMIT_RELIABLE, uri2=uri2 at entry=0x2b9b008838f0 "sip:xxxxxxxxxx at sip.xxxxxxxx.com:5060", calleridname=<error reading variable: Unhandled dwarf expression opcode 0xfa>) at chan_sip.c:18087
> #12 0x00002b9ae38c6025 in check_user_full (p=p at entry=0x2b9b487e6688, req=req at entry=0x2b9b00885300, sipmethod=sipmethod at entry=5, uri=uri at entry=0x2b9b4882b18f "sip:xxxxxxxxxx at sip.xxxxxxxx.com:5060", reliable=reliable at entry=XMIT_RELIABLE, addr=addr at entry=0x2b9b00885270, authpeer=authpeer at entry=0x2b9b00883b80) at chan_sip.c:18401
> #13 0x00002b9ae38c7ec9 in handle_request_invite (p=p at entry=0x2b9b487e6688, req=req at entry=0x2b9b00885300, addr=addr at entry=0x2b9b00885270, seqno=1, recount=recount at entry=0x2b9b00885200, e=e at entry=0x2b9b4882b18f "sip:xxxxxxxxxx at sip.xxxxxxxxxx.com:5060", nounlock=nounlock at entry=0x2b9b00885210) at chan_sip.c:25563
> #14 0x00002b9ae38cf2dd in handle_incoming (p=p at entry=0x2b9b487e6688, req=req at entry=0x2b9b00885300, addr=addr at entry=0x2b9b00885270, recount=recount at entry=0x2b9b00885200, nounlock=nounlock at entry=0x2b9b00885210) at chan_sip.c:28339
> #15 0x00002b9ae38d1ae2 in handle_request_do (req=req at entry=0x2b9b00885300, addr=addr at entry=0x2b9b00885270) at chan_sip.c:28548
> #16 0x00002b9ae38d34b6 in sipsock_read (id=<optimized out>, fd=<optimized out>, events=<optimized out>, ignore=<optimized out>) at chan_sip.c:28479
> #17 0x00000000004ffe66 in ast_io_wait (ioc=0x2b9ae4001b40, howlong=<optimized out>) at io.c:292
> #18 0x00002b9ae38ade1f in do_monitor (data=data at entry=0x0) at chan_sip.c:29077
> #19 0x0000000000591b8a in dummy_start (data=<optimized out>) at utils.c:1223
> #20 0x00002b9adeb21b50 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
> #21 0x00002b9adda027bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #22 0x0000000000000000 in ?? ()
> {noformat}
> So, Thread 77 is waiting for Thread 2 to complete. However, what is interesting here is that Thread 77 is currently operating on that same {{sip_pvt}}, and currently has it locked. In essence, Thread 77 is creating its own locking inversion:
> # Hold {{sip_pvt}} {{0x2b9b487e6688}} lock as the result of {{handle_request_do}}
> # As the result of {{sip_poke_peer}}, we go off to create a new {{sip_pvt}}. This requires grabbing the {{dialogs}} container while we hold a lock to a {{sip_pvt}}, creating the possibility of a locking inversion
> # At the same time, the CLI command {{sip show channels}} is run on Thread 2. This grabs the lock on the {{dialogs}} container and iterates through the {{sip_pvt}} objects. This blocks Thread 77 from grabbing the {{dialogs}} container; however, Thread 2 cannot grab the lock for {{0x2b9b487e6688}}, causing the deadlock.
> The solution here is to release the {{sip_pvt}} lock on {{0x2b9b487e6688}} before trying to find the peer. Optionally, we could also not try to poke the peer during the process of the build peer, and instead push that off to a scheduler. That would keep the process of looking up the peer from trying to create a new dialog while a {{sip_pvt}} is currently locked.
> {quote}
> I've opted to push the peer poke off onto the scheduler rather than unlock the {{sip_pvt}}.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
More information about the asterisk-bugs
mailing list