<html>
<body>
<div style="font-family: Verdana, Arial, Helvetica, Sans-Serif;">
<table bgcolor="#f9f3c9" width="100%" cellpadding="8" style="border: 1px #c9c399 solid;">
<tr>
<td>
This is an automatically generated e-mail. To reply, visit:
<a href="https://reviewboard.asterisk.org/r/1018/">https://reviewboard.asterisk.org/r/1018/</a>
</td>
</tr>
</table>
<br />
<table bgcolor="#fefadf" width="100%" cellspacing="0" cellpadding="8" style="background-image: url('https://reviewboard.asterisk.orgrb/images/review_request_box_top_bg.png'); background-position: left top; background-repeat: repeat-x; border: 1px black solid;">
<tr>
<td>
<div>Review request for Asterisk Developers.</div>
<div>By Brett Bryant.</div>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Description </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">
<tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">When setqueuevar is enabled in app_queue, the potential exists for one thread to lock a call_queue structure with a channel lock held while another thread locks the channel with the call_queue lock held. The resulting deadlock will end up crippling Asterisk since quite a few parts of Asterisk will traverse the channel list.
At several points in app_queue (in my case, end_bridge_callback), a queue will be locked before calling set_queue_variables (which pulls data directly out of the queue structure); this is fine, but when setqueuevar is actually enabled, the call to pbx_builtin_setvar_multiple will lock the channel. Meanwhile, another thread may be calling ast_do_masquerade() to clone the same channel, resulting in a lock on the same channel (clonechan.) Later in the masquerade, the datastore fixup functions are called, and queue_transfer_fixup's call to update_queue() will result in an attempt to lock the related queue (or all queues if shared_lastcall is enabled.)
This deadlock occurred for me on 1.6.2.6 but appears to present in 1.6.2.13 as well. In my case, do_masquerade() got the channel lock while end_bridge_callback() got the queue lock.
Queue config:
setqueuevar = yes
shared_lastcall = yes
All agents in the queue were Local channels. Unfortunately, I lost the core I forced from this deadlock, so I can't provide backtraces for the two deadlocked threads. I can tell you that they were (roughly)...
Thread A: ... -> ast_bridge_call -> app_queue.c end_bridge_callback [queue lock happens here] -> set_queue_variables -> pbx_builtin_setvar_multiple -> pbx_builtin_setvar_helper -> ast_channel_lock
Thread B: ... -> ast_do_masquerade [channel lock happens here] -> queue_transfer_fixup -> update_queue -> ao2_lock</pre>
</td>
</tr>
</table>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Testing </h1>
<table width="100%" bgcolor="#ffffff" cellspacing="0" cellpadding="10" style="border: 1px solid #b8b5a0">
<tr>
<td>
<pre style="margin: 0; padding: 0; white-space: pre-wrap; white-space: -moz-pre-wrap; white-space: -pre-wrap; white-space: -o-pre-wrap; word-wrap: break-word;">Queue testsuite tests ran before and after applying the patch.
--> queues/queue_baseline --- PASSED
--> queues/position_priority_maxlen --- PASSED
--> queues/macro_gosub_test --- PASSED
</pre>
</td>
</tr>
</table>
<div style="margin-top: 1.5em;">
<b style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Bugs: </b>
<a href="https://issues.asterisk.org/view.php?id=18031">18031</a>
</div>
<h1 style="color: #575012; font-size: 10pt; margin-top: 1.5em;">Diffs</b> </h1>
<ul style="margin-left: 3em; padding-left: 0;">
<li>/trunk/apps/app_queue.c <span style="color: grey">(294084)</span></li>
</ul>
<p><a href="https://reviewboard.asterisk.org/r/1018/diff/" style="margin-left: 3em;">View Diff</a></p>
</td>
</tr>
</table>
</div>
</body>
</html>