[asterisk-dev] [Code Review] [18031] Patch for deadlock locking order issue between channel/queue during set_queue_variables
Russell Bryant
reviewboard at asterisk.org
Thu Nov 18 14:50:37 CST 2010
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1018/#review2936
-----------------------------------------------------------
Ship it!
Looks sane to me
- Russell
On 2010-11-18 14:21:22, Brett Bryant wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/1018/
> -----------------------------------------------------------
>
> (Updated 2010-11-18 14:21:22)
>
>
> Review request for Asterisk Developers.
>
>
> Summary
> -------
>
> When setqueuevar is enabled in app_queue, the potential exists for one thread to lock a call_queue structure with a channel lock held while another thread locks the channel with the call_queue lock held. The resulting deadlock will end up crippling Asterisk since quite a few parts of Asterisk will traverse the channel list.
>
> At several points in app_queue (in my case, end_bridge_callback), a queue will be locked before calling set_queue_variables (which pulls data directly out of the queue structure); this is fine, but when setqueuevar is actually enabled, the call to pbx_builtin_setvar_multiple will lock the channel. Meanwhile, another thread may be calling ast_do_masquerade() to clone the same channel, resulting in a lock on the same channel (clonechan.) Later in the masquerade, the datastore fixup functions are called, and queue_transfer_fixup's call to update_queue() will result in an attempt to lock the related queue (or all queues if shared_lastcall is enabled.)
>
> This deadlock occurred for me on 1.6.2.6 but appears to present in 1.6.2.13 as well. In my case, do_masquerade() got the channel lock while end_bridge_callback() got the queue lock.
>
> Queue config:
> setqueuevar = yes
> shared_lastcall = yes
>
> All agents in the queue were Local channels. Unfortunately, I lost the core I forced from this deadlock, so I can't provide backtraces for the two deadlocked threads. I can tell you that they were (roughly)...
>
> Thread A: ... -> ast_bridge_call -> app_queue.c end_bridge_callback [queue lock happens here] -> set_queue_variables -> pbx_builtin_setvar_multiple -> pbx_builtin_setvar_helper -> ast_channel_lock
>
> Thread B: ... -> ast_do_masquerade [channel lock happens here] -> queue_transfer_fixup -> update_queue -> ao2_lock
>
>
> This addresses bug 18031.
> https://issues.asterisk.org/view.php?id=18031
>
>
> Diffs
> -----
>
> /trunk/apps/app_queue.c 294084
>
> Diff: https://reviewboard.asterisk.org/r/1018/diff
>
>
> Testing
> -------
>
> Queue testsuite tests ran before and after applying the patch.
>
>
> --> queues/queue_baseline --- PASSED
> --> queues/position_priority_maxlen --- PASSED
> --> queues/macro_gosub_test --- PASSED
>
>
> Thanks,
>
> Brett
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.digium.com/pipermail/asterisk-dev/attachments/20101118/a9dd45ee/attachment.htm
More information about the asterisk-dev
mailing list