[asterisk-dev] [Code Review] Fix a deadlock relating to autoservice. Issue 14057
Mark Michelson
mmichelson at digium.com
Thu Dec 11 09:19:17 CST 2008
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviewboard.digium.com/r/83/#review188
-----------------------------------------------------------
I realize that in the paragraph where I describe the deadlock, I failed to mention that the autoservice thread is attempting to lock the channel via the ast_waitfor_n call. This is why the check for lockability was added prior to the ast_waitfor_n call.
- Mark
On 2008-12-11 09:14:52, Mark Michelson wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviewboard.digium.com/r/83/
> -----------------------------------------------------------
>
> (Updated 2008-12-11 09:14:52)
>
>
> Review request for Asterisk Developers.
>
>
> Summary
> -------
>
> This is a potential fix for issue 14057 on Mantis (http://bugs.digium.com/view.php?id=14057)
>
> The problem presented is that a thread which holds a lock on a channel is attempting to stop autoservice on that channel. Meanwhile, the autoservice thread is attempting to lock the channel held by the thread attempting to stop autoservice. Unfortunately the thread attempting to stop autoservice cannot complete since it must wait for the current iteration of the autoservice loop to finish so that the as_chan_list_state can change.
>
> I have added a check prior to the call to ast_waitfor_nandfds which loops through the channels currently in autoservice and attempts to lock them. If the lock is unsuccessful, then we remove the channel from the array of channels to poll for the current iteration of autoservice.
>
> There are three pieces of feedback I am looking for from this review request (although I would welcome other forms of feedback, too)
> 1. This sort of fix makes perfect sense for a channel stopping autoservice whose autoservice use count will be reduced to 0 once the call to ast_autoservice_stop completes. However, the possibility also exists that ast_autoservice_stop will only be decrementing the use count of the channel and not removing it from the aslist for the time being. In this scenario, I'm wondering if my fix is actually counterproductive since it could potentially mean that a queue of frames will build on the channel as long as the channel's lock is held by another thread, essentially meaning that autoservice is doing nothing.
>
> 2. Is there still a race condition here? When the ast_channel_trylock() call succeeds, my current action is to immediately unlock the channel. I'm wondering if the lock should actually be held until after ast_wait_for_n returns, since it is possible that the channel's lock may be claimed by another thread in the small time margin between testing for lockability and the actual call to ast_waitfor_n.
>
> 3. As a sanity check, make sure that the memmove call is correct. If it is, I may be tempted to add some sort of ARRAY_REMOVE macro which could be used here and in other places where we are using memmove to essentially remove an element from an array.
>
>
> This addresses bug 14057.
> http://bugs.digium.com/view.php?id=14057
>
>
> Diffs
> -----
>
> /branches/1.4/main/autoservice.c 162886
>
> Diff: http://reviewboard.digium.com/r/83/diff
>
>
> Testing
> -------
>
> Since this particular deadlock is not one I know how to reproduce, I just ran some calls through some known code that uses autoservice. In my tests, things worked correctly, but I'm fairly certain I never actually hit the case where the memmove was necessary.
>
>
> Thanks,
>
> Mark
>
>
More information about the asterisk-dev
mailing list