[asterisk-dev] [Code Review] A chan_local with no busy loop deadlock avoidance (and no deadlocks, too)
Mark Michelson
mmichelson at digium.com
Thu Feb 11 14:22:13 CST 2010
> On 2009-04-16 07:53:26, Brad Latus wrote:
> > Just took a look over coding guidelines wise..
>
> Mark Michelson wrote:
> Thanks for taking a look at this. I'm not going to be posting any new diffs on this review until the idea of the frame list lock is pursued, though. Until then, I'm not really accomplishing anything here.
>
> Tilghman Lesher wrote:
> What is currently holding up this patch?
What's holding this up was the suggestion of having a separate lock on an ast_channel for queuing frames. Damien Wedhorn started working on a branch at one point that had a frame queue lock on the channel which would be used for queueing frames and reading them. This work is tricky, though, because it adds new locks and needs to be done very carefully.
I just had an idea for a change which is similar in nature to what I have here in this review, but more generic. In this review, I created a taskprocessor in chan_local for queuing frames from one portion of a local channel to the other. This moved the locking of the channel to queue a frame into a separate thread from the one which currently had a local_pvt and local channel locked. We could have a singleton taskprocessor in the Asterisk core whose sole job is to queue frames onto channels. Whenever ast_queue_frame is called, we simply queue a task into the taskprocessor. Then, the taskprocessor will run the frame queuing operation in a separate thread, where the channel can be safely locked.
The upside to this approach is that it is easier to implement than the idea of having a new lock on the ast_channel. There are a few downsides that I can think of:
1. On a very busy system, it may be that frames end up getting delivered much later than they really should be. I don't think this will be much of a problem since, unless there is some other unrelated issue, most frame queues only ever have a frame or two on them at any given time.
2. The return value of ast_queue_frame and its kin will have a different meaning than it used to. You won't actually know from its return if the frame was successfully queued onto the channel. All you can know is if the taskprocessor has properly had a new task added to it.
- Mark
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/219/#review700
-----------------------------------------------------------
On 2009-04-11 18:01:50, Mark Michelson wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/219/
> -----------------------------------------------------------
>
> (Updated 2009-04-11 18:01:50)
>
>
> Review request for Asterisk Developers.
>
>
> Summary
> -------
>
> chan_local is notorious for its ability to deadlock. Some functions in chan_local consist mainly of deadlock avoidance using busy loop constructs.
>
> Upon inspecting the code closely, I found that local_queue_frame and local_hangup were the two functions which were the worst as far as this construct is concerned. This is because these two functions were attempting to lock two channels as well as the local_pvt. What I noticed was that both of these functions really didn't need to synchronously take its actions. It would be fine to push the tasks to another thread and have them taken care of there instead. That's where this patch comes in. With it, I have made use of a task processor in order to avoid all the deadlock avoidance code which had plagued chan_local before. The only place where you'll see deadlock avoidance used is in local_call, because we really do need to have both channels locked there.
>
> I made this patch against the ast_channel_ao2 branch because I found that it was not possible to implement all the fixes I wanted to without the use of ref-counted channels. The problem was that the channel thread could free the channel out from under a task processor execution. With ref-counted channels, you don't have to worry about that at all since we can prevent the channel from being released until all references to it are removed. I found while working on this that I could also get rid of the LOCAL_GLARE_DETECT and LOCAL_CANCEL_QUEUE flags.
>
>
> Diffs
> -----
>
> /team/russell/ast_channel_ao2/channels/chan_local.c 188030
>
> Diff: https://reviewboard.asterisk.org/r/219/diff
>
>
> Testing
> -------
>
> I have attempted placing calls to Local channels with and without the /n modifier. I also have attempted operations which use local channels internally, such as attended transfers and call-forwards. I ran both plain and under valgrind and have no errors.
>
> In the interest of full disclosure, I discovered a crash at one point stemming from my use of realtime music on hold. I tested with vanilla trunk and found that the same crash occurred there, too, so it has nothing to do with my changes or the ast_channel_ao2 branch's changes. When I re-ran my tests with statically-configured music on hold, all went well.
>
>
> Thanks,
>
> Mark
>
>
More information about the asterisk-dev
mailing list