[asterisk-dev] [Code Review]: Deadlocks dealing with dialplan hints during reload.

rmudgett reviewboard at asterisk.org
Mon Jul 11 15:57:16 CDT 2011



> On July 8, 2011, 4:01 p.m., David Vossel wrote:
> > I've looked over the code and functionally I don't see any problems.  I'm still having a bit of trouble wrapping my head around why we need the new context_merge_lock though.  If locking order inversion is taking place I don't understand how creating a new lock helps the situation.  Can you clarify in a little more detail how this helps?
> 
> rmudgett wrote:
>     The ast_merge_contexts_and_delete() function restructures the contexts and hints which is why nobody else should be doing anything with those containers when it is doing its job.  Part of what ast_merge_contexts_and_delete() does is to remove all watcher callbacks from an old hint and then put them back into a new merged hint.
>     
>     Since handle_statechange() cannot notify the watchers with the normal locks held because of deadlocks, a new lock is needed to ensure that those two functions do not interfere with each other.  The other functions use the hints container lock to do the same thing.
> 
> David Vossel wrote:
>     Essentially what this patch does is erase any concept of locking order between contexts and hints by introducing a global lock to serialize when those containers can be accessed.  This is a solution, but I'm not convinced it is the right one yet.
>     
>     so, Locking order is (contexts -> hints -> hint)
>     
>     The source of this problem occurs at handle_statechange().  There we grab the hints and individual hint locks while calling the state change callback.  I'm guessing that within a callback somewhere the context lock is locked which invalidates locking order. (hints -> hint -> contexts).
>     
>     The right fix given the current architecture is to never call anything in the statechange callback that will result in the global context lock being held.  This can be hard to guarantee though, and will likely occur again in the future given the complexity involved in some of those callbacks.  Can we somehow remove the need to hold the hint/hints lock during the state change callback?

1) The patch does not erase the locking order concept for contexts -> hints -> hint.  The new lock _only_ searializes access to the containers between handle_statechange() and ast_merge_contexts_and_delete() because ast_merge_contexts_and_delete() could conceivably radically change the links in the data structures.  You could see it as changing the locking order to: new-lock -> contexts -> hints -> hint.  However, the new lock only affects handle_statechange() and ast_merge_contexts_and_delete(); no other function.  The ast_merge_contexts_and_delete() is only called during a module reload.

2) The handle_statechange() no longer holds those locks when notifying the watchers with this patch because of the catch-22 stated in the review description.  So yes, the need to hold the hints/hint lock during the watcher notification callbacks is removed.  One of the deadlocks (ASTERISK-16961) happen because of an interaction between another locking order group: contexts -> hints -> hint and channels -> channel -> channel private.

3) The handle_statechange() is effectively a thread since it is a callback of the task processor for pbx.c.


- rmudgett


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1313/#review3844
-----------------------------------------------------------


On July 8, 2011, 11:01 a.m., rmudgett wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/1313/
> -----------------------------------------------------------
> 
> (Updated July 8, 2011, 11:01 a.m.)
> 
> 
> Review request for Asterisk Developers, Russell Bryant and David Vossel.
> 
> 
> Summary
> -------
> 
> There are two remaining different deadlocks reported dealing with dialplan
> hints.
> 
> The deadlock in ASTERISK-17666 is caused by invalid locking order in
> ast_remove_hint().  The hints container must be locked before the hint
> object.
> 
> The deadlock in ASTERISK-17760 is caused by a catch-22 situation in
> handle_statechange().  The deadlock is caused by not having the conlock
> before calling the watcher callbacks.  Unfortunately, having that lock
> causes a different deadlock as reported in ASTERISK-16961.
> 
> * Fixed ast_remove_hint() locking order.
> 
> * Made handle_statechange() no longer call the watcher callbacks holding
> any locks that matter.
> 
> * Made hint ao2 destructor do the watcher callbacks for extension
> deactivation to guarantee that they get called.
> 
> * Fixed hint reference leak in ast_add_hint() if the callback container
> constructor failed.
> 
> * Fixed hint reference leak in complete_core_show_hint() for every hint it
> found for CLI tab completion.
> 
> * Adjusted locking in ast_merge_contexts_and_delete() for safety.
> 
> * Added context_merge_lock to prevent ast_merge_contexts_and_delete() and
> handle_statechange() from interfering with each other.
> 
> 
> This addresses bugs ASTERISK-17666 and ASTERISK-17760.
>     https://issues.asterisk.org/jira/browse/ASTERISK-17666
>     https://issues.asterisk.org/jira/browse/ASTERISK-17760
> 
> 
> Diffs
> -----
> 
>   /branches/1.8/main/pbx.c 327043 
> 
> Diff: https://reviewboard.asterisk.org/r/1313/diff
> 
> 
> Testing
> -------
> 
> * Did some calls that changed the state of some hints.
> * Reloaded the dialplan.
> 
> 
> Thanks,
> 
> rmudgett
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20110711/946247ae/attachment-0001.htm>


More information about the asterisk-dev mailing list