[asterisk-dev] [Code Review]: AMI module reload causes deadlock.

Mark Michelson reviewboard at asterisk.org
Mon Mar 19 09:51:19 CDT 2012



> On March 16, 2012, 4:47 p.m., Mark Michelson wrote:
> > I don't understand the point of counting the number of active function calls. Isn't this already taken care of inherently by the refcounting done on the actions? I fully understand not holding the action lock while the function call takes place, though.
> 
> rmudgett wrote:
>     The reference count is not quite the same thing.  The reference count is keeping track of how many references to the ao2 object.  The active_count is keeping track of how many of those references are actually calling the callback function.  The ast_manager_unregister() is usually called during a module unload so it is not a good thing if any of those references are still in the callback function when ast_manager_unregister() returns.
>     
>     Locking the ao2 object guaranteed that there were no active callbacks that mattered when ast_manager_unregister() was called.  Unfortunately, this causes the deadlock situation.  The patch stops locking the ao2 object to allow multiple threads to invoke the callback re-entrantly.  There is no way to guarantee a module unload will not crash because of an active callback.  The code attempts to minimize the chance with the registered flag and the maximum 5 second delay before ast_manager_unregister() returns.

So the concern is not refcounts for the action. Rather, if an ast_manager_unregister() returns when there is still an active action, this may result in a module unloading while code within that module is still executing.

The thing that bothers me about this patch is the arbitrary 5-second busy loop. If an active action takes longer than 5 seconds to complete, then a crash may still occur, and it will be even more difficult to pinpoint the cause than it may have prior to this change.

The "Real Fix(tm)" would be to change every manager action in every module to increment the module refcount when starting the action and then to decrement it when the action completes. That way if a manager action were still active in that module, then the module would wait to unload until all current actions were complete. Of course, such a change would be horrifically large and potentially error-prone, so I can understand why you would go for a simpler one that can be contained just within manager.c I'm having a great deal of difficulty thinking of a way to do this, though. Even if module refcounting were handled within the manager.c code, you are at a disadvantage that registering a manager action does not pass along which module is registering the action, plus not all manager actions are registered from loadable modules.

You also can't reasonably keep track of what thread is executing a manager action because an action can be executed from multiple threads at once...

I guess I'll have to give a grumbly bad-natured ship it to this because it's the least invasive, most likely to succeed, patch. I'd suggest we make some sort of issue in JIRA to get this fixed a proper way though, with less busy loops and non-determinism.


- Mark


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1818/#review5830
-----------------------------------------------------------


On March 15, 2012, 6:13 p.m., rmudgett wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/1818/
> -----------------------------------------------------------
> 
> (Updated March 15, 2012, 6:13 p.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Summary
> -------
> 
> Steps to invoke the deadlock:
> 1) Invoke an AMI Command action such as "dialplan reload" or "originate".
> 2) The command then causes an external script to request another AMI Command action.
> 3) Deadlock because the current code only allows each registered AMI action to have one active instance at a time.
> 
> This patch avoids the deadlock and tries to keep a module unload from potentially causing a crash by an active AMI action using that module.
> 
> 
> This addresses bug ASTERISK-19487.
>     https://issues.asterisk.org/jira/browse/ASTERISK-19487
> 
> 
> Diffs
> -----
> 
>   /branches/1.8/include/asterisk/manager.h 359708 
>   /branches/1.8/main/manager.c 359708 
> 
> Diff: https://reviewboard.asterisk.org/r/1818/diff
> 
> 
> Testing
> -------
> 
> Code inspection says it should work and the compiler agrees. :)
> 
> 
> Thanks,
> 
> rmudgett
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20120319/9b9f80ec/attachment.htm>


More information about the asterisk-dev mailing list