[asterisk-dev] [Code Review]: Fix deadlock between subscription event RWLOCK and dialogs container lock in chan_sip.

schmidts reviewboard at asterisk.org
Thu Nov 3 13:08:43 CDT 2011



> On Nov. 3, 2011, 12:45 p.m., David Vossel wrote:
> > I understand that back-porting the do_monitor loop optimizations from Asterisk 10 resolves this issue, but (correct me if I am wrong) this seems like it is much more than is necessary just to fix the deadlock at hand.  That optimization came before 1.8.0.0 was officially released, but we choose not to include it into 1.8 at that time because 1.8 was already branched and the improvement appeared to be primarily optimization related (the optimization is insanely awesome btw)...  While the optimization looks sane, none of this code has yet to be proven in a production release of Asterisk.   I don't want to run the risk of introducing a regression for something like this, especially when we are in release candidate phase for Asterisk 1.8.8.  We need a simple and solid fix for this blocker that does not introduce any unnecessary complexity.

i only suggest this after your comment about the own container for dialogs which need to be destroyed, cause this part of code is allready written and submitted.
you are right about the reasons why this patches didnt came into 1.8 and i didnt know this issue is a blocker for 1.8.8
if you only add a container for dialogs to destroy then its not necessary to use the full patch cause the rtp timeout would be done in the normal loop over all dialogs.

but if i understand the locking order right as described above you want to remove 2a so the temporary dialog could be inserted into the dialog container. this could only work if you dont iterate over all dialogs and imho without this rtptimeout checks wouldnt work anymore.

i hope i understand something wrong and there is an easier way to solve this cause i really understand why you want a simple fix without any unnecessary complexity.


- schmidts


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1557/#review4668
-----------------------------------------------------------


On Nov. 2, 2011, 8:25 p.m., rmudgett wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/1557/
> -----------------------------------------------------------
> 
> (Updated Nov. 2, 2011, 8:25 p.m.)
> 
> 
> Review request for Asterisk Developers and David Vossel.
> 
> 
> Summary
> -------
> 
> Timing between dialog destruction and a MWI event sending a message could result in a deadlock.
> 
> Order of events causing deadlock:
> 
> 1a) The event subscription system calls the registered callbacks with its list RWLOCK held.
> 1b) The SIP monitor checks for dialogs needing destruction.  It does an ao2_callback that holds the dialogs container lock while searching for dialogs to destroy.
> 2a) The event subscription SIP callback needs to create a temporary dialog to send out the MWI notification.  That temporary dialog needs to be inserted in the dialogs container so it must wait.
> 2b) The dialog search finds a dialog to destroy and as a result releases the last reference for a peer.  The peer destructor attempts to get the subscription RWLOCK but must wait.
> 3) deadlock
> 
> 
> This addresses bug ASTERISK-18747.
>     https://issues.asterisk.org/jira/browse/ASTERISK-18747
> 
> 
> Diffs
> -----
> 
>   /branches/1.8/channels/chan_sip.c 343275 
> 
> Diff: https://reviewboard.asterisk.org/r/1557/diff
> 
> 
> Testing
> -------
> 
> It compiles. :)
> 
> 
> Thanks,
> 
> rmudgett
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20111103/3b30fa00/attachment.htm>


More information about the asterisk-dev mailing list