[asterisk-dev] [Code Review]: Fix deadlock between subscription event RWLOCK and dialogs container lock in chan_sip. (simplified)

rmudgett reviewboard at asterisk.org
Fri Nov 4 18:06:37 CDT 2011



> On Nov. 4, 2011, 10:19 a.m., David Vossel wrote:
> > /branches/1.8/channels/chan_sip.c, lines 25316-25321
> > <https://reviewboard.asterisk.org/r/1564/diff/2/?file=21606#file21606line25316>
> >
> >     This is fine since we know that the iterator is not malloced when we use iterator_init on a stack allocated iterator, but why do we need to use an iterator at all?  What is the advantage of the iterator over a callback that does the exact same thing with OBJ_NODATA | OBJ_MULTIPLE | OBJ_UNLINK used?  Looking at the ao2_iterator_next() function, it might be more expensive than the callback when bucket collisions exist.
> >     
> >     
> >     I understand we don't hold the dialogs_needdestroy lock during the iterator traversal, but holding that lock won't cause a problem in this case.  It would actually be more efficient to hold the lock for the entire traversal instead of grabbing it and letting it go for each iteration.  Actually, since the dialog_needdestroy linking and unlinking exists in the same thread, we can just pass the iterator the NO_LOCK flag.  No synchronization is necessary... at least for now that is.
> >
> 
> rmudgett wrote:
>     With the change to add an ao2_container_count() test before this block, this code will not normally execute.  When it does execute, I would not expect it to have many dialogs to destroy anyway.  Furthermore, the container only has one bucket and the iterator UNLINK flag is set.  The UNLINK flag causes the iterator to always grab and remove the first object off of the bucket list.  The only real speed up possible here would be to use the iterator DONTLOCK flag because no synchronization is necessary... at least for now that is. :)  Besides holding locks for longer than necessary is not a good thing isn't it. :)
>     
>     This version of dialogs_needdestroy container would be a perfect candidate to be created without a lock using my astobj2 lock enhancement patch. :)
>     
>     FYI: Using OBJ_MULTIPLE with OBJ_NODATA does nothing. The OBJ_MULTIPLE flag is only needed if you want the ao2_callback to return more than one object.
> 
> David Vossel wrote:
>     " Besides holding locks for longer than necessary is not a good thing isn't it."
>     
>     Correct, but the cost of letting a lock go and reacquiring it has a similar potential for impacting performance in some circumstances.
>     
>     For example, if a lock must be held for a single operation, and that operation exists in a loop.  It is is likely more efficient for us to grab the lock once and hold it for the duration of the loop rather than locking/unlocking on each iteration.
>     
>     We are essentially locking/unlocking on each iteration here.  Since no concurrency exists between the dialogs_needdestroy function at the moment it matters less.  Once concurrency does exist for this container we should hold the lock for the duration of the traversal, which would result in using a callback instead of an interator.  This a more future proof approach than the use of the iterator with the NO_LOCK flag.

Dang.  Ignore my ramblings about OBJ_MULTIPLE above.  I was wrong.  You really do need that flag to keep going.


- rmudgett


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1564/#review4684
-----------------------------------------------------------


On Nov. 4, 2011, 5:28 p.m., rmudgett wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/1564/
> -----------------------------------------------------------
> 
> (Updated Nov. 4, 2011, 5:28 p.m.)
> 
> 
> Review request for Asterisk Developers, David Vossel and schmidts.
> 
> 
> Summary
> -------
> 
> Timing between dialog destruction and a MWI event sending a message could result in a deadlock.
> 
> Order of events causing deadlock:
> 
> 1a) The event subscription system calls the registered callbacks with its list RWLOCK held.
> 1b) The SIP monitor checks for dialogs needing destruction.  It does an ao2_callback that holds the dialogs container lock while searching for dialogs to destroy.
> 2a) The event subscription SIP callback needs to create a temporary dialog to send out the MWI notification.  That temporary dialog needs to be inserted in the dialogs container so it must wait.
> 2b) The dialog search finds a dialog to destroy and as a result releases the last reference for a peer.  The peer destructor attempts to get the subscription RWLOCK but must wait.
> 3) deadlock
> 
> 
> This is based off of the original diff posted for review: https://reviewboard.asterisk.org/r/1557/
> 
> It directly addresses the comment dvossel posted to the original diff of https://reviewboard.asterisk.org/r/1557/
> 
> 
> This addresses bug ASTERISK-18747.
>     https://issues.asterisk.org/jira/browse/ASTERISK-18747
> 
> 
> Diffs
> -----
> 
>   /branches/1.8/channels/chan_sip.c 343487 
> 
> Diff: https://reviewboard.asterisk.org/r/1564/diff
> 
> 
> Testing
> -------
> 
> It compiles. :)
> 
> 
> Thanks,
> 
> rmudgett
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20111104/e853d1dd/attachment-0001.htm>


More information about the asterisk-dev mailing list