[asterisk-dev] [Code Review]: Fix deadlock between subscription event RWLOCK and dialogs container lock in chan_sip.

rmudgett reviewboard at asterisk.org
Tue Nov 8 13:14:14 CST 2011



> On Nov. 8, 2011, 11:50 a.m., David Vossel wrote:
> > /branches/10/channels/chan_sip.c, lines 11738-11746
> > <https://reviewboard.asterisk.org/r/1557/diff/5/?file=21655#file21655line11738>
> >
> >     Is p locked during these links/unlinks.  If so this is a problem.

I see what you mean.  Deadlock between dialogs_rtpcheck container lock and the dialog p lock.  Actually, p is not always locked here which is itself not good.

The dialogs_rtpcheck container does not care about the p lock.  The use of the dialogs_rtpcheck container is as a subset of the dialogs container.  There are two places where locking order could have been a problem:
1) In dialog_checkrtp_cb() but that function does a trylock instead of a lock.
2) In change_callid_pvt().

Looking at the change_callid_pvt() function will potentially violate locking order between dialogs and dialog/pvt.  It is sometimes called with pvt locked.  If it is not called with pvt locked then it is changing the dialog callid without the lock.  The change_callid_pvt() routine was created from repeated code sequences so it would not be a new deadlock potential.

Locking in chan_sip seems to be really messy. :)


> On Nov. 8, 2011, 11:50 a.m., David Vossel wrote:
> > /branches/10/channels/chan_sip.c, lines 3079-3095
> > <https://reviewboard.asterisk.org/r/1557/diff/5/?file=21655#file21655line3079>
> >
> >     How would it be possible for something to be linked twice into the needdestroy container?  The needdestroy variable should never be set outside of this function, and should definitely never have the potential to be turned off after being set. If this is the case, that is the problem that should be addressed... I see that we do that in sip_hangup.  That is wrong.  We can never guarantee that setting the pvt->needdestroy variable back to 0 after it is already set will stop anything. That is a nasty race condition that no logic should ever depend on.

I did the unlink/link because sig_hangup was setting it to zero.


- rmudgett


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1557/#review4729
-----------------------------------------------------------


On Nov. 8, 2011, 11:19 a.m., rmudgett wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/1557/
> -----------------------------------------------------------
> 
> (Updated Nov. 8, 2011, 11:19 a.m.)
> 
> 
> Review request for Asterisk Developers, David Vossel and schmidts.
> 
> 
> Summary
> -------
> 
> Timing between dialog destruction and a MWI event sending a message could result in a deadlock.
> 
> Order of events causing deadlock:
> 
> 1a) The event subscription system calls the registered callbacks with its list RWLOCK held.
> 1b) The SIP monitor checks for dialogs needing destruction.  It does an ao2_callback that holds the dialogs container lock while searching for dialogs to destroy.
> 2a) The event subscription SIP callback needs to create a temporary dialog to send out the MWI notification.  That temporary dialog needs to be inserted in the dialogs container so it must wait.
> 2b) The dialog search finds a dialog to destroy and as a result releases the last reference for a peer.  The peer destructor attempts to get the subscription RWLOCK but must wait.
> 3) deadlock
> 
> 
> Residual changes for Asterisk v10 branch after https://reviewboard.asterisk.org/r/1564/ commit
> and associated dialogs callid hash key change fix.
> 
> NOTE THE CHANGE FROM v1.8 to v10 BRANCH.
> 
> * Make check_rtp_timeout() return CMP_MATCH if need to delete dialog from
> dialogs_rtpcheck.  This is an optimization to avoid an unneeded
> lock/unlock and object search when using ao2_unlink.
> 
> * Prevent crash in check_rtp_timeout() if dialog->rtp is NULL.
> 
> * Make pvt_set_needdestroy() protect from possible double entries in
> dialogs_needdestroy.
> 
> Schmidts please note that change_callid_pvt() is different between v1.8
> and v10 for your unleas-the-beast branch.
> 
> 
> Diffs
> -----
> 
>   /branches/10/channels/chan_sip.c 343850 
> 
> Diff: https://reviewboard.asterisk.org/r/1557/diff
> 
> 
> Testing
> -------
> 
> It compiles. :)
> 
> 
> Thanks,
> 
> rmudgett
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20111108/c4d14fd8/attachment-0001.htm>


More information about the asterisk-dev mailing list