[asterisk-dev] [Code Review]: chan_sip REFER deadlock fixes

Tue Aug 2 14:43:19 CDT 2011

> On Aug. 2, 2011, 1:55 p.m., mjordan wrote:
> > /branches/1.8/channels/chan_sip.c, line 22916
> > <https://reviewboard.asterisk.org/r/1339/diff/1/?file=17695#file17695line22916>
> >
> >     It looks like (based on the comments) local_attended_transfer will lock p and p->owner.  If so, then is there a case where p->refer->attendedtransfer is false, and we attempt to unlock p and p->owner when they aren't locked?
> >     
> >     If this isn't the case, and the caller of this method locked p and p->owner... then that's kind of confusing to have a side effect of a method be an unlock.  It may not be possible to refactor that out somehow, but I can see why this has been difficult to maintain.
> >     
> >     Also: is there a reason why this was changed from current.chan1 to p->owner?  Since (I think) they're the same thing, and the next comment says not to hold locks on chan1 (being passed to ast_parking_ext_valid), its not intuitively obvious that the unlock of p->owner accomplishes that.
> >

local_attended_transfer() has a precondition that both p and p->owner are locked before the function is called.  I just made that comment so it would be more obvious that precondition exists in the handle_request_refer code.

local_attended_transfer() may unlock p->owner, and in that case my logic of unlocking it in the code below is invalid... Luckily it turns out this will never happen since it is only unlocked on success and in that case the goto statement is hit below the call to local_attended_transfer().   Regardless it isn't safe.  The unlock of p->owner should only occur if nolock == 0, which resolves the issue of local_attended_transfer possibly unlocking the owner, and it should happen before the pvt is unlocked rather than after.  I will fix this.

> On Aug. 2, 2011, 1:55 p.m., mjordan wrote:
> > /branches/1.8/channels/chan_sip.c, line 22924
> > <https://reviewboard.asterisk.org/r/1339/diff/1/?file=17695#file17695line22924>
> >
> >     See previous comment for consistency between p->owner and current.chan1.

To answer you question about the consistent usage of p->owner and current.chan1, the p->owner pointer is only valid while p is locked.  I had to change this from p->owner to current.chan1 because p's lock can not be held during this function.  We are guaranteed current.chan1 will remain valid the duration of the function because of the channel ref given to it earlier while p is locked...

> On Aug. 2, 2011, 1:55 p.m., mjordan wrote:
> > /branches/1.8/channels/chan_sip.c, line 22928
> > <https://reviewboard.asterisk.org/r/1339/diff/1/?file=17695#file17695line22928>
> >
> >     Can the caller ID change between where you previously extracted it and the localtransfer?  You duplicate it previously, then lock p and reextract it here.

callid in this case is not Caller ID.  This field represents the Call-ID header in SIP messages that is part of uniquely identifying message belonging to a sip dialog.  It will stay consistent for the duration of the sip pvt's life.

> On Aug. 2, 2011, 1:55 p.m., mjordan wrote:
> > /branches/1.8/channels/chan_sip.c, line 22968
> > <https://reviewboard.asterisk.org/r/1339/diff/1/?file=17695#file17695line22968>
> >
> >     Remove XXX (unless we have a standard that XXX is 'TODO', or something like that, and we can't fix the TODO under this issue)

Well, I'm sure that's true.  I do that sometimes to bring attention to a comment that is really important.

- David

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1339/#review3967
-----------------------------------------------------------

On Aug. 2, 2011, 12:21 p.m., David Vossel wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/1339/
> -----------------------------------------------------------
> 
> (Updated Aug. 2, 2011, 12:21 p.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Summary
> -------
> 
> handle_request_refer() is a complete mess when it comes to locking.  A deadlock occurs, we fix it, and then it moves somewhere else.  This patch attempts to resolve all the possible locking inversion issues that can occur in this function.
> 
> 
> This addresses bug ASTERISK-18082.
>     https://issues.asterisk.org/jira/browse/ASTERISK-18082
> 
> 
> Diffs
> -----
> 
>   /branches/1.8/channels/chan_sip.c 330578 
> 
> Diff: https://reviewboard.asterisk.org/r/1339/diff
> 
> 
> Testing
> -------
> 
> I tested refer using a snom phone with blind transfer, but that is not very impressive.
> 
> James Van Vleet has tested this code using a load testing tool that was capable of exposing all sorts of problems.  He has reported that his test is running without issue using this iteration of the patch.  Given what it was capable of exposing earlier, I am confident in these test results.
> 
> 
> Thanks,
> 
> David
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20110802/ec8602a4/attachment-0001.htm>