[asterisk-dev] [Code Review]: Resolve crash from orphaned MWI subscriptions
mjordan
reviewboard at asterisk.org
Tue Dec 6 10:56:17 CST 2011
> On Dec. 6, 2011, 10:33 a.m., David Vossel wrote:
> > /branches/1.8/channels/chan_sip.c, lines 14289-14293
> > <https://reviewboard.asterisk.org/r/1610/diff/1/?file=22117#file22117line14289>
> >
> > I know we talked about this briefly, but I still have reservations about this.
> >
> > Just to be clear, for historical reasons: the use of ref counting here does nothing to give the event thread ownership of a reference to the peer. If the event thread is not handed a reference at subscription time, then adding a reference here in the callback will not prevent the peer from being destroyed, as it may already have been destroyed before we even add the reference.
> >
> > Also, given that the unsubscription from this event occurs during the peer's ao2 destructor callback, we actually run the risk of adding and removing a reference to the peer while it is in the destructor callback... I really don't know what that will do.
> >
> > From what I remember, these lines were added for debugging purposes in order to determine if the peer was already destroyed (we'll get the "bad magic number" error when we try to ref it if it is already destroyed). I'm not sure if this gives us anything new though, as the ao2_lock in sip_send_mwi_to_peer() should offer the same debug information.
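To make the ownership point concrete, here is a minimal model of an ao2-style object. It is not the astobj2 API: `obj_alloc`, `obj_ref`, and the magic values are hypothetical. Every ref checks a magic number, and when the count reaches zero the object counts as destroyed. A reference taken in a callback after that point cannot bring the object back; at best the magic check reports the problem, which matches the debugging purpose described above. To keep the sketch well defined, the destroyed object is poisoned rather than freed. Real code would free it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical ao2-like object: names, magic values and layout are illustrative. */
#define OBJ_MAGIC 0xa570b123u
#define OBJ_DEAD  0xdeaddeadu

struct obj {
	unsigned int magic;
	int refcount;
};

static struct obj *obj_alloc(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o) {
		o->magic = OBJ_MAGIC;
		o->refcount = 1; /* the creator owns the first reference */
	}
	return o;
}

/* Adjust the refcount by delta. Returns the new count, or -1 if the object
 * was already destroyed (the "bad magic number" case). */
static int obj_ref(struct obj *o, int delta)
{
	assert(o != NULL);
	if (o->magic != OBJ_MAGIC) {
		fprintf(stderr, "bad magic number on %p\n", (void *)o);
		return -1;
	}
	o->refcount += delta;
	if (o->refcount == 0) {
		/* The destructor would run here. The sketch poisons the object
		 * instead of freeing it so that a late access stays defined. */
		o->magic = OBJ_DEAD;
	}
	return o->refcount;
}
```

In this model, a reference protects the object only if it is taken while the object is known to be alive, for example one handed over at subscription time. Calling `obj_ref(o, +1)` after the count has already reached zero just returns -1.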
I don't think we run the risk of ref'ing the peer here while it is in the destructor callback (or at least while it is in a potentially very dangerous state). The destructor's unsubscribe and the event dispatch both compete for the event lock:
1. If the destructor callback wins, it unsubscribes from the event (its first action) before mwi_event_cb can be called.
2. If the event engine wins, then the destructor callback will block until after mwi_event_cb is completely done.
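The two orderings can be modeled with a single mutex standing in for the event lock. This is a simplification, and every name below (`dispatch_event`, `unsubscribe`, `mwi_cb`) is hypothetical. Dispatch holds the lock for the whole callback, and unsubscribing takes the same lock. So once `unsubscribe()` returns, the callback is not running and will not start again for that subscription.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical model: one "event lock" guards a single subscription. */
struct sub {
	void (*cb)(void *data);
	void *data;
	int active;
};

struct cb_state {
	int calls;    /* how many times the callback ran */
	int finished; /* set by the callback's last statement */
};

static pthread_mutex_t event_lock = PTHREAD_MUTEX_INITIALIZER;

/* Side channel (not the event lock) telling the demo the callback is running. */
static pthread_mutex_t sig_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sig_cond = PTHREAD_COND_INITIALIZER;
static int cb_started;

/* Event engine side: the callback runs entirely under the event lock. */
static void dispatch_event(struct sub *s)
{
	assert(s != NULL);
	pthread_mutex_lock(&event_lock);
	if (s->active)
		s->cb(s->data);
	pthread_mutex_unlock(&event_lock);
}

/* Destructor side: unsubscribing competes for the same lock. */
static void unsubscribe(struct sub *s)
{
	pthread_mutex_lock(&event_lock);
	s->active = 0;
	pthread_mutex_unlock(&event_lock);
}

static void mwi_cb(void *data)
{
	struct cb_state *st = data;
	struct timespec delay = { 0, 50 * 1000 * 1000 }; /* 50 ms of "work" */

	st->calls++;
	pthread_mutex_lock(&sig_lock);
	cb_started = 1;
	pthread_cond_signal(&sig_cond);
	pthread_mutex_unlock(&sig_lock);
	nanosleep(&delay, NULL);
	st->finished = 1;
}

static void *dispatcher(void *arg)
{
	dispatch_event(arg);
	return NULL;
}

/* Case 2: the event engine wins. Returns st.finished as observed right after
 * unsubscribe() returns (1, because unsubscribe() had to wait), or -1. */
static int event_engine_wins(void)
{
	struct cb_state st = { 0, 0 };
	struct sub s = { mwi_cb, &st, 1 };
	pthread_t t;
	int seen;

	cb_started = 0;
	if (pthread_create(&t, NULL, dispatcher, &s) != 0)
		return -1;
	pthread_mutex_lock(&sig_lock);
	while (!cb_started)
		pthread_cond_wait(&sig_cond, &sig_lock);
	pthread_mutex_unlock(&sig_lock);

	unsubscribe(&s); /* blocks until dispatch_event() releases the lock */
	seen = st.finished;
	pthread_join(t, NULL);
	return seen;
}

/* Case 1: the destructor wins. Returns how many times the callback ran. */
static int destructor_wins(void)
{
	struct cb_state st = { 0, 0 };
	struct sub s = { mwi_cb, &st, 1 };

	unsubscribe(&s);
	dispatch_event(&s); /* the subscription is gone, so nothing is called */
	return st.calls;
}
```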
Your latter statement is correct: this was added to try to catch the point where the peer went bad on us as early as possible. I'd also note that we care enough about the peer not disappearing when calling sip_send_mwi_to_peer in handle_request_subscribe that we ref the peer there (which is also probably unnecessary). Thinking through this some more, the ref doesn't really add much, and it may give a false sense of security, so I agree with removing it.
In retrospect, it'd be nice if the event callback didn't have a pointer to the peer in the first place, as that'd keep the ownership semantics a bit clearer :-)
- mjordan
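One way to read the closing suggestion: the subscription stores the peer's name rather than a pointer, and the callback looks the peer up at event time. The lookup either returns a counted reference or NULL. The registry, `peer_find`, `peer_unref`, and `mwi_event_cb_by_name` below are hypothetical stand-ins, and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical peer and registry, standing in for the real peer container.
 * Locking is omitted; a real lookup would hold the container lock. */
struct peer {
	char name[32];
	int refcount;
	int new_msgs;
};

#define MAX_PEERS 4
static struct peer *registry[MAX_PEERS];

/* Returns the named peer with an extra reference, or NULL if it is gone. */
static struct peer *peer_find(const char *name)
{
	for (size_t i = 0; i < MAX_PEERS; i++) {
		struct peer *p = registry[i];

		if (p && strcmp(p->name, name) == 0) {
			p->refcount++;
			return p;
		}
	}
	return NULL;
}

static void peer_unref(struct peer *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* The subscription's data is the peer's name, not a peer pointer, so the
 * callback owns exactly the reference it obtains from the lookup. */
static int mwi_event_cb_by_name(const char *peer_name, int new_msgs)
{
	struct peer *p = peer_find(peer_name);

	if (!p)
		return 0; /* peer was removed: nothing to notify, nothing dangling */
	p->new_msgs = new_msgs;
	peer_unref(p);
	return 1;
}
```

The trade-off is one lookup per event in exchange for never holding a pointer whose lifetime the event thread does not control.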
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1610/#review4929
-----------------------------------------------------------
On Dec. 6, 2011, 10:10 a.m., mjordan wrote:
>
>
> (Updated Dec. 6, 2011, 10:10 a.m.)
>
>
> Review request for Asterisk Developers, David Vossel and opticron.
>
>
> Summary
> -------
>
> ASTERISK-18663 originally manifested as a deadlock when setting 'allowsubscribe=yes' and 'callcounter=yes' and setting the subscribecontext in chan_sip. When the deadlock was resolved by r345063, a crash would occur in chan_sip instead. The crash happened when an MWI notification was to be sent to a peer that had already been destroyed because its ref count had dropped to 0.
>
> The root cause turned out to be that the MWI event subscription was resubscribed in several places without the previous event subscription being removed, orphaning it. When an MWI event occurred, all of the event subscriptions (including the orphaned ones) were notified. This caused no issues until a peer was removed, either by pruning realtime SIP peers, unloading chan_sip, etc. When the peer cleaned itself up, it only removed the subscription it was aware of. The orphaned subscriptions continued to exist and, if a new MWI event occurred, would crash Asterisk by referencing the deleted peer.
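The orphaning mechanism, and the "unsubscribe first" fix listed in item 1 of the patch summary, can be modeled with a fixed table of subscriptions. These hypothetical types and functions stand in for the event engine's subscribe/unsubscribe calls and for the peer's stored subscription handle; none of them are chan_sip code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the event engine and a SIP peer. */
#define MAX_SUBS 8

struct peer;

struct event_sub {
	int in_use;
	struct peer *peer; /* callback data: the peer to notify */
};

static struct event_sub subs[MAX_SUBS];

struct peer {
	struct event_sub *mwi_sub; /* the only subscription the peer knows about */
};

static struct event_sub *event_subscribe(struct peer *p)
{
	for (size_t i = 0; i < MAX_SUBS; i++) {
		if (!subs[i].in_use) {
			subs[i].in_use = 1;
			subs[i].peer = p;
			return &subs[i];
		}
	}
	return NULL;
}

static void event_unsubscribe(struct event_sub *s)
{
	if (s) {
		s->in_use = 0;
		s->peer = NULL;
	}
}

/* Before the fix: overwriting mwi_sub orphans the previous subscription. */
static void resubscribe_buggy(struct peer *p)
{
	p->mwi_sub = event_subscribe(p);
}

/* After the fix: drop the old subscription before creating a new one. */
static void resubscribe_fixed(struct peer *p)
{
	event_unsubscribe(p->mwi_sub);
	p->mwi_sub = event_subscribe(p);
}

/* Peer cleanup only removes the subscription it is aware of. */
static void peer_destroy(struct peer *p)
{
	event_unsubscribe(p->mwi_sub);
	p->mwi_sub = NULL;
}

/* Subscriptions still pointing at p. If p were actually freed, each one
 * would dereference a dangling pointer on the next MWI event. */
static int subs_pointing_at(const struct peer *p)
{
	int n = 0;

	for (size_t i = 0; i < MAX_SUBS; i++)
		if (subs[i].in_use && subs[i].peer == p)
			n++;
	return n;
}
```

With three buggy resubscriptions followed by cleanup, two subscriptions still point at the peer. With the fixed version, none do.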
>
> This patch does several things:
> 1. It resolves the MWI event subscription issue by unsubscribing the old event subscription before subscribing again
> 2. It holds the authpeer reference for longer in handle_request_subscribe and removes some unneeded peer ref'ing / deref'ing. This was done mostly for clarity: previously the authpeer was deref'd even though relatedpeer, which is set to the authpeer, was still used later in the method
> 3. It fixes a potential bug wherein a failed authentication result could be positive, while the code assumed all failures were negative
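Item 3 describes a sign-check problem. The sketch below is a hedged illustration of that problem only. The result codes are invented for the example and are not copied from chan_sip, and `AUTH_POSITIVE_FAILURE` is purely hypothetical. The point is that a `res < 0` failure test lets a positive failure code through, while comparing against the explicit success value does not.

```c
#include <assert.h>

/* Illustrative result codes (not copied from chan_sip): success is 0,
 * a sent challenge is positive, most failures are negative, and one
 * failure is positive, which is the situation item 3 describes. */
enum auth_result {
	AUTH_SUCCESSFUL = 0,
	AUTH_CHALLENGE_SENT = 1,
	AUTH_SECRET_FAILED = -1,
	AUTH_NOT_FOUND = -3,
	AUTH_POSITIVE_FAILURE = 9, /* hypothetical */
};

enum outcome { OUTCOME_ACCEPT, OUTCOME_CHALLENGED, OUTCOME_REJECT };

/* Before: only negative values are treated as failures. */
static enum outcome classify_buggy(enum auth_result res)
{
	if (res == AUTH_CHALLENGE_SENT)
		return OUTCOME_CHALLENGED;
	return res < 0 ? OUTCOME_REJECT : OUTCOME_ACCEPT;
}

/* After: anything that is not an explicit success is rejected. */
static enum outcome classify_fixed(enum auth_result res)
{
	if (res == AUTH_CHALLENGE_SENT)
		return OUTCOME_CHALLENGED;
	return res == AUTH_SUCCESSFUL ? OUTCOME_ACCEPT : OUTCOME_REJECT;
}
```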
>
>
> This addresses bug ASTERISK-18663.
> https://issues.asterisk.org/jira/browse/ASTERISK-18663
>
>
> Diffs
> -----
>
> /branches/1.8/channels/chan_sip.c 347057
> /branches/1.8/channels/sip/include/sip.h 347057
>
> Diff: https://reviewboard.asterisk.org/r/1610/diff
>
>
> Testing
> -------
>
> Testing was done extensively using 1.8 and 1.8.8.0-rc4. This included using two SIP phones with BLF and MWI subscriptions, with multiple mailboxes defined for various extensions, and unloading / reloading the chan_sip module at various times (both before SUBSCRIBE messages were received and after multiple SUBSCRIBE messages had been received). The issue reporter also confirmed that the patch resolves the issue.
>
>
> Thanks,
>
> mjordan
>
>