[asterisk-dev] [Code Review] 2741: Handle two race conditions while joining a bridge and fix a ref counting error
Matt Jordan
reviewboard at asterisk.org
Thu Aug 8 18:06:25 CDT 2013
> On Aug. 7, 2013, 11:29 p.m., rmudgett wrote:
> > trunk/main/bridge.c, lines 1609-1621
> > <https://reviewboard.asterisk.org/r/2741/diff/3/?file=43825#file43825line1609>
> >
> > In the very unlikely event that the impart thread fails to be created, some other thread could complete the channel push into another bridge and we would never pull it from that bridge.
>
> Matt Jordan wrote:
> I'm unsure of what else is supposed to be done here.
>
> If the thread can't be created, it should have a non-zero return value. If that occurred, the channel never got pushed into the bridge. res should be set to a non-zero value, and the cleanup code after this point should clean up the channel.
So, we looked at this some more.
On further inspection, a bridge move operation can't occur while the thread is being spawned. While a call to ast_bridge_add_channel on a separate thread would cause a bridge_move_locked operation to be attempted, bridge_move_locked will fail as the bridge_channel for the channel being moved is not added to the bridge's list of channels until bridge_channel_internal_push is called. bridge_channel_internal_push is not called until bridge_channel_internal_join is called, which occurs after the thread has successfully spawned.
Whew.
That being said, in retrospect, assigning the ast_bridge pointer to the ast_channel in ast_bridge_impart feels a bit wrong.
(1) The management of ast_bridge on the channel is typically handled in bridge_channel. Here, we set it early in bridge.c but clear it in bridge_channel (for a variety of other reasons). Managing this in two places is cumbersome.
(2) It's better to assign the bridge pointer to the channel only when we know it should get into the bridge. That way, things that check for it using the ast_bridge pointer can be reasonably confident that it is, actually, in the bridge.
(3) It isn't actually necessary. While there is a race condition here that the channel could be masqueraded between the initial call to ast_bridge_impart and the thread being spawned, it can't be moved into another bridge (as we just verified). The masquerade can be checked both in ast_bridge_impart and in bridge_channel_internal_join (while it may not be strictly necessary to check for it in both places, it is nice to do so to fail out as quickly as possible if someone stole the channel)
- Matt
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/2741/#review9353
-----------------------------------------------------------
On Aug. 7, 2013, 9:32 p.m., Matt Jordan wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/2741/
> -----------------------------------------------------------
>
> (Updated Aug. 7, 2013, 9:32 p.m.)
>
>
> Review request for Asterisk Developers and rmudgett.
>
>
> Repository: Asterisk
>
>
> Description
> -------
>
> This patch fixes three problems found using the Bridge AMI action.
>
> The scenario for reproducing the problem was always the same:
> * Originate three Local channels into Echo()
> * Execute Bridge on Local-0;2 and Local-1;2
> * Immediately execute Bridge on Local-1;2 and Local-2;2
>
> This reproduced the following problems:
>
> (1) When originating a channel, the Newchannel event is emitted quickly; however, the channel will not have a pbx thread assigned to it until after the outbound 'dialing' is complete. Thus, there is a period of time where the outside world "knows" of the channel's existence and can influence it but Asterisk has not yet started the dialplan execution thread. If a Bridge AMI action is taken on the channel, the channel appears to be a Dialed channel with no PBX thread; hence, the channel will be imparted into the Bridge. At the same time, another thread that caused the origination will attempt to start a PBX on it. The end result currently is an assertion failure in the Bridging API.
>
> There's no way to prevent AMI from attempting to Bridge a channel immediately after creation; likewise, holding the channel lock through the entire Dial operation is unwise. Instead of treating the presence of a PBX thread as an error, we simply bail out of the adding the channel to the bridge through ast_bridge_impart. The Bridge action will then fail - but we avoid a situation where the channel is both executing a PBX thread and simultaneously being given a separate thread in the bridging system (which would be a "bad thing"). Since imparting a channel with a PBX *can* occur and is not a programming error, the asserts have been removed.
>
> (2) Adding a channel to a bridge consists of two checks: If the channel is in a bridge already, we move the channel using the Bridging API. If not, we steal the channel from its current location and impart it. To know the difference, we check the ast_bridge pointer on the channel. However, if the channel was previously imparted into a Bridge, the ast_bridge pointer is not set on the channel until the impart channel thread has spun up. During that period of time, the channel is unlocked and will appear for all intents and purposes as not being "in" the Bridging API - even though it is just a matter of time until the Bridge takes it on. This patch avoids this race condition by first checking if the channel being imparted is a Zombie - in which case, we bail - and by setting the ast_bridge pointer immediately upon entering the Bridging API. If later actions fail, the ast_bridge pointer is cleared.
>
> (3) bridge_find_channel does not increase the reference count of the ast_bridge_channel object. The RAII_VAR usage in ast_bridge_add_channel thus created a ticking time bomb in whatever bridge the channel moved into, as the destructor for the ast_bridge_channel object would be called.
>
>
> Diffs
> -----
>
> trunk/main/bridge.c 396364
> trunk/main/bridge_channel.c 396364
> trunk/main/features.c 396364
>
> Diff: https://reviewboard.asterisk.org/r/2741/diff/
>
>
> Testing
> -------
>
> The nascent AMI Bridge action test stops crashing constantly.
>
>
> Thanks,
>
> Matt Jordan
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20130808/bf1868ec/attachment-0001.htm>
More information about the asterisk-dev
mailing list