[asterisk-dev] [Code Review] 2741: Handle two race conditions while joining a bridge and fix a ref counting error

Matt Jordan reviewboard at asterisk.org
Wed Aug 7 16:23:16 CDT 2013



> On Aug. 7, 2013, 5:38 p.m., rmudgett wrote:
> > trunk/main/bridge.c, lines 2184-2186
> > <https://reviewboard.asterisk.org/r/2741/diff/2/?file=43780#file43780line2184>
> >
> >     Revert this.  This is our channel that we created in ast_channel_yank().

So this can occur in the following exceedingly weird case:

* User Originates a Local Channel. Local;1 begins executing dialplan, Local;2 waits.
* User initiates a Bridge AMI action on Local;2 and some other channel
* The call to ast_channel_yank detects that Local;2 is not in a bridge. It creates the Surrogate and swaps it in on core_local. It hands control over to ast_bridge_add_channel.
* The fixup call in core_local updates the pointers so that it points to the Local;2 channel.
* At this point, we're in a race condition between core_local starting a PBX on the Local;2 and it being added to the bridge. If core_local starts the PBX, ast_bridge_impart will fail.
* We cannot perform a direct hangup on the channel if ast_bridge_impart fails, as we did not completely steal the channel. We have to fail and let it continue to execute dialplan elsewhere.

This does need a ref debump and a comment.


- Matt


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/2741/#review9348
-----------------------------------------------------------


On Aug. 5, 2013, 1:41 a.m., Matt Jordan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/2741/
> -----------------------------------------------------------
> 
> (Updated Aug. 5, 2013, 1:41 a.m.)
> 
> 
> Review request for Asterisk Developers and rmudgett.
> 
> 
> Repository: Asterisk
> 
> 
> Description
> -------
> 
> This patch fixes three problems found using the Bridge AMI action.
> 
> The scenario for reproducing the problem was always the same:
> * Originate three Local channels into Echo()
> * Execute Bridge on Local-0;2 and Local-1;2
> * Immediately execute Bridge on Local-1;2 and Local-2;2
> 
> This reproduced the following problems:
> 
> (1) When originating a channel, the Newchannel event is emitted quickly; however, the channel will not have a pbx thread assigned to it until after the outbound 'dialing' is complete. Thus, there is a period of time where the outside world "knows" of the channel's existence and can influence it but Asterisk has not yet started the dialplan execution thread. If a Bridge AMI action is taken on the channel, the channel appears to be a Dialed channel with no PBX thread; hence, the channel will be imparted into the Bridge. At the same time, another thread that caused the origination will attempt to start a PBX on it. The end result currently is an assertion failure in the Bridging API.
> 
> There's no way to prevent AMI from attempting to Bridge a channel immediately after creation; likewise, holding the channel lock through the entire Dial operation is unwise. Instead of treating the presence of a PBX thread as an error, we simply bail out of the adding the channel to the bridge through ast_bridge_impart. The Bridge action will then fail - but we avoid a situation where the channel is both executing a PBX thread and simultaneously being given a separate thread in the bridging system (which would be a "bad thing"). Since imparting a channel with a PBX *can* occur and is not a programming error, the asserts have been removed.
> 
> (2) Adding a channel to a bridge consists of two checks: If the channel is in a bridge already, we move the channel using the Bridging API. If not, we steal the channel from its current location and impart it. To know the difference, we check the ast_bridge pointer on the channel. However, if the channel was previously imparted into a Bridge, the ast_bridge pointer is not set on the channel until the impart channel thread has spun up. During that period of time, the channel is unlocked and will appear for all intents and purposes as not being "in" the Bridging API - even though it is just a matter of time until the Bridge takes it on. This patch avoids this race condition by first checking if the channel being imparted is a Zombie - in which case, we bail - and by setting the ast_bridge pointer immediately upon entering the Bridging API. If later actions fail, the ast_bridge pointer is cleared.
> 
> (3) bridge_find_channel does not increase the reference count of the ast_bridge_channel object. The RAII_VAR usage in ast_bridge_add_channel thus created a ticking time bomb in whatever bridge the channel moved into, as the destructor for the ast_bridge_channel object would be called.
> 
> 
> Diffs
> -----
> 
>   trunk/main/bridge.c 396109 
>   trunk/main/features.c 396109 
> 
> Diff: https://reviewboard.asterisk.org/r/2741/diff/
> 
> 
> Testing
> -------
> 
> The nascent AMI Bridge action test stops crashing constantly.
> 
> 
> Thanks,
> 
> Matt Jordan
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20130807/c9c02321/attachment.htm>


More information about the asterisk-dev mailing list