[asterisk-dev] [Code Review] 2741: Handle two race conditions while joining a bridge and fix a ref counting error

svnbot reviewboard at asterisk.org
Mon Aug 12 10:59:23 CDT 2013


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/2741/
-----------------------------------------------------------

(Updated Aug. 12, 2013, 10:59 a.m.)


Status
------

This change has been marked as submitted.


Review request for Asterisk Developers and rmudgett.


Changes
-------

Committed in revision 396543


Repository: Asterisk


Description
-------

This patch fixes three problems found using the Bridge AMI action.

The scenario for reproducing the problem was always the same:
* Originate three Local channels into Echo()
* Execute Bridge on Local-0;2 and Local-1;2
* Immediately execute Bridge on Local-1;2 and Local-2;2

This reproduced the following problems:

(1) When originating a channel, the Newchannel event is emitted quickly; however, the channel will not have a pbx thread assigned to it until after the outbound 'dialing' is complete. Thus, there is a period of time where the outside world "knows" of the channel's existence and can influence it but Asterisk has not yet started the dialplan execution thread. If a Bridge AMI action is taken on the channel, the channel appears to be a Dialed channel with no PBX thread; hence, the channel will be imparted into the Bridge. At the same time, another thread that caused the origination will attempt to start a PBX on it. The end result currently is an assertion failure in the Bridging API.

There's no way to prevent AMI from attempting to Bridge a channel immediately after creation; likewise, holding the channel lock through the entire Dial operation is unwise. Instead of treating the presence of a PBX thread as an error, we simply bail out of the adding the channel to the bridge through ast_bridge_impart. The Bridge action will then fail - but we avoid a situation where the channel is both executing a PBX thread and simultaneously being given a separate thread in the bridging system (which would be a "bad thing"). Since imparting a channel with a PBX *can* occur and is not a programming error, the asserts have been removed.

(2) Adding a channel to a bridge consists of two checks: If the channel is in a bridge already, we move the channel using the Bridging API. If not, we steal the channel from its current location and impart it. To know the difference, we check the ast_bridge pointer on the channel. However, if the channel was previously imparted into a Bridge, the ast_bridge pointer is not set on the channel until the impart channel thread has spun up. During that period of time, the channel is unlocked and will appear for all intents and purposes as not being "in" the Bridging API - even though it is just a matter of time until the Bridge takes it on. This patch avoids this race condition by first checking if the channel being imparted is a Zombie - in which case, we bail - and by setting the ast_bridge pointer immediately upon entering the Bridging API. If later actions fail, the ast_bridge pointer is cleared.

(3) bridge_find_channel does not increase the reference count of the ast_bridge_channel object. The RAII_VAR usage in ast_bridge_add_channel thus created a ticking time bomb in whatever bridge the channel moved into, as the destructor for the ast_bridge_channel object would be called.


Diffs
-----

  /trunk/main/bridge.c 396498 
  /trunk/main/bridge_channel.c 396498 
  /trunk/main/features.c 396498 

Diff: https://reviewboard.asterisk.org/r/2741/diff/


Testing
-------

The nascent AMI Bridge action test stops crashing constantly.


Thanks,

Matt Jordan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20130812/db3a2bd1/attachment.htm>


More information about the asterisk-dev mailing list