[asterisk-dev] [Code Review] 2663: Fix race condition in basic bridge when multiple parties leave a multi-party bridge

Mark Michelson reviewboard at asterisk.org
Tue Jul 9 10:44:00 CDT 2013


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/2663/
-----------------------------------------------------------

(Updated July 9, 2013, 3:43 p.m.)


Review request for Asterisk Developers and rmudgett.


Changes
-------

After a discussion with Richard, my first approach, while solving some issues may cause different issues to occur. Since I was setting a flag on a bridge channel when pulled, if that bridge channel were then pushed into another bridge, that flag would persist.  This is not desirable.

So I've approached this anew. The new fix is much simpler than the previous one. Now the hangup hook is the only thing that is altered. Rather than relying on the count of the channels in the bridge, we iterate over the remaining bridge channels and check their state. The ones that are still in AST_BRIDGE_CHANNEL_STATE_WAIT are counted toward the total number of remaining channels. The previous logic for setting the bridge channel state remains if the number of remaining channels is greater than two.


Bugs: ASTERISK-21882
    https://issues.asterisk.org/jira/browse/ASTERISK-21882


Repository: Asterisk


Description
-------

This fixes a race condition in the basic bridge that could result in the bridge not being dissolved as expected if all but one party leaves the bridge simultaneously.

Prior to this change, a hangup hook in the basic bridge would check the number of channels remaining in the bridge and use that number in order to determine what state to set the hanging-up bridge channel to. If there were two or fewer participants remaining, then the hangup hook would do nothing. This would result in the default state of AST_BRIDGE_CHANNEL_STATE_END being set on the leaving bridge channel. Since bridge_basic has the AST_BRIDGE_FLAG_DISSOLVE_HANGUP flag set, it means that if a bridge channel leaves the bridge with AST_BRIDGE_CHANNEL_STATE_END, then the bridge will dissolve. The race condition comes into play during the hangup hook. The problem is that the number of channels in the bridge is not decremented until the bridge channel is pulled from the bridge, which happens after hangup hooks are run. This means that if multiple participants were to hang up simultaneously, leaving a single party in the bridge, it could result in all hangup hooks thinking that there were still more than two channels remaining in the bridge. The result would be that a lone participant would remain in the bridge.

The fix presented here comes from my realization that AST_BRIDGE_FLAG_DISSOLVE_HANGUP is not really a good fit for the basic bridge, because we really only want to dissolve the bridge when a channel hangs up under specific circumstances (i.e. there are two or fewer participants remaining in the bridge). Rather than having a hangup hook set the wheels in motion for having the bridge dissolve, I decided that a better idea would be to trigger bridge dissolution from a pull v_table method in the basic bridge. The race condition is resolved because when the pull method is called, the bridge is locked and the number of channels remaining in the bridge has been decremented. This way, each participant that is pulled from the bridge can be guaranteed to have an accurate count of the remaining parties in the bridge. If, after a pull, there are fewer than two parties remaining in the bridge, the bridge channel that is being pulled will have the AST_BRIDGE_CHANNEL_FLAG_DISSOLVE_HANGUP flag set on it. This will cause the bridge to dissolve.


Diffs (updated)
-----

  /trunk/main/bridging_basic.c 393881 

Diff: https://reviewboard.asterisk.org/r/2663/diff/


Testing
-------

It's difficult to test whether a race condition can still occur or not. This holds especially true in this case where the race condition, as far as I know, was never actually encountered by anyone but was recognized as possible from reading code.

Instead of trying to see if the race condition would still occur, I focused testing on ensuring that typical operation of a basic bridge still worked as expected. I performed a simple two-party call and had a party hang up. I ensured that the bridge dissolved as expected. I also tested an AMI redirect of a channel in a bridge and observed that the bridge dissolved.


Thanks,

Mark Michelson

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20130709/60930d62/attachment-0001.htm>


More information about the asterisk-dev mailing list