[asterisk-bugs] [JIRA] (ASTERISK-30081) app_confbridge: Channel can join wrong bridge

N A (JIRA) noreply at issues.asterisk.org
Wed May 25 11:47:48 CDT 2022


     [ https://issues.asterisk.org/jira/browse/ASTERISK-30081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

N A updated ASTERISK-30081:
---------------------------

    Description: 
This doesn't happen all the time, but it happens often enough. The actual timing seems to be random - possible race condition.

This reproduces very frequently - not every time, but about half the time or more - with this scenario:

- A and B are in a confbridge
- All in a split second, but in this order:
   - A leaves the bridge
   - B gets channel redirected out of the bridge
   - (confbridge is now empty, ceases to exist momentarily)
   - A re-enters the bridge.
   - C enters the bridge.

It works correctly when the timing is different, and there ISN'T a brief moment where the bridge is "EMPTY" in the middle. Therefore, you ONLY see this in a trace for a bad, broken bridge:

{noformat}
[2022-05-25 15:32:49] DEBUG[23368][C-0000002a]: app_confbridge.c:1570 alloc_playback_chan: Created announcer channel 'CBAnn/oe7-0000007a;1' to conference bridge 'oe7'
[2022-05-25 15:32:49]     -- Channel CBAnn/oe7-0000007a;2 joined 'softmix' base-bridge <66c254c2-ef39-4252-9cc4-405fb58d1bbc>
{noformat}

I have several call debugs of "good" and "bad" calls that broke, and only in the bad/broken calls can I find "EMPTY" in the app_confbridge debug messages.

C ends up in the bridge correctly, but A ends up in the wrong bridge.

Two channels are supposed to get into the same ConfBridge, but they each end up by themselves, obviously not bridged together.

core show channels shows them both executing the ConfBridge application on the same ConfBridge, but only one channel is in the bridge according to confbridge list and confbridge show.

A backtrace confirms the other channel has somehow entered a *different* bridge by itself.

On occasion, channels joining the same ConfBridge at nearly the same time appear to get into different bridges as opposed to the same bridge, even though they have been requested to go into the same ConfBridge.

A backtrace confirms that for channels with the same confbridge name, confbridge->bridge is different for them here (they should be the same):

https://github.com/asterisk/asterisk/blob/master/apps/app_confbridge.c#L2848

I am marking this "major" since such a race condition can completely break the usage of ConfBridge, and suggests that perhaps the locking here is insufficient: https://github.com/asterisk/asterisk/blob/master/apps/app_confbridge.c#L1736

CLI trace showing bizarre confbridge/bridge/channel disonnance:

https://paste.interlinked.us/zq10e1w9qo.txt

Output of bridge show:

{noformat}

*CLI> bridge show
17378691-4657-41df-a2ba-ff871b5cda79  2916a9dc-9bad-4aea-8fae-fc4ace5ea748  3022d1db-2659-4dad-988f-e8e6994641bb  all                                   de65981b-7f2d-420c-8ce4-e2c6120953dd

[2022-05-25 14:24:28]     -- Remote UNIX connection
[2022-05-25 14:24:28]     -- Remote UNIX connection disconnected
*CLI> bridge show 17378691-4657-41df-a2ba-ff871b5cda79
Id: 17378691-4657-41df-a2ba-ff871b5cda79
Type: base
Technology: softmix
Subclass: base
Creator: ConfBridge
Name: oe6
Video-Mode: none
Video-Source-Id:
Num-Channels: 3
Num-Active: 3
Duration: 00:31:02
Channel: PJSIP/ATAxLB1-00000016
Channel: CBAnn/oe6-00000048;2
Channel: Local/2129 at originate-local-00000049;1

*CLI> bridge show 2916a9dc-9bad-4aea-8fae-fc4ace5ea748
Id: 2916a9dc-9bad-4aea-8fae-fc4ace5ea748
Type: base
Technology: softmix
Subclass: base
Creator: ConfBridge
Name: oe7
Video-Mode: none
Video-Source-Id:
Num-Channels: 2
Num-Active: 2
Duration: 00:30:49
Channel: Local/4002702129 at lines-pre-0000004a;2
Channel: CBAnn/oe7-0000004d;2

*CLI> bridge show 3022d1db-2659-4dad-988f-e8e6994641bb
Id: 3022d1db-2659-4dad-988f-e8e6994641bb
Type: basic
Technology: simple_bridge
Subclass: basic
Creator:
Name:
Video-Mode: none
Video-Source-Id:
Num-Channels: 2
Num-Active: 2
Duration: 00:30:51
Channel: Local/2129 at originate-local-00000049;2
Channel: Local/4002702129 at lines-pre-0000004a;1

*CLI> bridge show de65981b-7f2d-420c-8ce4-e2c6120953dd
Id: de65981b-7f2d-420c-8ce4-e2c6120953dd
Type: base
Technology: softmix
Subclass: base
Creator: ConfBridge
Name: oe7
Video-Mode: none
Video-Source-Id:
Num-Channels: 2
Num-Active: 2
Duration: 00:30:59
Channel: PJSIP/ATAxLB2-00000015
Channel: CBAnn/oe7-0000004c;2
{noformat}

  was:
This doesn't happen all the time, but it happens often enough. The actual timing seems to be random - possible race condition.

This reproduces very frequently - not every time, but about half the time or more - with this scenario:

- A and B are in a confbridge
- All in a split second, but in this order:
   - A leaves the bridge
   - B gets channel redirected out of the bridge
   - (confbridge is now empty, ceases to exist momentarily)
   - A re-enters the bridge.
   - C enters the bridge.

It works correctly when the timing is different, and there ISN'T a brief moment where the bridge is "EMPTY" in the middle.

I have several call debugs of "good" and "bad" calls that broke, and only in the bad/broken calls can I find "EMPTY" in the app_confbridge debug messages.

C ends up in the bridge correctly, but A ends up in the wrong bridge.

Two channels are supposed to get into the same ConfBridge, but they each end up by themselves, obviously not bridged together.

core show channels shows them both executing the ConfBridge application on the same ConfBridge, but only one channel is in the bridge according to confbridge list and confbridge show.

A backtrace confirms the other channel has somehow entered a *different* bridge by itself.

On occasion, channels joining the same ConfBridge at nearly the same time appear to get into different bridges as opposed to the same bridge, even though they have been requested to go into the same ConfBridge.

A backtrace confirms that for channels with the same confbridge name, confbridge->bridge is different for them here (they should be the same):

https://github.com/asterisk/asterisk/blob/master/apps/app_confbridge.c#L2848

I am marking this "major" since such a race condition can completely break the usage of ConfBridge, and suggests that perhaps the locking here is insufficient: https://github.com/asterisk/asterisk/blob/master/apps/app_confbridge.c#L1736

CLI trace showing bizarre confbridge/bridge/channel disonnance:

https://paste.interlinked.us/zq10e1w9qo.txt

Output of bridge show:

{noformat}

*CLI> bridge show
17378691-4657-41df-a2ba-ff871b5cda79  2916a9dc-9bad-4aea-8fae-fc4ace5ea748  3022d1db-2659-4dad-988f-e8e6994641bb  all                                   de65981b-7f2d-420c-8ce4-e2c6120953dd

[2022-05-25 14:24:28]     -- Remote UNIX connection
[2022-05-25 14:24:28]     -- Remote UNIX connection disconnected
*CLI> bridge show 17378691-4657-41df-a2ba-ff871b5cda79
Id: 17378691-4657-41df-a2ba-ff871b5cda79
Type: base
Technology: softmix
Subclass: base
Creator: ConfBridge
Name: oe6
Video-Mode: none
Video-Source-Id:
Num-Channels: 3
Num-Active: 3
Duration: 00:31:02
Channel: PJSIP/ATAxLB1-00000016
Channel: CBAnn/oe6-00000048;2
Channel: Local/2129 at originate-local-00000049;1

*CLI> bridge show 2916a9dc-9bad-4aea-8fae-fc4ace5ea748
Id: 2916a9dc-9bad-4aea-8fae-fc4ace5ea748
Type: base
Technology: softmix
Subclass: base
Creator: ConfBridge
Name: oe7
Video-Mode: none
Video-Source-Id:
Num-Channels: 2
Num-Active: 2
Duration: 00:30:49
Channel: Local/4002702129 at lines-pre-0000004a;2
Channel: CBAnn/oe7-0000004d;2

*CLI> bridge show 3022d1db-2659-4dad-988f-e8e6994641bb
Id: 3022d1db-2659-4dad-988f-e8e6994641bb
Type: basic
Technology: simple_bridge
Subclass: basic
Creator:
Name:
Video-Mode: none
Video-Source-Id:
Num-Channels: 2
Num-Active: 2
Duration: 00:30:51
Channel: Local/2129 at originate-local-00000049;2
Channel: Local/4002702129 at lines-pre-0000004a;1

*CLI> bridge show de65981b-7f2d-420c-8ce4-e2c6120953dd
Id: de65981b-7f2d-420c-8ce4-e2c6120953dd
Type: base
Technology: softmix
Subclass: base
Creator: ConfBridge
Name: oe7
Video-Mode: none
Video-Source-Id:
Num-Channels: 2
Num-Active: 2
Duration: 00:30:59
Channel: PJSIP/ATAxLB2-00000015
Channel: CBAnn/oe7-0000004c;2
{noformat}


> app_confbridge: Channel can join wrong bridge
> ---------------------------------------------
>
>                 Key: ASTERISK-30081
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-30081
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Applications/app_confbridge
>    Affects Versions: 18.9.0
>            Reporter: N A
>            Severity: Major
>         Attachments: app_confbridge.c, wyeed8jbjr.txt, zq10e1w9qo.txt
>
>
> This doesn't happen all the time, but it happens often enough. The actual timing seems to be random - possible race condition.
> This reproduces very frequently - not every time, but about half the time or more - with this scenario:
> - A and B are in a confbridge
> - All in a split second, but in this order:
>    - A leaves the bridge
>    - B gets channel redirected out of the bridge
>    - (confbridge is now empty, ceases to exist momentarily)
>    - A re-enters the bridge.
>    - C enters the bridge.
> It works correctly when the timing is different, and there ISN'T a brief moment where the bridge is "EMPTY" in the middle. Therefore, you ONLY see this in a trace for a bad, broken bridge:
> {noformat}
> [2022-05-25 15:32:49] DEBUG[23368][C-0000002a]: app_confbridge.c:1570 alloc_playback_chan: Created announcer channel 'CBAnn/oe7-0000007a;1' to conference bridge 'oe7'
> [2022-05-25 15:32:49]     -- Channel CBAnn/oe7-0000007a;2 joined 'softmix' base-bridge <66c254c2-ef39-4252-9cc4-405fb58d1bbc>
> {noformat}
> I have several call debugs of "good" and "bad" calls that broke, and only in the bad/broken calls can I find "EMPTY" in the app_confbridge debug messages.
> C ends up in the bridge correctly, but A ends up in the wrong bridge.
> Two channels are supposed to get into the same ConfBridge, but they each end up by themselves, obviously not bridged together.
> core show channels shows them both executing the ConfBridge application on the same ConfBridge, but only one channel is in the bridge according to confbridge list and confbridge show.
> A backtrace confirms the other channel has somehow entered a *different* bridge by itself.
> On occasion, channels joining the same ConfBridge at nearly the same time appear to get into different bridges as opposed to the same bridge, even though they have been requested to go into the same ConfBridge.
> A backtrace confirms that for channels with the same confbridge name, confbridge->bridge is different for them here (they should be the same):
> https://github.com/asterisk/asterisk/blob/master/apps/app_confbridge.c#L2848
> I am marking this "major" since such a race condition can completely break the usage of ConfBridge, and suggests that perhaps the locking here is insufficient: https://github.com/asterisk/asterisk/blob/master/apps/app_confbridge.c#L1736
> CLI trace showing bizarre confbridge/bridge/channel disonnance:
> https://paste.interlinked.us/zq10e1w9qo.txt
> Output of bridge show:
> {noformat}
> *CLI> bridge show
> 17378691-4657-41df-a2ba-ff871b5cda79  2916a9dc-9bad-4aea-8fae-fc4ace5ea748  3022d1db-2659-4dad-988f-e8e6994641bb  all                                   de65981b-7f2d-420c-8ce4-e2c6120953dd
> [2022-05-25 14:24:28]     -- Remote UNIX connection
> [2022-05-25 14:24:28]     -- Remote UNIX connection disconnected
> *CLI> bridge show 17378691-4657-41df-a2ba-ff871b5cda79
> Id: 17378691-4657-41df-a2ba-ff871b5cda79
> Type: base
> Technology: softmix
> Subclass: base
> Creator: ConfBridge
> Name: oe6
> Video-Mode: none
> Video-Source-Id:
> Num-Channels: 3
> Num-Active: 3
> Duration: 00:31:02
> Channel: PJSIP/ATAxLB1-00000016
> Channel: CBAnn/oe6-00000048;2
> Channel: Local/2129 at originate-local-00000049;1
> *CLI> bridge show 2916a9dc-9bad-4aea-8fae-fc4ace5ea748
> Id: 2916a9dc-9bad-4aea-8fae-fc4ace5ea748
> Type: base
> Technology: softmix
> Subclass: base
> Creator: ConfBridge
> Name: oe7
> Video-Mode: none
> Video-Source-Id:
> Num-Channels: 2
> Num-Active: 2
> Duration: 00:30:49
> Channel: Local/4002702129 at lines-pre-0000004a;2
> Channel: CBAnn/oe7-0000004d;2
> *CLI> bridge show 3022d1db-2659-4dad-988f-e8e6994641bb
> Id: 3022d1db-2659-4dad-988f-e8e6994641bb
> Type: basic
> Technology: simple_bridge
> Subclass: basic
> Creator:
> Name:
> Video-Mode: none
> Video-Source-Id:
> Num-Channels: 2
> Num-Active: 2
> Duration: 00:30:51
> Channel: Local/2129 at originate-local-00000049;2
> Channel: Local/4002702129 at lines-pre-0000004a;1
> *CLI> bridge show de65981b-7f2d-420c-8ce4-e2c6120953dd
> Id: de65981b-7f2d-420c-8ce4-e2c6120953dd
> Type: base
> Technology: softmix
> Subclass: base
> Creator: ConfBridge
> Name: oe7
> Video-Mode: none
> Video-Source-Id:
> Num-Channels: 2
> Num-Active: 2
> Duration: 00:30:59
> Channel: PJSIP/ATAxLB2-00000015
> Channel: CBAnn/oe7-0000004c;2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list