[asterisk-dev] [Code Review]: Call pickup race leaves orphaned channels or crashes.

Alec Davis reviewboard at asterisk.org
Tue Aug 30 12:56:11 CDT 2011



> On Aug. 30, 2011, 3:30 a.m., Alec Davis wrote:
> > Works in all the senarios's that I've reported.

I'm also wondering whether the 'pickup datastore' you had previously added to prevent the race condition of picking up the same extension is now redundant. As I commented it out, and once got the new message "can't setup masquerade" as below;

    -- Starting simple switch on 'DAHDI/35-1'
    -- Executing [823 at family:1] Dial("DAHDI/35-1", "Local/823 at en_phone") in new stack
    -- Called Local/823 at en_phone
    -- Executing [823 at en_phone:1] Dial("Local/823 at en_phone-9f80;2", "SIP/gxp-823,20,rwt") in new stack
  == Using UDPTL CoS mark 5
  == Using SIP RTP CoS mark 5
  == Extension Changed 823[phones] new state Ringing for Notify User gxp-822
    -- Called SIP/gxp-823
    -- Local/823 at en_phone-9f80;1 is ringing
    -- SIP/gxp-823-00000003 is ringing
    -- Local/823 at en_phone-9f80;1 is ringing
  == Using UDPTL CoS mark 5
  == Using UDPTL CoS mark 5
  == Using SIP RTP CoS mark 5
  == Extension Changed 824[phones] new state InUse for Notify User gxp-822
[2011-08-30 19:28:33.606655] NOTICE[3506]: features.c:6978 ast_pickup_call: pickup SIP/gxp-823-00000003 attempt by SIP/gxp-824-00000004
  == Using SIP RTP CoS mark 5
  == Extension Changed 822[phones] new state InUse for Notify User gxp-822
[2011-08-30 19:28:33.622086] NOTICE[3507]: features.c:6978 ast_pickup_call: pickup SIP/gxp-823-00000003 attempt by SIP/gxp-822-00000005
    -- SIP/gxp-824-00000004 answered Local/823 at en_phone-9f80;2
  == Extension Changed 823[phones] new state Idle for Notify User gxp-822
    -- Local/823 at en_phone-9f80;1 stopped sounds
    -- Local/823 at en_phone-9f80;1 answered DAHDI/35-1
  == Spawn extension (en_phone, 823, 1) exited non-zero on 'Local/823 at en_phone-9f80;2'
[2011-08-30 19:28:33.939881] WARNING[3507]: channel.c:5769 __ast_channel_masquerade: Can't setup masquerade. One or both channels is dead. (Local/823 at en_phone-9f80;1<ZOMBIE> <-- SIP/gxp-822-00000005)
[2011-08-30 19:28:33.939937] WARNING[3507]: features.c:7054 ast_do_pickup: Unable to masquerade 'SIP/gxp-822-00000005' into 'SIP/gxp-823-00000003'
[2011-08-30 19:28:33.940023] WARNING[3507]: features.c:6987 ast_pickup_call: pickup Local/823 at en_phone-9f80;1<ZOMBIE> failed by SIP/gxp-822-00000005
> 
> irroot wrote:
>     I think the datastore is still needed as there can be a race condition for a channel when multiple attempts are made to pickup a call in do_pickup after target is unlocked and before the masq its possible to have a channel pick it up again .... setting the DS will cause can_pickup to pass it over ...
>     
>     Have got this patch out at >50 sites about 10 of them were getting orphaned channels regularly and about 5 a day were affected.
> 
> Alec Davis wrote:
>     testing dialplan senarios tested:
>     where 3 sip phones are registered as gxp-822, gxp-823, gxp-824
>     
>     Senario 1: Localchan dial, Local chan pickup.
>       Dial 801 from 2 phones.
>       Dial either 800 or *8 from 2 other phones.
>       Both calls were successfully connected to 1 of each of the channels.
>     
>     Senario 2:
>       Dial 802 or 803 or 804 from 1 phone.
>       Dial *8 from 2 other phones.
>       Success
>     
>     exten => 800,1,NoOp(Debug Localchan pickup)
>     exten => 800,n,Dial(Local/824 at en_pickup&Local/823 at en_pickup)
>     
>     exten => 801,1,NoOp(Debug Localchan Pickup, Ring Phones)
>     exten => 801,n,Dial(Local/823 at en_phone&Local/824 at en_phone)
>     
>     exten => 802,1,Dial(SIP/gxp-823&SIP/gxp-824)
>     exten => 803,1,Dial(DAHDI/33&DAHDI/35)
>     exten => 804,1,Dial(Local/823 at en_phone&Local/824 at en_phone&DAHDI/33&DAHDI/35)
>     
>     [en_pickup]
>     exten => _[0-9*#]!, 1, PickupChan(Local/${EXTEN}@en_phone)
>     
>     [en_phone]
>     exten => _[0-9*#]!, 1, Dial(SIP/gxp-${EXTEN},20,rwt)
>
> 
> rmudgett wrote:
>     The datastore prevents two users from picking up the exact same channel.  The datastore is more graceful than if it were not there because the race changes to who can setup the masquerade first.  The loser then gets a warning message in the log about a masquerade already setup for someone else.
>     
>     The pickup race that this patch fixes is because the call is forked and the users are picking up different legs of the forked call.
>     
>     FYI: One of my test attempts when running under valgrind had both legs of the call "successfully" picked up.  As the answer was propagated to the forking dial, the dial determined the winner of the pickup based on which leg it saw answer first.  The losing pickup attempt just got hung up on.

Agreed, regrading the datastore detects multiple pickups of the exact same channel is tidier, only one message, and the loser doesn't even get to masquerade stage.

re. running under valgrind, the result was correct wasn't it?


- Alec


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1400/#review4174
-----------------------------------------------------------


On Aug. 29, 2011, 6:51 p.m., rmudgett wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/1400/
> -----------------------------------------------------------
> 
> (Updated Aug. 29, 2011, 6:51 p.m.)
> 
> 
> Review request for Asterisk Developers, Alec Davis and irroot.
> 
> 
> Summary
> -------
> 
> Attempting to pickup a call that has been forked to multiple extensions by two users either crashes or fails a masquerade with a "bad things may happen" message.
> 
> This is the scenario that is causing all the grief:
> 1) Pickup target is selected
> 2) target is marked as being picked up in ast_do_pickup()
> 3) target is unlocked by ast_do_pickup()
> 4) app dial or queue gets a chance to hang up losing calls and calls ast_hangup() on target
> 5) SINCE A MASQUERADE HAS NOT BEEN SETUP YET BY ast_do_pickup() with ast_channel_masquerade(), ast_hangup() completes successfully and the channel is no longer in the channels container.
> 6) ast_do_pickup() then calls ast_channel_masquerade() to schedule the masquerade on the dead channel.
> 7) ast_do_pickup() then calls ast_do_masquerade() on the dead channel
> 8) bad things happen while doing the masquerade and in the process ast_do_masquerade() puts the dead channel back into the channels container
> 9) The "orphaned" channel is visible in the channels list if a crash does not happen.
> 
> This patch does the following:
> 1) ast_hangup() sets the zombie flag on a successfully hung-up channel and does not release the channel lock until that has happened.
> 2) __ast_channel_masquerade() is fixed to not setup a masquerade if either channel is a zombie.
> 
> 
> This addresses bugs ASTERISK-18222 and ASTERISK-18273.
>     https://issues.asterisk.org/jira/browse/ASTERISK-18222
>     https://issues.asterisk.org/jira/browse/ASTERISK-18273
> 
> 
> Diffs
> -----
> 
>   /branches/1.8/main/channel.c 333893 
> 
> Diff: https://reviewboard.asterisk.org/r/1400/diff
> 
> 
> Testing
> -------
> 
> Replicated the crash in ASTERISK-18222 with PickupChan() and *8 methods.
> 
> With the patch, the crash no longer happens and the new message of attempting to masquerade into a dead channel is output.
> 
> 
> Thanks,
> 
> rmudgett
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20110830/5f69bac4/attachment.htm>


More information about the asterisk-dev mailing list