[asterisk-dev] [Code Review] SIP Re-invite Glare and 491 Madness...
Mark Michelson
mmichelson at digium.com
Tue Mar 31 18:26:30 CDT 2009
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviewboard.digium.com/r/213/#review660
-----------------------------------------------------------
Great job, David! Really cool that you got this fixed. As far as your actual changes go, I think this will do a good job of fixing the reinvite glare problem. One change you have made is that the scheduled sip_reinvite_retry now transmits a reINVITE directly instead of just setting the flag saying "I need to send one eventually." Because of this, I think it would be a good idea to make the scheduling of this function RFC 3261-compliant. The latter portion of section 14.1 explains the rules for scheduling reINVITEs when a 491 is received; the allowed delay depends on which side owns the Call-ID. You can determine who the owner of a callid is in chan_sip by checking whether the SIP_PAGE2_OUTGOING_CALL flag is set on the sip_pvt. If the flag is set, then this sip_pvt created the callid; otherwise the other end did.
In trunk, btw, you'd actually just check p->outgoing_call since the flag I mentioned above does not exist there.
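For reference, here is a minimal sketch of the RFC 3261 section 14.1 timer rule. The owner of the Call-ID waits between 2.1 and 4 seconds, the other side waits between 0 and 2 seconds, and both pick the value in 10 ms units. The function name reinvite_retry_delay_ms and the owns_callid parameter are made up for illustration and are not chan_sip code. In chan_sip the argument would presumably come from the flag (or p->outgoing_call in trunk) mentioned above, and rand() stands in for whatever random source the real code uses.

```c
#include <stdlib.h>

/* Hypothetical helper, not a chan_sip function. Returns how many
 * milliseconds to wait before retrying a reINVITE after a 491,
 * per RFC 3261 section 14.1. */
static int reinvite_retry_delay_ms(int owns_callid)
{
	if (owns_callid) {
		/* Call-ID owner: 2.1 to 4 seconds, in 10 ms units (210..400). */
		return (210 + rand() % 191) * 10;
	}
	/* Not the owner: 0 to 2 seconds, in 10 ms units (0..200). */
	return (rand() % 201) * 10;
}
```

Because the two ranges do not overlap, the two sides should not retry at the same instant again.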
/trunk/channels/chan_sip.c
<http://reviewboard.digium.com/r/213/#comment1699>
I think this comment doesn't really explain the use of glareinvite all that well. I think a comment focusing more on the "why" of this, especially why the pendinginvite field alone isn't enough for this scenario, would be helpful.
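To illustrate the "why", here is a rough sketch of the ACK matching described in the summary. It is not the actual patch: struct dialog_state, enum ack_match and classify_ack are hypothetical names, and the struct is a simplified stand-in for sip_pvt that keeps only the two sequence numbers under discussion. It assumes a value of 0 means "no INVITE tracked".

```c
/* Simplified stand-in for sip_pvt (hypothetical, for illustration only). */
struct dialog_state {
	unsigned int pendinginvite; /* seqno of our own outstanding INVITE, 0 if none */
	unsigned int glareinvite;   /* seqno of the peer's INVITE we answered with 491, 0 if none */
};

enum ack_match { ACK_UNKNOWN, ACK_FOR_PENDING, ACK_FOR_GLARE };

/* Decide which INVITE an incoming ACK belongs to. */
static enum ack_match classify_ack(const struct dialog_state *d, unsigned int seqno)
{
	if (d->pendinginvite && seqno == d->pendinginvite) {
		return ACK_FOR_PENDING;
	}
	if (d->glareinvite && seqno == d->glareinvite) {
		/* The ACK answers our 491. Without this second check it would be
		 * ignored and the 491 would keep being retransmitted. */
		return ACK_FOR_GLARE;
	}
	return ACK_UNKNOWN;
}
```

With only pendinginvite available, the second case falls through to ACK_UNKNOWN, which is the dropped-call scenario from Part A.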
- Mark
On 2009-03-31 15:44:39, David Vossel wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviewboard.digium.com/r/213/
> -----------------------------------------------------------
>
> (Updated 2009-03-31 15:44:39)
>
>
> Review request for Asterisk Developers and Mark Michelson.
>
>
> Summary
> -------
>
> PART A: 491 never being deleted from scheduler because ACK ignored.
>
> This is a very odd situation that results in a dropped call. I'll try and explain this the best I can.
>
> A call goes through two Asterisk servers, A and B. Both A and B attempt to bridge the calls by issuing a re-Invite to each other at the exact same time. When this happens, both respond with a 491 (Request Pending)... so we have "reinvite glare". Some precautions have already been taken to recover from this situation, and this is where the issue gets really hairy...
>
> Here's what happens:
>
> A --re-Invite--> B
> A <--re-Invite-- B
> A ---491-------> B
> A <----491------ B
>
> A -----ACK-----> B ACK is ignored because it doesn't match B's pending invite seqno, 491 is never deleted from scheduler
> A <----ACK------ B ACK is ignored because it doesn't match A's pending invite seqno, 491 is never deleted from scheduler
>
> When the ACK is received by 'A', it is an ACK in response to the 491 'A' sent, but 'A' has no memory of the ACK's seqno because it doesn't match its pending invite seqno. This is because the 491 is sent in response to a glare invite sent by 'B' while 'A' already had a pending invite sent out (in this case back to 'B'). Since 'A' doesn't know about the ACK's seqno, it is ignored, meaning the 491 is never deleted from the scheduler. Same thing happens for 'B's side. The problem is symmetric...
>
> Now the big problem starts
>
> A ----resends 491---> B
> A <----resends 491--- B
> nothing is processed, no ACKs are sent in response
> A ----resends 491---> B
> A <----resends 491--- B
> again nothing happens, no ACKs are sent in response.
>
> The scheduler keeps resending the 491s for each side because they were never removed. Within a few seconds the call is dropped because both sides hit the maximum number of retries for the 491 packet.
>
>
> Solution: During a pending invite, if we receive another invite, we send a 491 and hold on to that glare invite's seqno in the "glareinvite" variable for that sip_pvt struct. When an ACK is received, we first check whether it is in response to our pending invite; if not, we check whether it is in response to a glare invite. In this case, it is in response to the glare invite and must be dealt with or the call is dropped.
>
>
> PART B: Re-Invite never sent back out after timer expires.
>
> When the re-invite glare situation occurs, each side sends the 491 and cancels its current pending invite. A timer is set for a short random amount of time, and then the re-Invite is supposed to be sent back out, hopefully not at the same time. We set the timer and execute a function to set the sip_pvt struct's SIP_NEEDREINVITE flag, but never call check_pendings() to send the reinvite back out.
>
> Solution: Call check_pendings() after setting the SIP_NEEDREINVITE flag, and lock the sip_pvt struct around it, since the function is called from the scheduler.
>
>
> ... This made my brains hurt.
>
>
> This addresses bug 0012013.
> http://bugs.digium.com/view.php?id=0012013
>
>
> Diffs
> -----
>
> /trunk/channels/chan_sip.c 185434
>
> Diff: http://reviewboard.digium.com/r/213/diff
>
>
> Testing
> -------
>
> Made calls, got the re-invite glare situation to occur, watched successful recovery in wireshark. Call stays up!
>
>
> Thanks,
>
> David
>
>