[asterisk-bugs] [Asterisk 0013749]: IAX2 storm (type 4, subtype 20: AST_CONTROL_SRCUPDATE)

Asterisk Bug Tracker noreply at bugs.digium.com
Thu Feb 12 18:58:26 CST 2009


A NOTE has been added to this issue. 
====================================================================== 
http://bugs.digium.com/view.php?id=13749 
====================================================================== 
Reported By:                adiemus
Assigned To:                dvossel
====================================================================== 
Project:                    Asterisk
Issue ID:                   13749
Category:                   Channels/chan_iax2
Reproducibility:            always
Severity:                   major
Priority:                   normal
Status:                     feedback
Asterisk Version:           1.4.22 
Regression:                 No 
SVN Branch (only for SVN checkouts, not tarball releases): N/A 
SVN Revision (number only!):  
Request Review:              
====================================================================== 
Date Submitted:             2008-10-20 17:46 CDT
Last Modified:              2009-02-12 18:58 CST
====================================================================== 
Summary:                    IAX2 storm (type 4, subtype 20:
AST_CONTROL_SRCUPDATE)
Description: 
This is a rather odd issue to describe.

In simplest terms, I have a server connected via IAX2 to another server. 
The configuration had been working fine through at least 1.4.18.  However,
between then and 1.4.21.2, this issue began occurring for all calls from
server A to server B.  (But oddly, not for calls from B->A)  The issue
persists in 1.4.22.

When a call is placed through server A to server B via IAX2, server A
sends a torrent of retransmitted packets right after the initial call
setup.  This storm eats up more than 20 Mbit and is reproducible for every
call from server A to server B.  If I replicate the configuration of server
B onto server C, the behavior remains the same.  Calls from server A to
server C show the same "packet storm" bug.

I have complete pcap captures of both sides of the connection, if that's
useful.  However, the short overview from server A's perspective is:

A -> B NEW
B -> A AUTHREQ
A -> B AUTHREP
B -> A ACCEPT
A -> B ACK
B -> A ANSWER
A -> B ACK
A -> B CONTROL (subclass = 20, AST_CONTROL_SRCUPDATE)
A -> B CONTROL (subclass = 20, AST_CONTROL_SRCUPDATE) [retrans]
<storm of identical, retransmitted packets>

This is a serious issue for us as the storm is enough to saturate the
internet connections of both LAN A and LAN B.  (The packet rate is in the
tens or hundreds of thousands per second.)

I'm happy to provide more information, but I don't know what else would be
useful.

The iax.conf entry for server B on server A looks like:
[serverB]
type=user
auth=md5
secret=redacted
transfer=no
context=from-serverB
jitterbuffer=no

[serverB]
type=peer
auth=md5
secret=redacted
username=serverA
transfer=no
host=dynamic
disallow=all
allow=gsm
jitterbuffer=no

The config for serverA on serverB:
register => serverB:redacted@serverA.fqdn

[serverA]
type=user
auth=md5
secret=redacted
transfer=no
context=from-serverA
jitterbuffer=yes

[serverA]
type=peer
auth=md5
secret=redacted
username=serverB
host=serverA.fqdn
jitterbuffer=yes
trunk=no
trunktimestamps=no
disallow=all
allow=gsm

In the interim, I'll see if I can do a binary search to find in what
version this behavior started.
====================================================================== 

---------------------------------------------------------------------- 
 (0100076) mihai (reporter) - 2009-02-12 18:58
 http://bugs.digium.com/view.php?id=13749#c100076 
---------------------------------------------------------------------- 
I've encountered this problem on our systems and I did some analysis. I
believe I've figured out the mechanism by which the problem is triggered,
even though I am not yet sure how to fix it.  Anyways, here goes:

My setup: Client1 <--> Asterisk1 <--> Asterisk2 <--> Client2; all talking
IAX2

Client1 is attempting to call Client2 via the two Asterisk servers. In
each server, app_dial attempts to bridge the two channels and calls
res_features.c:ast_bridge_call(), which in turn calls
channel.c:ast_channel_bridge() which in turn attempts to create a native
bridge by calling channel->tech->bridge(). This translates to a call to
chan_iax2.c:iax2_bridge(). The native iax2 bridge is supposed to pass media
back and forth until an AST_FRAME_CONTROL frame arrives, at which point it
shuts down the native bridge with an AST_BRIDGE_COMPLETE return value.

Commit 106235
(http://svn.digium.com/view/asterisk?view=revision&revision=106235)
introduces a new control frame, AST_CONTROL_SRCUPDATE, which I believe is
intended to signal to an endpoint to reset its media streams. This frame is
sent to both legs of a call in channel.c:ast_channel_bridge(), right before
entering the bridge loop. Normally, this would be inconsequential, since
most IAX2 clients I'm aware of do not know about this type of control
frame, and will cheerfully acknowledge and ignore it.  

However, in our scenario, one leg of the call connects to another Asterisk
server.  Upon receiving an AST_CONTROL_SRCUPDATE, chan_iax2 will promptly
terminate the native bridge. In turn, channel.c:ast_channel_bridge() will end its
bridge loop and return with a value of 0. The control gets back to
res_features.c:ast_bridge_call() and since there is no code path to handle
AST_CONTROL_SRCUPDATE, the frame is simply consumed and the control goes
back to the beginning of the loop.  Lather, rinse, repeat.
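
The loop described above can be sketched as a toy model. This is not
Asterisk source code: the struct and function names below are hypothetical
stand-ins, and the model assumes two servers that each send one
AST_CONTROL_SRCUPDATE to the other on entering the bridge, and whose native
bridge is torn down by any pending control frame.

```c
#include <assert.h>

/* Hypothetical model of one Asterisk server; not real Asterisk types. */
struct server {
    long inbox; /* SRCUPDATE frames received but not yet read */
    long sent;  /* SRCUPDATE frames sent to the other server  */
};

/* Stand-in for ast_channel_bridge(): one SRCUPDATE is sent to the
 * far leg right before the bridge loop is entered. */
static void enter_bridge(struct server *self, struct server *peer)
{
    self->sent++;
    peer->inbox++;
}

/* Stand-in for iax2_bridge(): returns 1 if a pending control frame
 * ended the native bridge, 0 if the bridge stays up. */
static int native_bridge_interrupted(struct server *self)
{
    if (self->inbox == 0)
        return 0;
    self->inbox--; /* the frame is consumed by the caller's loop */
    return 1;
}

/* Stand-in for the ast_bridge_call() loop on both servers. Returns the
 * total number of SRCUPDATE frames sent after at most max_steps rounds. */
long simulate_storm(long max_steps)
{
    struct server a = {0, 0}, b = {0, 0};
    long step;

    assert(max_steps >= 0);
    enter_bridge(&a, &b); /* initial bridge setup on server A */
    enter_bridge(&b, &a); /* initial bridge setup on server B */

    for (step = 0; step < max_steps; step++) {
        int a_hit = native_bridge_interrupted(&a);
        int b_hit = native_bridge_interrupted(&b);

        if (!a_hit && !b_hit)
            break; /* both bridges stable: never reached in this model */
        if (a_hit)
            enter_bridge(&a, &b); /* re-entering sends another SRCUPDATE */
        if (b_hit)
            enter_bridge(&b, &a);
    }
    return a.sent + b.sent;
}
```

In this model the frame count grows with the step cap and the loop never
settles on its own, which matches the behavior described: the only thing
limiting the real storm is how fast the servers can go around the loop.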

This infinite looping between the three functions will generate a constant
stream of AST_CONTROL_SRCUPDATE frames between the two servers and between
the servers and their clients.  We're talking about full IAX2 frames, so if
for some reason some frames get lost or are received out of order, VNAKs get
involved.  This, in turn, will add retransmissions to an already thorny
problem and everything increases exponentially until the brown stuff hits
the proverbial fan. 

I'm not sure how to best fix this.  Obviously, the simplest way is to nip
it in the bud by simply dropping incoming AST_CONTROL_SRCUPDATE in
iax2_bridge().  But this has some implications that I am not sure I fully
understand. In any event, I hope this helps fix the issue. 
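
To illustrate the idea of dropping the frame inside the native bridge, here
is a toy model, not a patch. All names are hypothetical, and it assumes the
same simplified behavior as the analysis: each server sends one
AST_CONTROL_SRCUPDATE on entering the bridge, and a pending control frame
ends the native bridge unless it is discarded. It says nothing about the
side effects of discarding the frame in real Asterisk.

```c
#include <assert.h>

/* Hypothetical model of one Asterisk server; not real Asterisk types. */
struct peer_model {
    long inbox; /* SRCUPDATE frames received but not yet read */
    long sent;  /* SRCUPDATE frames sent to the other server  */
};

/* Entering the bridge sends one SRCUPDATE to the other server. */
static void bridge_enter(struct peer_model *self, struct peer_model *other)
{
    self->sent++;
    other->inbox++;
}

/* Returns 1 if a pending SRCUPDATE tears the native bridge down.
 * With drop_srcupdate set, the frame is discarded and the bridge stays up. */
static int bridge_poll(struct peer_model *self, int drop_srcupdate)
{
    if (self->inbox == 0)
        return 0;
    self->inbox--;
    return drop_srcupdate ? 0 : 1;
}

/* Total SRCUPDATE frames sent by both servers after at most max_steps
 * rounds, with or without the proposed drop. */
long count_srcupdates(long max_steps, int drop_srcupdate)
{
    struct peer_model a = {0, 0}, b = {0, 0};
    long step;

    assert(max_steps >= 0);
    bridge_enter(&a, &b);
    bridge_enter(&b, &a);

    for (step = 0; step < max_steps; step++) {
        int a_hit = bridge_poll(&a, drop_srcupdate);
        int b_hit = bridge_poll(&b, drop_srcupdate);

        if (!a_hit && !b_hit)
            break; /* both native bridges stayed up */
        if (a_hit)
            bridge_enter(&a, &b);
        if (b_hit)
            bridge_enter(&b, &a);
    }
    return a.sent + b.sent;
}
```

Calling count_srcupdates(500, 0) models the current behavior and keeps
producing frames until the step cap is hit, while count_srcupdates(500, 1)
stops after the two initial frames because neither bridge is torn down.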

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2009-02-12 18:58 mihai          Note Added: 0100076                          
======================================================================



