[asterisk-biz] IAX Channels limit on Asterisk 1.4.17

Trixter aka Bret McDanel trixter at 0xdecafbad.com
Wed Jan 16 04:26:11 CST 2008


On Wed, 2008-01-16 at 01:56 -0800, Nitzan Kon wrote:
> --- Trixter aka Bret McDanel <trixter at 0xdecafbad.com> wrote:
> 
> > By using 1 port like this, iax gets a bottleneck situation where it
> will
> > have all Rx packets go into a single thread, regardless of the number
> > of CPUs/cores you may have in your system.  So this is 1 of that, the
> > scheduler is the rest.  
> 
> So if I'm getting this correctly- if you really wanted to take
> advantage of multi-cores/CPUs, you'd have to go the
> multiple-asterisk-instances-on-same-box route anyway?
> 
or not use iax.  SIP uses 1 port of RTP for each connection (2 if you
count RTCP as well; remember SIP is signalling and RTP is media, so they
are separate).  Each port can have its own thread (it isn't required, but
asterisk does it that way, and that is a good thing).  IAX uses the same
single port for reception, which means each and every call gets received
by one thread, and only one thread.  Because UDP is connectionless you
can't accept a connection and hand it to a thread that will deal with the
reception.  You could receive each packet and toss it to another thread
based on the call id, but that would probably add more overhead and you
would see a negative performance impact.
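The difference can be sketched with plain UDP sockets (a minimal
illustration of the two threading models, not Asterisk's actual code;
the socket counts and helper names here are made up):

```python
import socket
import threading

def recv_loop(sock, counts, key, n):
    """Reader thread: pull n datagrams off one socket."""
    for _ in range(n):
        sock.recv(2048)
        counts[key] += 1

# IAX style: ALL calls share one socket, so one thread does all reception.
iax_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
iax_sock.bind(("127.0.0.1", 0))

# SIP/RTP style: each call gets its own socket, so each call can have its
# own reader thread and the kernel can schedule them on different cores.
rtp_socks = []
for _ in range(3):  # three "calls"
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))
    rtp_socks.append(s)

counts = {"iax": 0, "rtp0": 0, "rtp1": 0, "rtp2": 0}
threads = [threading.Thread(target=recv_loop, args=(iax_sock, counts, "iax", 3))]
threads += [threading.Thread(target=recv_loop, args=(s, counts, "rtp%d" % i, 1))
            for i, s in enumerate(rtp_socks)]
for t in threads:
    t.start()

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Three calls' worth of media: all of it funnels into the single IAX
# socket (one reader thread)...
for _ in range(3):
    tx.sendto(b"frame", iax_sock.getsockname())
# ...but is spread across three sockets/threads in the RTP model.
for s in rtp_socks:
    tx.sendto(b"frame", s.getsockname())
for t in threads:
    t.join()

print(counts)
```

The single IAX-style reader ends up with all three frames while each
RTP-style reader handles exactly one, which is the whole bottleneck in
miniature.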

The only other way would be to have different ports for reception so
that you could have different threads.  You could transfer the call to
another port (assuming the other side lets you, and because of that I
don't think it's a good carrier-class solution; you should never have to
rely on your customers for this type of thing).  nat=yes generally
prohibits some of that capability.

So, for example, you bind to 4 ports because you have 4 cores/CPUs.  All
traffic comes in on the default port and you immediately transfer it to
a different one.  That way you could balance reception across all the
cores, letting the kernel scheduler take advantage of it.
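That hand-off might look like the following sketch (the dispatcher and
the worker port numbers are hypothetical; IAX itself offers nothing like
this):

```python
# Hypothetical dispatcher: new calls arrive on the default port and are
# immediately reassigned to a different local port, cycling across one
# port (and thus one reader thread) per core.
N_CORES = 4
worker_ports = [4570 + i for i in range(1, N_CORES + 1)]  # 4571..4574, made up

class Dispatcher:
    def __init__(self, ports):
        self.ports = ports
        self.next = 0

    def assign(self, call_id):
        """Pick the worker port for a new call, round-robin.

        call_id is unused here; a real dispatcher might hash on it
        instead to keep reassignments sticky."""
        port = self.ports[self.next % len(self.ports)]
        self.next += 1
        return port

d = Dispatcher(worker_ports)
assignments = {cid: d.assign(cid) for cid in range(8)}
print(assignments)
```

Eight calls land two per port, so each core's reader thread only ever
sees a quarter of the traffic.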

Keep in mind this is only a reception issue; transmission can still use
multiple cores, since each call has its own thread and transmission is
spread out.

Additionally, look at how the Linux scheduler works: it treats threads
and processes the same for scheduling purposes, so if a thread has a
high runtime (say, the receiving thread for iax) it will be put into a
"penalty box" because it runs much more than the other threads (Tx
threads run, then idle, for each packet interval; the receiving thread
runs all the time on a busy box).  This means its priority ends up much
lower than that of the other threads, which will affect audio on all of
your calls.

The higher the number of channels, the more threads you have and the
more contention for the CPU.  The result is that all threads are able to
Tx, but the receiver may not run fast enough on a really busy box, and
you get out-of-sync audio (the latency between Tx and Rx starts to
skew).  That skew may not amount to much on short-lived calls, but it
will add up on longer ones.

The worst case would be if you ever get call capacity to the point that
it takes the reader thread longer than one packet interval to fully
process all the calls, and you start to build a backlog.  I do not know
whether that will become the case; odds are the box will crumble first.
Actually, that may not be accurate.  I don't know the timing of one
iteration, but all you need is > 20ms of reception work between full
passes; if each channel costs 40usec (seems high, but maybe) once the
thread gets penalized for its runtime, that would mean 500 channels.
The real limit may be closer to 1000-1500 channels, however.  Of course,
if you have a sufficiently high number of cores you can avoid that to
some degree, because the box can do more concurrently, thus raising the
"problem number of channels".
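The back-of-the-envelope arithmetic, using the guessed numbers above
(both figures are the estimates from this thread, not measurements):

```python
packet_interval_us = 20_000   # 20 ms between voice frames
per_channel_us = 40           # guessed cost for the reader to service one call

# The reader thread falls behind once a full pass over all the calls
# takes longer than one packet interval.
max_channels = packet_interval_us // per_channel_us
print(max_channels)  # 500
```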

Using RDTSC on a single-core system (timing info is *not* carried across
cores; multicore systems will skew the results), or locking the process
to 1 core, would let you know exactly how long one iteration takes, so
you can quickly and easily work out how many channels per core would
cause a problem.  RDTSC is an asm instruction that gets timing info very
cheaply and is very useful in situations like this, where you want to
see how long something takes.  gettimeofday() is a syscall, which has
its own issues when you try to reliably measure execution times.
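A rough version of that measurement from Python rather than asm (a
sketch: it pins to one core via the Linux-only sched_setaffinity for the
same reason given above, uses perf_counter_ns as a stand-in for RDTSC,
and the per-channel workload is a dummy):

```python
import os
import time

# Pin to one CPU so all timestamps come from the same core, mirroring the
# "lock the process to 1 core" advice for RDTSC.  Linux-only, so fall
# back gracefully elsewhere.
try:
    os.sched_setaffinity(0, {0})
except (AttributeError, OSError):
    pass

def service_one_channel():
    # Stand-in for the per-call work the reader thread does each pass.
    sum(range(100))

N = 1000
t0 = time.perf_counter_ns()
for _ in range(N):
    service_one_channel()
per_iter_ns = (time.perf_counter_ns() - t0) / N

# How many channels one core could service within a 20 ms packet interval.
budget_ns = 20_000_000
print(int(budget_ns // per_iter_ns))
```

Dividing the packet interval by the measured per-iteration cost gives
the per-core channel count at which the reader starts to backlog.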

These are some of the biggest reasons I personally avoid iax.  I just
don't consider this type of bottlenecking to be 'carrier class'.  It is
an inherent problem of using a single port for UDP (connectionless,
really) communications.

A workaround *could* be to assign the receiving thread a different,
higher priority than all the transmission threads.  This would probably
work, given the way the Linux scheduler treats things.  While the
receiver thread would still be bumped down to the lowest level within
its priority, it would remain above the transmission threads, and the
skew would not appear until you have totally and completely maxed out
the box.  There would still be a performance bottleneck inherent in the
protocol, but it would not be as obvious to users.
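A sketch of the idea, inverted so it runs unprivileged: raising the
receiver's priority needs root, so this instead lowers the priority
(raises the nice value) of a transmission thread, which any user may do
and which gives the same relative ordering.  Linux-specific; it relies
on Linux niceness being per-thread and on setpriority() accepting a
thread id as its "who" argument:

```python
import os
import threading

def tx_worker():
    # De-prioritize this transmission thread relative to the receiver.
    # On Linux, niceness is per-thread, and setpriority() takes the
    # kernel thread id (threading.get_native_id) as its target.
    tid = threading.get_native_id()
    os.setpriority(os.PRIO_PROCESS, tid, 10)
    result["nice"] = os.getpriority(os.PRIO_PROCESS, tid)

result = {}
t = threading.Thread(target=tx_worker)
t.start()
t.join()
print(result["nice"])  # 10
```

Only the Tx thread is reniced; the rest of the process, including a
receiver thread, keeps its original priority and so always wins the
contest the scheduler was previously deciding against it.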

-- 
Trixter http://www.0xdecafbad.com     Bret McDanel
Belfast +44 28 9099 6461        US +1 516 687 5200
http://www.trxtel.com the phone company that pays you!



