[asterisk-dev] Asterisk scalability

John Todd jtodd at digium.com
Sun Feb 22 18:06:20 CST 2009


On Feb 21, 2009, at 8:35 AM, Gregory Boehnlein wrote:

>> 'bonding', as it's called in Linux, does work. I did not test whether
>> we can handle double the amount of calls, because the setups I built
>> only handle around 100 to 150 concurrent calls. But I can see at the
>> interface level that both NICs in the bonding device have handled
>> roughly the same amount of rx and tx packets.
>> The two interfaces are connected to a Cisco 3500 switch. I did not
>> configure the switch, so I have no idea what knobs you have to turn
>> there.
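
(For the archives: a minimal sketch of what such a bonding setup looks like with modern iproute2. The interface names and address are assumptions; 2009-era setups typically used modprobe options and ifenslave instead.)

```shell
# Create an 802.3ad (LACP) bond and enslave two NICs.
# The switch ports must be configured as an LACP aggregate too.
ip link add bond0 type bond mode 802.3ad
ip link set eth0 down
ip link set eth1 down
ip link set eth0 master bond0
ip link set eth1 master bond0
ip link set bond0 up
ip addr add 192.0.2.10/24 dev bond0

# Verify that both slaves joined the aggregate, and compare
# their rx/tx counters to see the load sharing described above.
cat /proc/net/bonding/bond0
ip -s link show bond0
```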
>
> Not that this is Asterisk, but I have successfully used 802.3ad LAG
> with the iSCSI Enterprise Target to build multi-gigabit-per-second
> iSCSI SANs and have been able to saturate both Gig-E links easily.
> This is a very common scenario when one is building a virtualization
> platform, and it has its own set of tuning parameters.
>
> However, the iSCSI Enterprise Target mailing list has a lot of
> experienced developers who offer tips for tuning both the Linux kernel
> and the Ethernet drivers for best performance. By following their
> advice, and sticking with known-good hardware (Intel server NICs!),
> I've been able to saturate Gig-E links with iSCSI traffic between a
> handful of VMware ESX servers and the SAN device.
>
> One of the particularly interesting things that I found through the
> process of tuning a SAN implementation was that three main things
> impact performance in additive ways.
>
> 1. Disabling flow control on the switch and Ethernet NICs, which
> results in a minor loss of top-end burst speed but greatly reduces
> latency for packets moving through the switch. The tradeoff is a
> higher load on the CPU and Ethernet driver, as it interrupts more
> frequently for I/O.
>
> 2. Changing the Linux kernel scheduler to make it more responsive to
> I/O requests and service things in a lower-latency fashion.
>
> 3. Disabling or tuning NIC parameters such as interrupt coalescence.

Do you have any specific references or how-tos on these methods that
you might post to the list for archival purposes?
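
In the meantime, my guess at the specific knobs being described — a sketch only, assuming an Intel-class NIC named eth0 and a disk named sda; option support varies by driver, and item 2 may equally refer to CPU-scheduler/preemption settings:

```shell
# (1) Disable pause-frame flow control on the NIC; the switch
#     has a matching per-port setting on its side.
ethtool -A eth0 autoneg off rx off tx off

# (2) Use the deadline I/O scheduler for lower-latency request
#     service.
echo deadline > /sys/block/sda/queue/scheduler

# (3) Inspect, then disable, interrupt coalescing so the driver
#     fires an interrupt per packet: lower latency, more CPU load.
ethtool -c eth0
ethtool -C eth0 rx-usecs 0 tx-usecs 0
```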

> One of the other things that comes into play is the actual
> load-balancing implementation that the switch uses. On the Netgears
> that we use, it doesn't start using the second pipe until the first
> one is saturated.

That's one of my worries.  It seems that the Linux->switch
transmissions are probably handled pretty well with 802.3ad, but I'm
uncertain what the vendors support in the other direction as far as how
they send their packets.  The goal is to share the PPS load evenly
across the multiple NICs, and possibly even flow-based caching might be
handy (though other posts seem to indicate that the latency doesn't
matter for out-of-order packets).
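
On the Linux-to-switch side, at least, the per-flow distribution is controllable through the bonding driver's transmit hash policy. A sketch (the bond name is an assumption):

```shell
# Hash on IP addresses and ports (layer3+4) so distinct RTP
# streams land on different slaves; the default layer2 (MAC)
# hash can pin all traffic to one NIC. Older kernels may
# require the bond to be down before changing this.
echo layer3+4 > /sys/class/net/bond0/bonding/xmit_hash_policy
grep -i hash /proc/net/bonding/bond0
```

What the switch does in its own transmit direction remains vendor-specific, as noted above.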

> From about two years' worth of work and maintenance on an IET SAN
> implementation, I can offer that it is possible to operate a pair of
> trunked Gig-E ports at full speed, carrying about 110-115 MB/second of
> very low-latency iSCSI traffic.
>
> I am sure that many of the performance-tuning techniques used for
> iSCSI implementations, as well as for high-end Linux routing
> platforms, would be applicable to performance tuning for Asterisk.

Probably.

> In my personal experience, Asterisk 1.2 fell over at around 220
> concurrent calls using SIPp. As a result, I generally limit the number
> of calls any single Asterisk server handles to 200 maximum.
>
> Based on some of the testing that Jeremy was doing in the Code Zone
> three years ago at Astricon, 1.4 made some improvements to that, but
> still topped out at about 400 concurrent calls.
>
> I'd love to see 10,000 calls on a single Asterisk server, but wow...
> that's going to require an incredible amount of effort, as well as
> changes to the Asterisk code base!


I agree that 10k calls is a big challenge, both for Asterisk and for
the device itself.  Work will certainly need to be done.

But 400 calls as a maximum seems a bit low by today's standards.  Have
you seen the documented tests by the guys at Transnexus?  They're
getting 1,000 concurrent G.711 calls on relatively inexpensive ($1000)
hardware without any tuning of any kind.  I'm not disputing your
results, but what is different between the two scenarios that produces
such a significant delta?  Just the age of the data (three years ago,
on 1.4...)?
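
For anyone wanting to compare numbers, a sketch of the kind of SIPp run used in such tests — the target address, call rate, and call cap are assumptions, and the built-in scenario is signalling-only, so a media-load test needs a pcap-play scenario instead:

```shell
# Built-in UAC scenario: place calls at 50 cps, hold each open
# ~10 s, cap at 1000 simultaneous calls, dump periodic stats.
sipp -sn uac 192.0.2.20:5060 -r 50 -l 1000 -d 10000 -trace_stat
```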

http://lists.digium.com/pipermail/asterisk-dev/2008-October/034902.html

JT

---
John Todd                       email:jtodd at digium.com
Digium, Inc. | Asterisk Open Source Community Director
445 Jan Davis Drive NW -  Huntsville AL 35806  -   USA
direct: +1-256-428-6083         http://www.digium.com/





