[asterisk-dev] Asterisk scalability

Kaloyan Kovachev kkovachev at varna.net
Sun Feb 22 06:55:45 CST 2009


On Sat, 21 Feb 2009 11:35:45 -0500, Gregory Boehnlein wrote
> > 'Bonding', as it's called in Linux, does work. I did not test whether we
> > can handle double the amount of calls, because the setups I did only
> > handle around 100 to 150 concurrent calls. But I can see at the interface
> > level that both NICs in the bonding device have handled roughly the same
> > amount of RX and TX packets.
> > The two interfaces are connected to a Cisco 3500 switch, and I did not
> > configure the switch, so I have no idea what knobs you have to turn
> > there.
> 
> Not that this is Asterisk, but I have successfully used 802.3ad LAG with the
> iSCSI Enterprise Target to build multi-gigabit-per-second iSCSI SANs and have
> been able to saturate both Gig-E links easily. This is a very common
> scenario when one is building a virtualization platform, and it has its own
> set of tuning parameters.
> 
> However, the iSCSI Enterprise Target mailing list has a lot of high-level
> developers who offer tips for tuning both the Linux kernel and the Ethernet
> drivers for best performance. As a result of following their advice, and
> sticking with known-good hardware (Intel server NICs!), I've been able to
> saturate Gig-E links with iSCSI traffic between a handful of VMware ESX
> servers and the SAN device.
> 
> One of the particularly interesting things I found through the process of
> tuning a SAN implementation was that three main factors impact performance
> in additive ways.
> 
> 1. Disabling flow control on the switch and Ethernet NICs, which results in
> a minor loss of top-end burst speed but greatly reduces latency for packets
> moving through the switch. The tradeoff is higher load on the CPU and
> Ethernet driver, as it interrupts more frequently for I/O.
> 
> 2. Changing the Linux kernel scheduler to make it more responsive to I/O
> requests and to service them with lower latency.
> 
> 3. Disabling and tuning NIC parameters such as Interrupt Coalescence.
> 
> One of the other things that comes into play is the actual load-balancing
> implementation that the switch uses. On the Netgears that we use, it doesn't
> start using the second pipe until the first one is saturated.
> 
> From about two years' worth of work and maintenance on an IET SAN
> implementation, I can offer that it is possible to operate a pair of trunked
> Gig-E ports at FULL speed, carrying about 110-115 MB/second of very
> low-latency iSCSI traffic.
> 
> I am sure that many of the performance-tuning techniques used for iSCSI
> implementations, as well as for high-end Linux routing platforms, would be
> applicable to performance tuning for Asterisk.
> 

Partially agree, but for SANs and routers you need low latency and high
throughput for mostly large packets, where jumbo frames can also help, while
for Asterisk and VoIP you have thousands of small UDP packets (not TCP, so no
NIC offloading is possible). There a 1 ms delay per packet (with polling at
1000 IRQ/sec) is negligible compared to the normal network jitter and
packetization time, yet it lets you process several calls' worth of packets
at once. And 1 ms is the worst case: for 10000 calls on a single NIC, the
card's buffer may not be enough for 1 ms of data, so you will get more than
1000 interrupts per second, and flow control may actually help here to reduce
them and the CPU load (rough numbers below).
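
A back-of-the-envelope sketch of those numbers, assuming G.711 at 20 ms
packetization, roughly 214 bytes per packet on the wire and a 256-descriptor
RX ring (all of these are my assumptions, not figures from this thread):

    # Rough numbers for 10000 G.711 calls at 20 ms packetization.
    # Packet size and ring size are assumed typical values, not measurements.
    CALLS = 10_000
    PTIME_MS = 20                        # packetization interval
    PKT_BYTES = 160 + 12 + 8 + 20 + 14   # payload + RTP + UDP + IP + Ethernet
    RX_RING = 256                        # descriptors in a common NIC RX ring

    pps_in = CALLS * (1000 // PTIME_MS)      # packets/sec received, one direction
    pkts_per_ms = pps_in // 1000             # packets landing per 1 ms poll window
    mbit_in = pps_in * PKT_BYTES * 8 / 1e6   # wire rate, one direction

    print(pps_in, "pps,", round(mbit_in), "Mbit/s,", pkts_per_ms, "packets per 1 ms")
    print("an RX ring of", RX_RING, "overflows unless the NIC interrupts at least",
          pps_in // RX_RING, "times/sec")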

Some personal thoughts:
 Assume there are exactly 1000 interrupts per second per NIC (a perfect-timing
interface :) ) and that several calls are processed per interrupt, so Asterisk
should be able to handle each such batch on the same core in less than 1 ms.
Add more NICs, with one CPU core and one dedicated Asterisk I/O thread per
NIC, to scale up the number of calls.
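
 A quick sketch of the per-core budget this implies; the 2000 calls per
NIC/core pair below is a hypothetical figure, not a measured capacity:

    # Per-core budget under the "1000 interrupts/sec per NIC" model above.
    CALLS_PER_CORE = 2_000                        # hypothetical, not measured
    PTIME_MS = 20
    pkts_per_batch = CALLS_PER_CORE // PTIME_MS   # packets arriving per 1 ms batch
    budget_us = 1000.0 / pkts_per_batch           # 1 ms of CPU spread over the batch
    print(pkts_per_batch, "packets per 1 ms batch,", budget_us, "us of CPU per packet")
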
 Now put a balancer in front of Asterisk to spread the traffic from a single
routing-optimized NIC across those multiple NICs: for example one NIC with
flow control disabled and interrupt coalescence tuned, bridged to a few
round-robin bonded NICs going directly to Asterisk = equal load on each
NIC/core/IO thread.
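
 A toy model of that round-robin hand-off (itertools.cycle stands in for
balance-rr bonding; the device names are illustrative only):

    # Toy model: one ingress NIC hands packets to N bonded NICs in strict
    # rotation, each bonded NIC feeding its own Asterisk I/O thread/core.
    from collections import Counter
    from itertools import cycle

    BONDED_NICS = ["eth1", "eth2", "eth3", "eth4"]   # hypothetical device names
    rotation = cycle(BONDED_NICS)

    def forward(_packet: bytes) -> str:
        # balance-rr style: hand the packet to the next NIC in strict rotation
        return next(rotation)

    print(Counter(forward(b"rtp") for _ in range(12)))   # 3 packets per NIC
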
 Now there are 2 machines ... how do we go back to a single one? Can a
multiqueue adapter replace the balancer? Probably yes - one queue per CPU and
one I/O thread for each queue, pinned to the same core.
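
 Something like the following is what the multiqueue hardware would do for us
(the hash and the queue count are placeholders; the real RSS hashing happens
in the NIC):

    # Illustration only: a stable hash of the RTP flow picks the queue, and
    # each queue gets one I/O thread pinned to the matching core.
    import zlib

    N_QUEUES = 8   # assume an 8-queue adapter, one queue per core

    def queue_for_flow(src_ip: str, src_port: int) -> int:
        # stand-in for the NIC's RSS hash: stable flow -> queue mapping
        return zlib.crc32(f"{src_ip}:{src_port}".encode()) % N_QUEUES

    for port in range(10000, 10008):                 # consecutive RTP source ports
        print(port, "-> queue", queue_for_flow("192.0.2.10", port))
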
 If it is possible to move the I/O thread into the kernel (even as just a FIFO
extending the NIC's buffer), so that we do not go to userspace on each
interrupt, and to use shared memory (large enough - HugePages?) to hand the
data to Asterisk asynchronously, the added latency will not harm VoIP but will
reduce context switching and CPU load. And if the channels are spread equally
(from userspace) amongst the threads within one second (i.e. 10000 channels =
10 per 1 ms slot, which at 20 ms packetization works out to 500 packets per
1 ms, with similar math for the registrations - see the sketch below), there
is a good chance they will become self-clocked from the other side too,
reducing the bursts/congestion on our side.
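
 The spreading math, just to make the numbers above concrete (20 ms
packetization and 1 ms slots assumed, as in the rest of this mail):

    # Spread 10000 channels evenly over 1000 x 1 ms slots: 10 channels are
    # anchored in each slot, and with 20 ms packetization every slot then
    # carries the same ~500 packets instead of periodic bursts.
    CHANNELS = 10_000
    SLOTS = 1_000        # 1 ms slots in one second
    PTIME_MS = 20        # one packet per channel every 20 slots

    slot_load = [0] * SLOTS
    for ch in range(CHANNELS):
        phase = ch % PTIME_MS                    # spread the phases evenly
        for s in range(phase, SLOTS, PTIME_MS):
            slot_load[s] += 1

    print("channels per 1 ms slot of the second:", CHANNELS // SLOTS)     # 10
    print("packets per 1 ms slot:", min(slot_load), "-", max(slot_load))  # 500 - 500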

 So in short I think that the hardware tuning will help a lot, but it will
help even more to somehow process the calls in blocks and to spread them as
equally as possible over one second and across the cores - which would also
help on low-load single-core servers.

> In my personal experience, Asterisk 1.2 fell over at around 220 concurrent
> calls using SIPp. As a result, I generally limit the number of calls any
> single Asterisk server handles to 200 maximum.
> 
> Based on some of the testing that Jeremy was doing in the Code Zone 3 years
> ago at Astricon, 1.4 made some improvements to that, but still topped out at
> about 400 concurrent calls.
> 
> I'd love to see 10,000 calls on a single Asterisk server, but wow.. that's
> going to require an incredible amount of effort as well as changes to the
> Asterisk code base!
> 