[asterisk-dev] Asterisk scalability (was: Improve scheduler performance under high load)

Venefax venefax at gmail.com
Mon Feb 16 12:36:04 CST 2009


I think that my system suffers from gethostname() constipation. I use 1.4.
Does anybody have any idea how to make gethostname() read or return
without making a network call? I only use SIP and 100% of my calls are to an
IP address. On Saturday my system went all the way to 1001 calls,
several times, and collapsed. My DNS server is located on the Internet,
close by, but I think it was not working properly that day.
Federico

-----Original Message-----
From: asterisk-dev-bounces at lists.digium.com
[mailto:asterisk-dev-bounces at lists.digium.com] On Behalf Of Steve Murphy
Sent: Monday, February 16, 2009 1:17 PM
To: Asterisk Developers Mailing List
Subject: Re: [asterisk-dev] Asterisk scalability (was: Improve scheduler
performance under high load)

On Mon, 2009-02-16 at 08:35 +0100, Johansson Olle E wrote:

> Now, can anyone start a discussion on the way we handle threads? If we 
> run on a quad-core or a system with dual quad core CPUs, we have 
> capacity for an enormous quantity of calls, with at least one thread 
> per call. Can a modern Linux/Unix thread scheduler handle 10 000 
> threads efficiently?
> 
> Oh, I think I just started that discussion. Looking forward to your 
> feedback!
> /O

Olle--

Wow, it's been over a year since I played with chan_sip to try and increase
the speeds at which it could process incoming calls!

In doing so, I voiced the suspicion that creating a thread with the
pbx_start call (iirc) could be the major bottleneck in getting asterisk to
run quicker. You see, from what I noted, the amount of time to fire up a new
thread was extremely variable, but could last over 1 MILLION clock ticks! 

And to make things really interesting, this time didn't get tracked by the
gperf stuff (or its relatives).
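
For anyone who wants to see that variability for themselves, here is a rough sketch of one way to sample thread start-up latency. It is Python rather than Asterisk's C, it measures wall-clock nanoseconds rather than clock ticks, and the helper name thread_start_latencies is made up for illustration, so treat the numbers as indicative only; interpreter overhead is included in every sample.

```python
import threading
import time


def thread_start_latencies(count=200):
    """Return, in nanoseconds, how long each of `count` threads took to start.

    Hypothetical helper: latency is measured from just before the Thread
    object is created until the first statement of the thread body runs.
    """
    latencies = []
    for _ in range(count):
        started = []

        def body():
            # First thing the new thread does: record when it began running.
            started.append(time.perf_counter_ns())

        t0 = time.perf_counter_ns()
        worker = threading.Thread(target=body)
        worker.start()
        worker.join()
        latencies.append(started[0] - t0)
    return latencies


if __name__ == "__main__":
    samples = sorted(thread_start_latencies())
    print("min", samples[0],
          "median", samples[len(samples) // 2],
          "max", samples[-1])
```

Comparing the median against the maximum gives a feel for how much the start-up cost can spread on a given machine.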

I went as far as ripping the dynamic thread pool stuff out of chan_iax2,
which I felt would be an ideal solution. But my initial tests showed
problems under high load; it appeared that changes in trunk since my last
tests (long ago) have again dragged the throughput of chan_sip back down to
the limit of about 100 calls/sec that we had about a year ago. That limitation was
caused by something as simple as a gethostname or something like that. (I'm
not referring to my notes right now!). I had to go to other things at the
time, and I haven't made it back. On my aging single-core 32-bit test
system, 1.2 could handle over 400-500 call set-up/tear-down operations per
second. 1.4 could barely handle 100. We got back some of that speed with a
change Tilghman made at the time, but I wonder now if we've lost it again.
Just plain no time to test!

Russell has tried using a pool of taskprocessors as a threadpool, which is
not a bad approach to a threadpool, but the part that bothers me is that
taskprocessors, once all of them were completely engaged, would just queue
incoming processes, which could get catastrophically bad if the load exceeds
your projected demand.
The iax2 stuff is really nice in that it will fire up extra threads when
demand is higher than expected, and hang onto them for a while just in case
they are needed again. Periods of higher than expected demand would involve
some slow down as they begin, but once the extra threads are allocated,
sustaining that high rate would be easy. As things slow back down, the extra
unused threads would time out and exit. I thought it a much better attack.
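
The grow-on-demand, time-out-when-idle behaviour described above can be sketched roughly as follows. This is not the chan_iax2 code; it is a minimal Python illustration, and the class name DynamicPool, its fields, and the idle_timeout parameter are all assumptions made for the example.

```python
import queue
import threading
import time


class DynamicPool:
    """Hypothetical pool: spawns a thread when no worker is free,
    and lets workers exit after sitting idle for `idle_timeout` seconds."""

    def __init__(self, idle_timeout=0.5):
        self.idle_timeout = idle_timeout
        self.tasks = queue.Queue()
        self.lock = threading.Lock()
        self.idle = 0      # live workers that are free and not yet reserved
        self.workers = 0   # live workers in total
        self.spawned = 0   # threads created over the pool's lifetime

    def submit(self, func, *args):
        with self.lock:
            if self.idle > 0:
                # Reserve a free worker; it will pick the task off the queue.
                self.idle -= 1
            else:
                # Demand exceeds supply: fire up an extra thread.
                self.workers += 1
                self.spawned += 1
                threading.Thread(target=self._run, daemon=True).start()
        self.tasks.put((func, args))

    def _run(self):
        while True:
            try:
                func, args = self.tasks.get(timeout=self.idle_timeout)
            except queue.Empty:
                with self.lock:
                    if self.idle > 0:
                        # Nobody has reserved us, so it is safe to exit.
                        self.idle -= 1
                        self.workers -= 1
                        return
                # We were reserved and a task is on its way; keep waiting.
                continue
            func(*args)
            with self.lock:
                self.idle += 1  # free again, available for reuse
```

The first burst pays the thread-creation cost, later bursts of the same size reuse the waiting workers, and once demand drops the pool shrinks back on its own. A fixed set of workers fed from one queue would instead let work pile up once every worker is busy, which is the concern raised about the taskprocessor approach.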

My guess was that using a threadpool in pbx_start would speed up all the
channel drivers as far as call setup/takedown was concerned, but I learned
also that dahdi didn't seem to use pbx_start to get a new thread; I didn't
have enough time to investigate that, either. Oh, well, maybe sometime this
year.

murf


--
Steve Murphy <murf at digium.com>
Digium



