[asterisk-users] Could Asterisk be crashing under high context switches?

Bryce Chidester bryce at rhinoequipment.com
Fri Dec 18 08:34:09 CST 2009


On Fri, Dec 18, 2009 at 06:53, Jason Martin <jmartin at metrixmatrix.com> wrote:

> Hello!
>
> I have been struggling with Asterisk 1.6 and DAHDI for the past few weeks.
> We are an outgoing call center with 30 internal analog phones hooked up to 2
> Rhino CB24 channel banks. The banks are connected to a Rhino R4T1 card in a
> Dell 2950 server with 8 gigs of RAM. The 2 other ports on the R4T1 go to our
> 2 PRIs.
>
> In this configuration, we have trouble maintaining stability. It may be
> fine for days, but then the load on the server slowly creeps up from below
> 1 all the way to 6, at which point no one can dial out and Asterisk pretty
> much has to be killed to be stopped.
>
> We also have bandwidth.com set up as a SIP provider. If we use
> bandwidth.com, stability is greatly improved.
>
> I installed Munin on the phone server yesterday and noticed something
> dramatic, I think! Asterisk became unstable 3 times yesterday, and during
> 2 of those episodes the context switches spiked: to almost 80k the first
> time, then to over 70k the second.
>
> First question - is this abnormal for around 20 ongoing recorded calls?
>
> I did a little bit of searching and found this:
>
> http://wiki.sangoma.com/files/wanpipe-linux-asterisk-tutorials/How_to_Reduce_Asterisk_System_Loads.pdf
>
> It talks about the zaptel/DAHDI chunk size and how it directly affects
> system load.
>
> Second question - the document explains how to change the chunk size for
> Sangoma hardware. Is there a general way to do that for DAHDI?
>
> Thanks in advance!
>
> Jason Martin
> Metrix Matrix, Inc.
> 785 Elmgrove Rd, Bldg 1
> Rochester, NY 14624
> Office: 888-865-0065 x202
> Mobile: 585-705-1400


Hi Jason,
Indeed, what you are seeing is not typical. I don't have "normal" numbers
available off-hand, but a system should have no problems whatsoever with 2
or 3 R4T1s. As you might expect, Rhino has thousands and thousands of
customers running without problems, which makes this instance the
exception. All Rhino cards use the same amount of bus resources (time
holding the PCI bus, data copied, etc.) regardless of whether it's an R1T1
or an R4T1, or how many active calls you have. There is no need to change
the CHUNKSIZE, as we have chosen the optimal value (in our testing) for
keeping system load down even on hardware as minimal as a few hundred
megahertz. That said, there's no way to change the CHUNKSIZE on a Rhino
card; it would require completely different firmware.
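
For reference, the chunk size in DAHDI is a compile-time constant rather
than a runtime or per-driver option. From the copy of dahdi-linux I have
handy -- check your own tree -- it is pinned in include/dahdi/kernel.h:

    /* include/dahdi/kernel.h -- 8 samples @ 8 kHz = 1 ms per interrupt */
    #define DAHDI_CHUNKSIZE         8
    #define DAHDI_MIN_CHUNKSIZE     DAHDI_CHUNKSIZE
    #define DAHDI_DEFAULT_CHUNKSIZE DAHDI_CHUNKSIZE
    #define DAHDI_MAX_CHUNKSIZE     DAHDI_CHUNKSIZE

So there is no general knob to turn: as I understand it, hardware like
Sangoma's that wants fewer interrupts has to buffer several chunks per
interrupt in its own driver, below DAHDI proper.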

In my experience, issues similar to this arise from hard disk activity
hogging the bus. Whether it's simultaneous recordings or a considerable
amount of other reading/writing, what ends up happening is that the CPU is
constantly switching between the Rhino card's interrupt and the IDE/SATA
controller's interrupt. When one of those interrupts becomes more frequent
and holds the bus for too long, it takes time away from the R4T1 and data
has to be discarded. We last saw these issues with nVidia hardware on the
2.6.9 kernels, but it's possible a variation of the same problem is
affecting you.

I would suggest investigating the other factors that affect system load
when your call load increases. Context switches are simply a symptom,
and you still need to find the culprit.
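
Two quick things to look at: "vmstat 1" shows interrupts and context
switches per second (the "in" and "cs" columns) so you can watch the
spikes live, and /proc/interrupts will tell you whether the R4T1 is
sharing an interrupt line with your disk controller. If you want to log
the rates over time and correlate them with call volume, here is a rough
sketch (nothing Rhino-specific, just reading /proc/stat; compile with
"cc -o statwatch statwatch.c"):

    /* statwatch.c -- print total interrupts/sec and context switches/sec
     * once a second, from the cumulative counters in /proc/stat. */
    #include <stdio.h>
    #include <unistd.h>

    static void read_stat(unsigned long long *intr, unsigned long long *ctxt)
    {
        char line[4096];
        FILE *f = fopen("/proc/stat", "r");

        *intr = *ctxt = 0;
        if (!f)
            return;
        while (fgets(line, sizeof(line), f)) {
            /* only the line starting with each keyword will match */
            sscanf(line, "intr %llu", intr);
            sscanf(line, "ctxt %llu", ctxt);
        }
        fclose(f);
    }

    int main(void)
    {
        unsigned long long i0, c0, i1, c1;

        read_stat(&i0, &c0);
        for (;;) {
            sleep(1);
            read_stat(&i1, &c1);
            printf("%llu intr/s  %llu ctxt/s\n", i1 - i0, c1 - c0);
            fflush(stdout);
            i0 = i1;
            c0 = c1;
        }
        return 0;
    }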

Regards,
Bryce Chidester
Rhino Equipment Corp.
bryce at rhinoequipment.com
Tel: +1 (480) 621-4000, +1 (877) RHINO-T1
FAX: +1 (480) 961-1826