[Asterisk-Dev] spike / asterisk hang? <- On a 360MHz CPU - no, 440 :)

Matt Hess mhess at livewirenet.com
Fri Jul 15 12:04:59 MST 2005


response inline..

Jim Van Meggelen wrote:

>asterisk-dev-bounces at lists.digium.com wrote:
>  
>
>>We have a steadily growing dial plan for our users on a sparc
>>netra t105.. I have noticed that audio going through the
>>asterisk (stable) system sometimes becomes choppy (cuts out and
>>restores after a few seconds) when new calls are being handled..
>>on the system running a vmstat 1 I watch the traps sys counter
>>jump up really high from the normal run stats..
>>    
>>
>
>I would say that is to be expected on such a platform. Running Asterisk
>on a system with a 360MHz CPU is going to limit the number of concurrent
>calls you can handle.
>
>Although at first glance this may seem a dev issue, it is not. It is a
>performance issue -- Asterisk performs all DSP work in the CPU, so as
>the load on the system increases, the chance that callers will hear the
>degradation increases. This should probably be submitted to the users
>list for any further discussion (or taken offline).
>
>  
>
It's a 440 actually ;) ..  1G ram. I surely thought a sparc would handle 
more than it seems to.. but I feel this may very well be a dev issue if 
looked into further.. I'll explain as I go..

>>An example dialplan entry for a user is as simple as:
>>exten => 201,1,Dial(SIP/201,,t)
>>(we've got a few hundred of these but note that we only have
>>around 15 calls active at any given time)
>>because of the transfer requirement media goes through the
>>asterisk server.. we are using sip on both sides of the
>>asterisk server.. ulaw codec is preferred at the endpoints.
>>
>>Here's a vmstat at 1 second intervals .. at the time there
>>are 8 active calls:
>> procs    memory               page             disks      traps         cpu
>> r b w    avm    fre   flt  re  pi  po  fr  sr sd0 cd0  int  sys   cs us sy  id
>> 0 0 0  42288 868840     7   0   0   0   0   0   0   0  367  469   61  0  0 100
>> 0 0 0  42288 868840     7   0   0   0   0   0   0   0  395  802   92  1  0  99
>> 0 0 0  42288 868840    12   0   0   0   0   0   0   0  377  554   64  0  0 100
>> 0 0 0  42288 868840     7   0   0   0   0   0   0   0  359  476   62  0  0 100
>> 0 0 0  42288 868840     7   0   0   0   0   0   0   0  364  468   62  0  0 100
>> 0 0 0  42288 868840     7   0   0   0   0   0   0   0  375  459   61  0  0 100
>> 0 0 0  42288 868840     7   0   0   0   0   0   0   0  364  480   61  0  0 100
>>
>>But on processing a call vmstat's traps sys and cs
>>counters both jump way up..
>>
>> procs    memory               page             disks      traps         cpu
>> r b w    avm    fre   flt  re  pi  po  fr  sr sd0 cd0  int  sys   cs us sy  id
>> 0 0 0  42280 868848     7   0   0   0   0   0   0   0  709 1429  160  3  1  96
>> 0 0 0  42288 868840    11   0   0   0   0   0   0   0  659 1395  168  0  0 100
>>
>>This issue has steadily gotten worse with the addition of
>>more dialplan entries like the one above..
>>    
>>
>
>That is to be expected.
>
>  
>
>>Note that as long as all calls are active and setup (nothing
>>happening involving dialplan) audio is perfect.. only when
>>dialplan processing happens does asterisk seem to hang up for
>>a second.. 
>>    
>>
>
>That's because the introduction of a new call to the system requires
>work on the part of the CPU. Once the channel is established there's
>very little for the CPU to do.
>
>  
>
See that's the thing.. the idle indicator on vmstat never shows the cpu
bottoming out.. a cpu that is 96% idle should not be the reason asterisk
hangs the audio.. at least in my mind it should not be..  I could
understand a high influx of interrupts having an impact like this, but
system calls and traps? Those seem far less likely to be able to
literally stop the flow of packets from the system when a call comes in
or out.. and I've tested this out extensively.. it is only asterisk
packets that stop.. other processes on the system perform as they
should.. heck, a ping to a router keeps sending packets when asterisk
stops sending its audio..
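
For anyone wanting to put numbers on the stall: a minimal sketch in
Python, assuming arrival timestamps for one audio stream have already
been captured somehow (a packet capture, for example). The function name
find_stalls, the 20 ms packet interval and the 100 ms threshold are all
assumptions for illustration, not measurements from this system.

```python
def find_stalls(arrival_times, expected_interval=0.020, threshold=0.100):
    """Return (start_time, gap_seconds, packets_missed) for each long gap.

    arrival_times: packet arrival timestamps in seconds, ascending order.
    expected_interval: ASSUMED packet spacing (20 ms is hypothetical here).
    threshold: ASSUMED cutoff above which a gap is counted as a stall.
    """
    stalls = []
    # Walk consecutive pairs of timestamps and measure the gap between them.
    for prev, cur in zip(arrival_times, arrival_times[1:]):
        gap = cur - prev
        if gap > threshold:
            # Estimate how many packets should have arrived inside the gap.
            missed = max(0, round(gap / expected_interval) - 1)
            stalls.append((prev, gap, missed))
    return stalls


if __name__ == "__main__":
    # Made-up timestamps: steady 20 ms spacing with one 1 second hole.
    sample = [0.00, 0.02, 0.04, 1.04, 1.06]
    for start, gap, missed in find_stalls(sample):
        print(f"stall after t={start:.2f}s: {gap:.2f}s gap, ~{missed} packets missed")
```

Running the same check against the ping replies over the same window
would show whether the gap is specific to the asterisk stream.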

I strongly believe that there is something more sinister lurking beneath 
this problem than simply cpu restrictions.

I've got an identical system acting as a router..

 1 0 0  49592 166272     7   0   0   0   0   0   0   0 6194   82   26 54 46   0
 1 0 0  49600 166264    13   0   0   0   0   0   0   0 6307   67   24 65 35   0
 1 0 0  49600 166264     7   0   0   0   0   0   0   0 6525   58   22 72 28   0
 1 0 0  49600 166264     7   0   0   0   0   0   0   0 6994   53   22 58 42   0

So that's 0 idle cpu.. 6k ints a second and it was with me running an 
ntop process on it.. the thing didn't even bat an eyelash at the load.. 
7.4 Mbps in+out was being passed at that time.. yet pings through it 
only had a 0.679 ms std-dev to a popular webserver across several ip 
providers. (I should note again the reason the cpu was at zero idle was 
that I was beating up the system with ntop)

As I understand it ints are the most cpu consuming of all.. and yet the 
ints aren't changing too much on the system.. just traps and cs go sky 
high while just going through the motions of a new call.. So hopefully 
that explains why I am having a little trouble understanding why a 
bigger system is needed.. especially when it seems that only asterisk is 
affected on the system and everything else I try seems just ducky..

But heck.. I suppose I can just ask this question.. why the heck does 
asterisk generate so many traps and system calls when processing a call?

When completely idle my asterisk system sees an average of about 70 sys
traps a second.. when a call comes in the sys traps alone increase by
over 1000%.. and that is with a simple dialplan of just:
exten => 401,1,Dial(SIP/401)

 0 0 0  42128 868912     7   0   0   0   0   0   0   0  221   65   13  0  0 100
 0 0 0  42144 868896    15   0   0   0   0   0   0   0  308  759   95  2  3  95
 0 0 0  42152 868888     9   0   0   0   0   0   0   0  253  314   42  0  0 100
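
To make the arithmetic behind that percentage explicit, here is a small
Python sketch that splits two of the vmstat rows above into named
columns and computes the relative increase. The helper names
parse_vmstat_row and percent_increase are made up for this example, and
the column list assumes the same 19-column layout as the vmstat header
quoted earlier in this mail.

```python
# Column names in the order of the vmstat header quoted earlier (assumed layout).
COLUMNS = ["r", "b", "w", "avm", "fre", "flt", "re", "pi", "po", "fr", "sr",
           "sd0", "cd0", "int", "sys", "cs", "us", "sy", "id"]


def parse_vmstat_row(line):
    """Turn one whitespace-separated vmstat row into a dict of integers."""
    values = [int(v) for v in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    return dict(zip(COLUMNS, values))


def percent_increase(before, after):
    """Relative increase from before to after, in percent."""
    return (after - before) / before * 100.0


# The idle row and the call-setup row from the sample above.
idle = parse_vmstat_row("0 0 0 42128 868912 7 0 0 0 0 0 0 0 221 65 13 0 0 100")
busy = parse_vmstat_row("0 0 0 42144 868896 15 0 0 0 0 0 0 0 308 759 95 2 3 95")

if __name__ == "__main__":
    for field in ("int", "sys", "cs"):
        change = percent_increase(idle[field], busy[field])
        print(f"{field}: {idle[field]} -> {busy[field]} ({change:.0f}% increase)")
```

For these two rows that works out to roughly a 39% increase in int,
1068% in sys and 631% in cs.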

So I guess my question has morphed into something along the lines of:
Why is the dialplan so expensive to process, and has any work gone (or
is any work going) into making it more system friendly or efficient?

>>Am I going crazy or am I near the mark in troubleshooting
>>this.. and if so (near the mark and not crazy) then how can I
>>help improve the situation?
>>    
>>
>
>Perhaps there are some performance optimizations that could improve
>matters somewhat, but I suspect that the least expensive, most painless
>way to resolve it is to replace the system with something more powerful.
>
>  
>
'least expensive' is to buy something more powerful? *boggle*
;)

>Oh, also, you are ONLY running Asterisk on that server, yes? No
>database, web server, GUI desktop, or such?
>
>  
>
Yes, only asterisk.. it's all alone.

I definitely appreciate continued feedback as I'd love to get this 
nailed down..

>Regards,
>
>Jim.
>
>
>--
>Jim Van Meggelen
>jim at vanmeggelen.ca
>www.oreillynet.com/cs/catalog/view/au/2177
>
>  
>