[asterisk-users] VERY HIGH LOAD AVERAGE: top - 10:27:57 up 199 days, 5:18, 2 users, load average: 67.75, 62.55, 55.75

Tzafrir Cohen tzafrir.cohen at xorcom.com
Wed Feb 10 03:44:12 CST 2010


On Wed, Feb 10, 2010 at 10:12:55AM +0300, Muro, Sam wrote:
> 
> >> Hi Team
> >>
> >> Can someone advise me on how I can lower the load average on my Asterisk
> >> server?
> >>
> >> dahdi-linux-2.1.0.4
> >> dahdi-tools-2.1.0.2
> >> libpri-1.4.10.1
> >> asterisk-1.4.25.1
> >>
> >> 2 X TE412P Digium cards on ISDN PRI
> >>
> >> I'm using the system as an IVR without any transcoding or bridging
> >>
> >> **************************************
> >> top - 10:27:57 up 199 days,  5:18,  2 users,  load average: 67.75, 62.55, 55.75
> >> Tasks: 149 total,   1 running, 148 sleeping,   0 stopped,   0 zombie
> >> Cpu0  : 10.3%us, 32.0%sy,  0.0%ni, 57.3%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> >> Cpu1  : 10.6%us, 34.6%sy,  0.0%ni, 54.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> >> Cpu2  : 13.3%us, 36.5%sy,  0.0%ni, 49.8%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> >> Cpu3  :  8.6%us, 39.5%sy,  0.0%ni, 51.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> >> Cpu4  :  7.3%us, 38.0%sy,  0.0%ni, 54.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> >> Cpu5  : 17.9%us, 37.5%sy,  0.0%ni, 44.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> >> Cpu6  : 13.3%us, 37.2%sy,  0.0%ni, 49.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> >> Cpu7  : 12.7%us, 37.3%sy,  0.0%ni, 50.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> >
> > The system is fairly loaded, but there are still plenty of idle CPU
> > cycles. If we were in a storm of CPU-intensive processes, we would
> > expect many more "running" processes. Right now there are none apart
> > from 'top' itself (the single running process).
> >
> >> Mem:   3961100k total,  3837920k used,   123180k free,   108944k buffers
> >> Swap:   779144k total,       56k used,   779088k free,  3602540k cached
> >>
> >>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >>   683 root      15   0 97968  36m 5616 S 307.7  0.9  41457:34 asterisk
> >> 17176 root      15   0  2196 1052  800 R  0.7  0.0   0:00.32 top
> >>     1 root      15   0  2064  592  512 S  0.0  0.0   0:13.96 init
> >>     2 root      RT  -5     0    0    0 S  0.0  0.0   5:27.80 migration/0
> >
> > Processes seem to be sorted by size. You should have pressed 'p' to go
> > back to sorting by CPU. Now we don't even see the worst offenders.
> >
> Tried option 'p' but it doesn't seem to exist. CentOS 5.3, kernel 2.6.18-128.

Sorry: shift-p (and shift-m to sort by memory).

Another handy switch: shift-h to toggle the display of different
threads of the same process separately.
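
If you prefer a non-interactive snapshot, something along these lines
should do it on CentOS 5 (standard procps options; adjust to taste):

    # one-shot list of the busiest threads, sorted by CPU usage
    ps -eLo pid,lwp,pcpu,stat,comm --sort=-pcpu | head -20

    # batch-mode top, one iteration, handy for pasting into a mail
    top -b -n 1 | head -40

With Asterisk you should see many 'asterisk' threads rather than one
fat process.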

> 
> >
> >>     3 root      34  19     0    0    0 S  0.0  0.0   0:00.11 ksoftirqd/0
> >>     4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
> >>     5 root      RT  -5     0    0    0 S  0.0  0.0   1:07.67 migration/1
> >>     6 root      34  19     0    0    0 S  0.0  0.0   0:00.09 ksoftirqd/1
> >>     7 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/1
> >>     8 root      RT  -5     0    0    0 S  0.0  0.0   1:16.92 migration/2
> >>     9 root      34  19     0    0    0 S  0.0  0.0   0:00.03 ksoftirqd/2
> >>    10 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/2
> >>    11 root      RT  -5     0    0    0 S  0.0  0.0   1:34.54 migration/3
> >>    12 root      34  19     0    0    0 S  0.0  0.0   0:00.15 ksoftirqd/3
> >>    13 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/3
> >>    14 root      RT  -5     0    0    0 S  0.0  0.0   0:54.66 migration/4
> >>    15 root      34  19     0    0    0 S  0.0  0.0   0:00.01 ksoftirqd/4
> >>    16 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/4
> >>    17 root      RT  -5     0    0    0 S  0.0  0.0   1:39.64 migration/5
> >>    18 root      39  19     0    0    0 S  0.0  0.0   0:00.21 ksoftirqd/5
> >>    19 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/5
> >>    20 root      RT  -5     0    0    0 S  0.0  0.0   1:06.27 migration/6
> >>    21 root      34  19     0    0    0 S  0.0  0.0   0:00.03 ksoftirqd/6
> >>    22 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/6
> >>    23 root      RT  -5     0    0    0 S  0.0  0.0   1:23.24 migration/7
> >>    24 root      34  19     0    0    0 S  0.0  0.0   0:00.17 ksoftirqd/7
> >>    25 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/7
> >>    26 root      10  -5     0    0    0 S  0.0  0.0   0:25.70 events/0
> >>    27 root      10  -5     0    0    0 S  0.0  0.0   0:37.83 events/1
> >>    28 root      10  -5     0    0    0 S  0.0  0.0   0:15.67 events/2
> >>    29 root      10  -5     0    0    0 S  0.0  0.0   0:40.36 events/3
> >>    30 root      10  -5     0    0    0 S  0.0  0.0   0:16.45 events/4
> >
> > Those are all kernel threads rather than real processes.
> >
> > So I suspect one of two things:
> >
> > 1. You're looking at the system right after such a storm. The load
> > average will decrease sharply.
> What do you mean, Tzafrir?
> 
> It's obvious that the effect grows with the number of active channels,
> e.g. at 90 channels the load average is 4, but at 235 channels the load
> average is 60+.

Each Asterisk channel has a separate thread.

The thing that looked odd was that there were no running "processes"
(actually threads; the Linux scheduler schedules threads).
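
You can see this directly: the thread count of the Asterisk process
should track the number of active channels fairly closely (standard
procps/Linux, assuming a single asterisk process):

    # number of threads in the asterisk process
    ps -o nlwp= -p $(pidof asterisk)

    # or, equivalently, count the task entries under /proc
    ls /proc/$(pidof asterisk)/task | wc -l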

The load average is the average length of the run queue over a certain
period of time (three numbers: the first is averaged over one minute,
the second over 5 minutes, and the last over 15 minutes, IIRC).
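
The same numbers come from /proc/loadavg, which also shows the
runnable/total counts the averages are derived from (output below is
just an illustration, based on your top snapshot):

    $ cat /proc/loadavg
    67.75 62.55 55.75 1/149 17176

The first three fields are the 1-, 5- and 15-minute averages, the
fourth is runnable/total scheduling entities, and the last is the most
recently created PID.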

Recall that audio processing is "soft real-time" - if the CPU is not
available to handle the audio frames in time, you get choppy, degraded
audio.

When you had 90 active channels, the system managed to handle them all.
The CPUs were mildly loaded but had enough idle cycles - a load average
of 4 means roughly 4 runnable threads competing for 8 CPUs. So threads
did not have to wait long for CPU time (though it is still possible that
there were temporary contentions - we're looking at an average over a
period of a minute).

Then you went ahead and added many more calls and the CPUs were flooded:
new calls were coming in faster than they could be processed. This means
you get delays (and hence bad audio). You also get a load average that
keeps increasing as long as new calls keep coming. If you see a load
that is way higher than the number of CPUs on the system, the system is
probably flooded.


If new calls stop arriving after a while (either because you stop
sending them, or maybe because Asterisk crashed), the load may decrease
suddenly.
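
A quick way to watch the correlation is to sample both together
(assuming the CLI is reachable via 'asterisk -rx'; the exact command
wording may differ between versions):

    while sleep 10; do
        date
        asterisk -rx 'core show channels' | tail -2   # active channels/calls
        cat /proc/loadavg
    done

If the load keeps climbing while the channel count stays flat, the box
is simply not keeping up with the call volume.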

> >
> > 2. There are many processes hung in state 'D' (uninterruptible system
> > call). If a process is hung in such a system call for long, it normally
> > means a problem, e.g. disk-access issues that cause all processes
> > trying to access a certain file to hang.
> 
> I presume this would happen if there were IRQ sharing between the disks
> and the cards, which isn't my case.

This is normally easy to test with a simple 'ps aux | grep D' . If you
have many processes in state 'D' it will stand out. From your
description I figure this is not the case.
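
If the plain grep turns up too much noise (it also matches a 'D'
anywhere in the command name), a slightly more precise variant is:

    # show only tasks whose state starts with 'D', plus what they wait on
    ps -eo pid,stat,wchan:30,comm | awk '$2 ~ /^D/'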

-- 
               Tzafrir Cohen
icq#16849755              jabber:tzafrir.cohen at xorcom.com
+972-50-7952406           mailto:tzafrir.cohen at xorcom.com
http://www.xorcom.com  iax:guest at local.xorcom.com/tzafrir


