[asterisk-users] VERY HIGH LOAD AVERAGE: top - 10:27:57 up 199 days, 5:18, 2 users, load average: 67.75, 62.55, 55.75

Tzafrir Cohen tzafrir.cohen at xorcom.com
Tue Feb 9 08:23:55 CST 2010


On Tue, Feb 09, 2010 at 07:42:48AM +0300, Muro, Sam wrote:
> Hi Team
> 
> Can someone advise me on how I can lower the load average on my Asterisk
> server?
> 
> dahdi-linux-2.1.0.4
> dahdi-tools-2.1.0.2
> libpri-1.4.10.1
> asterisk-1.4.25.1
> 
> 2 X TE412P Digium cards on ISDN PRI
> 
> I'm using the system as an IVR, without any transcoding or bridging.
> 
> **************************************
> top - 10:27:57 up 199 days,  5:18,  2 users,  load average: 67.75, 62.55, 55.75
> Tasks: 149 total,   1 running, 148 sleeping,   0 stopped,   0 zombie
> Cpu0  : 10.3%us, 32.0%sy,  0.0%ni, 57.3%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu1  : 10.6%us, 34.6%sy,  0.0%ni, 54.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu2  : 13.3%us, 36.5%sy,  0.0%ni, 49.8%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu3  :  8.6%us, 39.5%sy,  0.0%ni, 51.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu4  :  7.3%us, 38.0%sy,  0.0%ni, 54.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu5  : 17.9%us, 37.5%sy,  0.0%ni, 44.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu6  : 13.3%us, 37.2%sy,  0.0%ni, 49.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu7  : 12.7%us, 37.3%sy,  0.0%ni, 50.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

The system is fairly loaded, but there are still plenty of idle CPU
cycles. If this were a storm of CPU-intensive processes, we would
expect many more processes in the "running" state. Right now there are
none (the single running process is 'top' itself).
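
A quick way to watch both kinds of load contributors is vmstat: the 'r'
column counts runnable processes and 'b' counts processes blocked in
uninterruptible sleep, and on Linux both feed the load average. For
example:

    $ vmstat 1 5

prints five one-second samples.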

> Mem:   3961100k total,  3837920k used,   123180k free,   108944k buffers
> Swap:   779144k total,       56k used,   779088k free,  3602540k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
>   683 root      15   0 97968  36m 5616 S 307.7  0.9  41457:34 asterisk
> 17176 root      15   0  2196 1052  800 R   0.7  0.0   0:00.32 top
>     1 root      15   0  2064  592  512 S   0.0  0.0   0:13.96 init
>     2 root      RT  -5     0    0    0 S   0.0  0.0   5:27.80 migration/0

The processes seem to be sorted by size. You should have pressed 'P' to
switch back to sorting by CPU usage before capturing this; as it is, we
don't even see the worst offenders.
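
If top's interactive sorting is awkward to capture, ps can give a
one-shot list sorted by CPU usage, e.g.:

    $ ps -eo pcpu,pid,state,comm --sort=-pcpu | head -20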


>     3 root      34  19     0    0    0 S   0.0  0.0   0:00.11 ksoftirqd/0
>     4 root      RT  -5     0    0    0 S   0.0  0.0   0:00.00 watchdog/0
>     5 root      RT  -5     0    0    0 S   0.0  0.0   1:07.67 migration/1
>     6 root      34  19     0    0    0 S   0.0  0.0   0:00.09 ksoftirqd/1
>     7 root      RT  -5     0    0    0 S   0.0  0.0   0:00.00 watchdog/1
>     8 root      RT  -5     0    0    0 S   0.0  0.0   1:16.92 migration/2
>     9 root      34  19     0    0    0 S   0.0  0.0   0:00.03 ksoftirqd/2
>    10 root      RT  -5     0    0    0 S   0.0  0.0   0:00.00 watchdog/2
>    11 root      RT  -5     0    0    0 S   0.0  0.0   1:34.54 migration/3
>    12 root      34  19     0    0    0 S   0.0  0.0   0:00.15 ksoftirqd/3
>    13 root      RT  -5     0    0    0 S   0.0  0.0   0:00.00 watchdog/3
>    14 root      RT  -5     0    0    0 S   0.0  0.0   0:54.66 migration/4
>    15 root      34  19     0    0    0 S   0.0  0.0   0:00.01 ksoftirqd/4
>    16 root      RT  -5     0    0    0 S   0.0  0.0   0:00.00 watchdog/4
>    17 root      RT  -5     0    0    0 S   0.0  0.0   1:39.64 migration/5
>    18 root      39  19     0    0    0 S   0.0  0.0   0:00.21 ksoftirqd/5
>    19 root      RT  -5     0    0    0 S   0.0  0.0   0:00.00 watchdog/5
>    20 root      RT  -5     0    0    0 S   0.0  0.0   1:06.27 migration/6
>    21 root      34  19     0    0    0 S   0.0  0.0   0:00.03 ksoftirqd/6
>    22 root      RT  -5     0    0    0 S   0.0  0.0   0:00.00 watchdog/6
>    23 root      RT  -5     0    0    0 S   0.0  0.0   1:23.24 migration/7
>    24 root      34  19     0    0    0 S   0.0  0.0   0:00.17 ksoftirqd/7
>    25 root      RT  -5     0    0    0 S   0.0  0.0   0:00.00 watchdog/7
>    26 root      10  -5     0    0    0 S   0.0  0.0   0:25.70 events/0
>    27 root      10  -5     0    0    0 S   0.0  0.0   0:37.83 events/1
>    28 root      10  -5     0    0    0 S   0.0  0.0   0:15.67 events/2
>    29 root      10  -5     0    0    0 S   0.0  0.0   0:40.36 events/3
>    30 root      10  -5     0    0    0 S   0.0  0.0   0:16.45 events/4

Those are all kernel threads rather than real processes.
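
(They are easy to filter out: kernel threads have no command line, so
'ps aux' shows their names in square brackets, and something rough like

    $ ps aux | grep -v ']$'

hides most of them.)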

So I suspect one of two things:

1. You ran top right after such a storm. In that case the load average
will decrease sharply on its own.

2. There are many processes hung in state 'D' (uninterruptible sleep,
typically inside a system call). A process stuck in such a call for a
long time normally indicates a problem, e.g. disk-access issues that
cause every process trying to access a certain file to hang.
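
You can check for that directly by listing 'D'-state processes along
with the kernel function they are sleeping in (the wchan column), for
example:

    $ ps -eo state,pid,wchan:30,comm | awk '$1 == "D"'

If the same wchan shows up again and again, that is usually where the
problem is.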

-- 
               Tzafrir Cohen
icq#16849755              jabber:tzafrir.cohen at xorcom.com
+972-50-7952406           mailto:tzafrir.cohen at xorcom.com
http://www.xorcom.com  iax:guest at local.xorcom.com/tzafrir


