[asterisk-users] VERY HIGH LOAD AVERAGE: top - 10:27:57 up 199 days, 5:18, 2 users, load average: 67.75, 62.55, 55.75
Muro, Sam
research at businesstz.com
Wed Feb 10 01:12:55 CST 2010
>> Hi Team
>>
>> Can someone advise me on how I can lower the load average on my asterisk
>> server?
>>
>> dahdi-linux-2.1.0.4
>> dahdi-tools-2.1.0.2
>> libpri-1.4.10.1
>> asterisk-1.4.25.1
>>
>> 2 X TE412P Digium cards on ISDN PRI
>>
>> I'm using the system as an IVR without any transcoding or bridging
>>
>> **************************************
>> top - 10:27:57 up 199 days, 5:18, 2 users, load average: 67.75, 62.55, 55.75
>> Tasks: 149 total, 1 running, 148 sleeping, 0 stopped, 0 zombie
>> Cpu0 : 10.3%us, 32.0%sy, 0.0%ni, 57.3%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
>> Cpu1 : 10.6%us, 34.6%sy, 0.0%ni, 54.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>> Cpu2 : 13.3%us, 36.5%sy, 0.0%ni, 49.8%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
>> Cpu3 : 8.6%us, 39.5%sy, 0.0%ni, 51.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>> Cpu4 : 7.3%us, 38.0%sy, 0.0%ni, 54.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>> Cpu5 : 17.9%us, 37.5%sy, 0.0%ni, 44.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>> Cpu6 : 13.3%us, 37.2%sy, 0.0%ni, 49.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>> Cpu7 : 12.7%us, 37.3%sy, 0.0%ni, 50.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>
> System is fairly loaded, but there are still plenty of idle CPU cycles. If
> we were in a storm of CPU-intensive processes, we would have expected
> many more "running" processes. Right now we have none (the single
> process is 'top' itself).
>
>> Mem: 3961100k total, 3837920k used, 123180k free, 108944k buffers
>> Swap: 779144k total, 56k used, 779088k free, 3602540k cached
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 683 root 15 0 97968 36m 5616 S 307.7 0.9 41457:34 asterisk
>> 17176 root 15 0 2196 1052 800 R 0.7 0.0 0:00.32 top
>> 1 root 15 0 2064 592 512 S 0.0 0.0 0:13.96 init
>> 2 root RT -5 0 0 0 S 0.0 0.0 5:27.80 migration/0
>
> Processes seem to be sorted by size. You should have pressed 'P'
> (uppercase) to go back to sorting by CPU. As it stands, we don't even see
> the worst offenders.
>
Tried option 'p' but it doesn't seem to exist. CentOS 5.3, kernel 2.6.18-128.
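If the interactive sort key is unavailable, ps can produce the same ranking non-interactively (a sketch; the column set is an assumption based on the procps ps shipped with CentOS 5):

```shell
# Show the top CPU consumers without top's interactive keys.
# nlwp = thread count, useful here since Asterisk is heavily threaded.
ps -eo pid,pcpu,pmem,nlwp,state,comm --sort=-pcpu | head -n 11
```

On a procps too old to support --sort, `ps aux | sort -rnk3 | head` gives a rough equivalent (%CPU is the third column of `ps aux`).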
>
>> 3 root 34 19 0 0 0 S 0.0 0.0 0:00.11 ksoftirqd/0
>> 4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
>> 5 root RT -5 0 0 0 S 0.0 0.0 1:07.67 migration/1
>> 6 root 34 19 0 0 0 S 0.0 0.0 0:00.09 ksoftirqd/1
>> 7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1
>> 8 root RT -5 0 0 0 S 0.0 0.0 1:16.92 migration/2
>> 9 root 34 19 0 0 0 S 0.0 0.0 0:00.03 ksoftirqd/2
>> 10 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/2
>> 11 root RT -5 0 0 0 S 0.0 0.0 1:34.54 migration/3
>> 12 root 34 19 0 0 0 S 0.0 0.0 0:00.15 ksoftirqd/3
>> 13 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/3
>> 14 root RT -5 0 0 0 S 0.0 0.0 0:54.66 migration/4
>> 15 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/4
>> 16 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/4
>> 17 root RT -5 0 0 0 S 0.0 0.0 1:39.64 migration/5
>> 18 root 39 19 0 0 0 S 0.0 0.0 0:00.21 ksoftirqd/5
>> 19 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/5
>> 20 root RT -5 0 0 0 S 0.0 0.0 1:06.27 migration/6
>> 21 root 34 19 0 0 0 S 0.0 0.0 0:00.03 ksoftirqd/6
>> 22 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/6
>> 23 root RT -5 0 0 0 S 0.0 0.0 1:23.24 migration/7
>> 24 root 34 19 0 0 0 S 0.0 0.0 0:00.17 ksoftirqd/7
>> 25 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/7
>> 26 root 10 -5 0 0 0 S 0.0 0.0 0:25.70 events/0
>> 27 root 10 -5 0 0 0 S 0.0 0.0 0:37.83 events/1
>> 28 root 10 -5 0 0 0 S 0.0 0.0 0:15.67 events/2
>> 29 root 10 -5 0 0 0 S 0.0 0.0 0:40.36 events/3
>> 30 root 10 -5 0 0 0 S 0.0 0.0 0:16.45 events/4
>
> Those are all kernel threads rather than real processes.
>
> So I suspect one of two things:
>
> 1. You're looking right after such a storm. The load average will decrease
> sharply.
What do you mean, Tzafrir?
It's clear that the effect grows with the number of active channels: at 90
channels the load average is about 4, but at 235 channels it is 60+.
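For reference, the Linux load average counts tasks that are runnable plus tasks in uninterruptible sleep, so the jump from 4 to 60+ can be cross-checked against a direct state count (a sketch):

```shell
# Count tasks in state R (runnable) or D (uninterruptible sleep);
# on Linux these are what the load average measures.
# '|| true' because grep -c exits non-zero when the count is 0.
ps -eo state= | grep -c '^[RD]' || true

# The kernel's own 1/5/15-minute averages, for comparison:
cat /proc/loadavg
```

If the count stays near the load average while CPUs sit mostly idle, the load is coming from 'D'-state tasks rather than CPU work.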
>
> 2. There are many processes hung in state 'D' (uninterruptible sleep,
> typically inside a system call). If a process is stuck in such a call for
> long, it normally means a problem, e.g. disk-access issues that cause
> every process trying to access a certain file to hang.
I presume this would happen if there were IRQ sharing between the disks and
the cards, which isn't the case here.
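Both hypotheses are easy to check directly (a sketch; wchan shows the kernel function a 'D'-state process is blocked in):

```shell
# 1. Any processes stuck in uninterruptible sleep, and where:
ps -eo state,pid,wchan:30,comm | awk '$1 == "D"'

# 2. Whether the Digium cards share an interrupt line: a shared IRQ
#    shows more than one driver name on a single row of this table.
cat /proc/interrupts
```

An empty result from the first command rules out the 'D'-state explanation; otherwise the wchan column usually points at the subsystem (disk, NFS, driver) that is blocking.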
>
> --