[asterisk-users] DAHDI wct4xxp high system CPU on idle?

Fri Aug 16 05:19:18 CDT 2013

In article <kugfu0$5pj$1 at hp2.softins.co.uk>,
Tony Mountifield <tony at softins.co.uk> wrote:
> I have a system running CentOS 5.9 and DAHDI 2.6.2 with a 2-port E1 card
> using the wct4xxp driver (also using Asterisk 11.5.0, but that isn't
> relevant to the question).
> 
> With DAHDI and Asterisk started, the system appears to run normally, as
> far as I can tell from limited testing.
> 
> I am monitoring User, System and Nice CPU usage using SNMP and MRTG, and I
> have noticed that when I have started up DAHDI, the System CPU jumps up to
> around 12% or so and stays there. It does this even if I don't start
> Asterisk.
> 
> On previous systems I have built over the years, using CentOS4 and Zaptel,
> I don't recall seeing such high CPU usage just from having Zaptel started.
> It would be down near 0% until the system started handling real calls.
> 
> So my first question would be: is this high CPU usage normal with current
> cards and DAHDI? It's curious that 12.5% is 1/8 of 100% and /proc/cpuinfo
> reports 8 CPUs, but I don't know whether that is just coincidence. The CPU
> is a X3450 with four cores and HT enabled.
> 
> Any thoughts would be gratefully received!

I did some more digging and learnt how the UCD-SNMP ssCpuRaw* counters
behave. I wrote a Perl script to poll them all exactly once per second. The
values are tick counts: apparently, each time a kernel tick happens, the
kernel determines the current state (user, nice, system, wait, kernel,
interrupt, softirq, idle) and increments the appropriate tick counter(s).
Kernel, interrupt and softirq are subsets of system.
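
A minimal sketch of the arithmetic behind that kind of polling, written in
Python rather than the Perl script mentioned above (which is not shown here).
It only turns two successive samples of the tick counters into per-second
rates; the SNMP fetch itself is left out. The function name, the dictionary
keys and the sample numbers are all made up for illustration, and the 32-bit
wrap handling assumes the ssCpuRaw* objects are Counter32 values.

```python
# Hypothetical helper: convert two successive samples of the ssCpuRaw*
# tick counters into ticks-per-second rates.
COUNTER_MAX = 2**32  # assumption: the counters are 32-bit and can wrap


def tick_rates(prev, curr, interval_s=1.0):
    """Return the increase of each counter per second between two samples."""
    rates = {}
    for name, new_value in curr.items():
        # Modulo arithmetic keeps the delta correct across a counter wrap.
        delta = (new_value - prev[name]) % COUNTER_MAX
        rates[name] = delta / interval_s
    return rates


# Made-up sample values, one second apart (not real measurements).
prev = {"user": 1000, "system": 500, "interrupt": 200, "idle": 90000}
curr = {"user": 1002, "system": 512, "interrupt": 210, "idle": 90786}

rates = tick_rates(prev, curr)
for name, rate in rates.items():
    print(f"{name:10s} {rate:8.1f} ticks/s")
```

Working with deltas rather than the raw values is what makes a stall or a
burst in a particular counter visible from one poll to the next.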

The UCD-SNMP module returns new counts every 5 seconds. I noticed that the
system and interrupt counters would spend, say, 30 seconds incrementing and
then 30 seconds hardly changing. My guess is that the 1ms interrupt handling
of the E1 card is beating with the 10ms system tick, and that the kernel
tick is a higher-priority interrupt than the E1 card's. So it's possible
that the tick interrupts the DAHDI interrupt handler. This increments
"system" and "interrupt", effectively registering a whole 10ms in the
counters, even if the DAHDI interrupt really takes much less than this
(which it would need to, being every 1ms!). The slow beat between the 1ms
DAHDI interrupt and the 10ms tick means that once they coincide, they will
continue to do so for a while, registering much more apparent time handling
the DAHDI interrupts than is really the case.
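
The beat described above can be illustrated with a toy simulation. This is
only a sketch under invented assumptions, not a model of the real kernel or
of the wct4xxp driver: the handler is assumed to be busy for 100us of every
1ms period, and the tick is assumed to drift by 1us per tick relative to the
card's clock. Neither figure comes from measurement. Each tick that lands
inside the handler is counted as one whole tick charged to interrupt time.

```python
def ticks_charged_per_second(seconds, handler_us=100, drift_us_per_tick=1,
                             irq_period_us=1000, ticks_per_second=100):
    """For each second, count the kernel ticks that land inside the handler.

    All parameters are hypothetical. Because 10ms is an exact multiple of
    1ms, only the drift moves the tick relative to the interrupt.
    """
    charged = []
    phase_us = 0  # position of the tick within the current 1ms period
    for _ in range(seconds):
        hits = 0
        for _ in range(ticks_per_second):
            if phase_us < handler_us:
                hits += 1  # a whole 10ms tick is booked as interrupt time
            phase_us = (phase_us + drift_us_per_tick) % irq_period_us
        charged.append(hits)
    return charged


per_second = ticks_charged_per_second(20)
print(per_second)
print("true handler share:", 100 / 1000)
print("apparent share in the busiest second:", max(per_second) / 100)
```

With these numbers every tick in one second is charged to the handler and
none in the following nine, so the counters alternate between bursts and
near silence in the way described. In this simple model the average over a
full beat cycle comes back to the true share, so it reproduces the bursts
only; it says nothing about any sustained offset.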

Now that I understand it, I'm no longer worried about it, although it's a
pity the graph is misleading!

I also discovered that 100% is the capacity of a single CPU, and an n-core
system will register a max of n*100%. Once I allowed the MRTG graphs to exceed
100%, I found ssCpuRawIdle was near 400% on my 4-core system with HT disabled
and near 800% with HT enabled. (I didn't find that having HT enabled gave any
problem with call quality.)
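
As a small worked example of that scaling, the sketch below converts a
reading on the "100% per CPU" scale into a share of the whole machine. The
function name is made up, and applying it to the roughly 12% system figure
assumes that figure is on the same per-CPU scale.

```python
def share_of_total(percent_of_one_cpu, n_cpus):
    """Convert a reading on the 'n * 100%' scale to a share of all CPUs."""
    return percent_of_one_cpu / n_cpus


# Eight logical CPUs (four cores with HT enabled), as in the system above.
print(share_of_total(800, 8))  # idle near 800% is the whole machine idle
print(share_of_total(12, 8))   # about 12% system, as a share of all CPUs
```

On that basis a reading of 12% corresponds to 1.5% of the machine's total
capacity.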

Cheers
Tony
-- 
Tony Mountifield
Work: tony at softins.co.uk - http://www.softins.co.uk
Play: tony at mountifield.org - http://tony.mountifield.org