[asterisk-users] Problem with new AEX800 card dying because of interrupt problems

Christian Weeks cpw at weeksfamily.ca
Wed Sep 8 15:27:49 CDT 2010


On Wed, 2010-09-08 at 11:06 -0500, Shaun Ruffell wrote:
> On 09/08/2010 10:38 AM, Christian Weeks wrote:
> > So I am asking the list, do you have any advice except perhaps to go
> > back to the broken channel bank? Is it really true that my modern server
> > class machine (quad core xeon) cannot handle the AEX800, whereas my
> > seven year old AMD desktop (previous host to the T1) could handle what
> > seems to have been about 3x the capacity? Isn't this a massive
> > regression?
> 
> Does the AEX800 work fine in your old AMD desktop?  If the wctdm24xxp
> driver is having problems servicing the interrupt in a timely fashion in
> your server I would be surprised if other cards in the same system
> wouldn't also experience high interrupt latencies which would probably
> manifest itself as pops and noise on the channels.
OK. The AEX800 can't go in the old server; it's a PCI Express card and
the AMD doesn't have a PCI Express slot (it's that old). As for your
comment about latency on the other channels, I can't notice any. The
other card (the older PCI card) has absolutely no problems at all; it's
getting clear audio. In fact, so is the new card: there's no sign of
anything wrong with it at all, except that it suddenly stops working
with these interrupt errors. That is why I suspect the driver
(especially given some of the fixes in the DAHDI 2.4 release) rather
than the card or the computer.

> 
> Some server class machines can have problems since they aren't optimized
> for "real-time" performance but (typically) for overall throughput, and
> telephony has timing requirements.  In other words, it doesn't matter if
> your server can handle a thousand channels...if it can't service any one
> channel within 25ms consistently, you're going to have issues with audio.
I'm not seeing that at all. The other card, on the PCI bus, has no
issues, despite being slower and older.
> 
> I would recommend:
> 
> a) checking the transfer rate to your hard drive ('hdparm -t
> /dev/[sda|hda]').  If it's below 4MB/s that's the likely culprit.
> Sometimes setting the kernel command line parameter to "hda=none" can
> help depending on the kernel version you're using.  I've also seen slow
> transfer rates fixed by changing BIOS settings.

/dev/sdb:
 Timing buffered disk reads:  190 MB in  3.03 seconds =  62.71 MB/sec

Hmm, I don't think that's the culprit. Before being repurposed as a
phone server, this machine spent two years as a disk server for MythTV;
I'd have noticed disk latency on it a long time ago.

> 
> b) Use cyclictest (https://rt.wiki.kernel.org/index.php/Cyclictest) and
> then stress your system to make sure maximum latencies remain low
> without DAHDI loaded.  System Management Interrupts / Baseboard
> Management Controllers can cause problems here on some servers.
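Roughly, cyclictest wakes each thread at a fixed interval (the I: column, in microseconds) against an absolute deadline and records how late every wakeup is. The Python sketch below imitates that measurement loop only to show what the Min/Avg/Max columns mean. It is an illustration under loud assumptions (no SCHED_FIFO priority, no memory locking, Python interpreter overhead included in every sample), not a substitute for running cyclictest, and `measure_wakeup_latency` is a hypothetical name.

```python
import time


def measure_wakeup_latency(interval_us: int = 1000, loops: int = 200) -> dict:
    """Rough, non-real-time imitation of cyclictest's measurement loop.

    Each iteration sleeps until an absolute deadline (start + k * interval)
    and records how late the thread actually woke up, in microseconds.
    """
    interval_ns = interval_us * 1000
    next_deadline = time.monotonic_ns() + interval_ns
    latencies_us = []
    for _ in range(loops):
        remaining_ns = next_deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        woke_ns = time.monotonic_ns()
        # Lateness relative to the deadline we were aiming for.
        latencies_us.append((woke_ns - next_deadline) / 1000)
        # Advance by a fixed step so one late wakeup doesn't shift the schedule.
        next_deadline += interval_ns
    return {
        "min": min(latencies_us),
        "avg": sum(latencies_us) / len(latencies_us),
        "max": max(latencies_us),
        "count": len(latencies_us),
    }


stats = measure_wakeup_latency(interval_us=1000, loops=200)
print(stats)
```

The numbers from this sketch will be noisier than cyclictest's because it runs as an ordinary process; the point is only that "Max" is the single worst wakeup, which is what matters for the 25ms and 128ms limits discussed here.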

OK. I'm not sure which tests I need to run here.

Here's a run at idle:
:~# cyclictest -t -p 80 -n -l 10000
policy: fifo: loadavg: 0.03 0.02 0.00 1/210 16899

T: 0 (16896) P:80 I:1000 C:  10000 Min:      8 Act:   16 Avg:   22 Max:     568
T: 1 (16897) P:79 I:1500 C:   6673 Min:      8 Act:   12 Avg:   25 Max:     119
T: 2 (16898) P:78 I:2000 C:   5005 Min:      9 Act:   14 Avg:   24 Max:     150
T: 3 (16899) P:77 I:2500 C:   4004 Min:      8 Act:   13 Avg:   30 Max:     420

And here's one with some cpu load:

:~# cyclictest -t -p 80 -n -l 10000
policy: fifo: loadavg: 0.82 0.35 0.12 3/217 17212

T: 0 (17209) P:80 I:1000 C:  10000 Min:      8 Act:   14 Avg:   26 Max:    8047
T: 1 (17210) P:79 I:1500 C:   6667 Min:      8 Act:   12 Avg:   15 Max:     820
T: 2 (17211) P:78 I:2000 C:   5001 Min:      7 Act:   17 Avg:   34 Max:    8184
T: 3 (17212) P:77 I:2500 C:   4001 Min:      9 Act:   40 Avg:   27 Max:    8786

Max is higher (obviously), but the averages barely move between the two
runs, and everything looks within your limits. Those numbers are in
microseconds, so even the worst case under load (8786 µs, about 8.8 ms)
is under 25 ms and more than an order of magnitude below 128 ms.
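To make the unit conversion explicit, here is a small Python sketch that parses the loaded run above, takes the largest Max: value, and compares it with the 25 ms and 128 ms figures from Shaun's message. It assumes cyclictest's default microsecond units and the line layout shown; `worst_case_us` and the constant names are hypothetical.

```python
import re

# Loaded run from this message, one thread per line (values in microseconds).
LOADED_RUN = """\
T: 0 (17209) P:80 I:1000 C:  10000 Min:      8 Act:   14 Avg:   26 Max:    8047
T: 1 (17210) P:79 I:1500 C:   6667 Min:      8 Act:   12 Avg:   15 Max:     820
T: 2 (17211) P:78 I:2000 C:   5001 Min:      7 Act:   17 Avg:   34 Max:    8184
T: 3 (17212) P:77 I:2500 C:   4001 Min:      9 Act:   40 Avg:   27 Max:    8786
"""

# Thresholds quoted in the thread, in milliseconds.
AUDIO_LIMIT_MS = 25
RING_LIMIT_MS = 128

# Matches one thread line; group 3 is the Max: column.
LINE_RE = re.compile(r"T:\s*(\d+).*?Avg:\s*(\d+)\s+Max:\s*(\d+)")


def worst_case_us(output: str) -> int:
    """Largest Max: value (microseconds) across all threads."""
    maxima = [int(m.group(3)) for m in LINE_RE.finditer(output)]
    if not maxima:
        raise ValueError("no cyclictest thread lines found")
    return max(maxima)


worst_ms = worst_case_us(LOADED_RUN) / 1000
print(f"worst case {worst_ms:.3f} ms; "
      f"under {AUDIO_LIMIT_MS} ms: {worst_ms < AUDIO_LIMIT_MS}; "
      f"headroom to {RING_LIMIT_MS} ms: {RING_LIMIT_MS / worst_ms:.1f}x")
# prints "worst case 8.786 ms; under 25 ms: True; headroom to 128 ms: 14.6x"
```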

> If cyclictest shows you have some maximum latency above 128ms, I
> would recommend trying to fix that first, but if for some reason you
> can't, you could trade some of your system memory for increased
> tolerance to system conditions by editing the DRING_SIZE in
> drivers/dahdi/voicebus.h to 256 or 512 depending on what cyclictest
> reports as your maximum latency.  Keep in mind this isn't a "fix"
> since you'll still have problems in your audio for any latency above 25ms.
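A hedged sketch of the sizing rule described above. It assumes, as the 128ms figure suggests, that the stock ring holds 128 descriptors and each one covers roughly 1 ms of audio; neither is confirmed in this thread, so check drivers/dahdi/voicebus.h before relying on it. `suggest_dring_size` and `audio_at_risk` are hypothetical helpers, not part of DAHDI.

```python
from typing import Optional

# Assumptions (not confirmed by the thread): the stock DRING_SIZE is 128 and
# each descriptor buffers about 1 ms of audio, so a ring of N tolerates ~N ms.
CANDIDATE_RING_SIZES = (128, 256, 512)
MS_PER_DESCRIPTOR = 1.0
AUDIBLE_LIMIT_MS = 25  # from the thread: above this, audio suffers regardless


def suggest_dring_size(max_latency_us: float) -> Optional[int]:
    """Smallest candidate ring that covers the measured worst-case latency.

    Returns None if even the largest candidate is too small, in which case
    the latency itself has to be fixed.
    """
    max_latency_ms = max_latency_us / 1000
    for size in CANDIDATE_RING_SIZES:
        if size * MS_PER_DESCRIPTOR > max_latency_ms:
            return size
    return None


def audio_at_risk(max_latency_us: float) -> bool:
    """A bigger ring may keep the card running, but not fix glitches above 25 ms."""
    return max_latency_us / 1000 > AUDIBLE_LIMIT_MS


print(suggest_dring_size(8786), audio_at_risk(8786))  # prints "128 False"
```

Under these assumptions, the 8786 µs worst case measured above would not call for any change to DRING_SIZE, while a 200 ms spike would suggest 256.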

I'm not sure where to go from here. Every diagnostic seems to be telling
the same story: the computer is fine. Is it possible I have a hardware
problem somehow? Maybe there's something wrong with the card?

Thanks
Christian





