[Asterisk-Users] PCI Problems

Andrew Kohlsmith akohlsmith-asterisk at benshaw.com
Fri May 26 07:37:17 MST 2006


On Thursday 25 May 2006 16:11, Sean Cook wrote:
> What could be the other causes?  I have exhausted everything I know how
> to do.  PCI sharing explains it (whether or not it is in fact the
> problem).  This card shares the BIOS assigned interrupt with the network
> card...

Audio problems can arise for a variety of reasons.  Causes include (but are 
not limited to):
- IRQ sharing with another device with a shitty driver or poor hardware
- Poor/inconsistent PCI bus behaviour and timing
- overloaded CPU or poor kernel parameters which cause timing problems
- shitty hardware or drivers which can lock out IRQs for a long time
- buggy drivers for the TDM or ethernet hardware
- bad PCI tuning with setpci or kernel parameters, latency timers especially
- other hardware (PCI bus controller, north or south bridge) issues
- faulty hardware
- poor cabling (either TDM side or ethernet side)

IRQ sharing is often blamed for audio problems but the fact of the matter is 
that IRQ sharing is *NOT* an issue if the hardware that is sharing the IRQ 
(and the drivers for that hardware) plays nicely and reacts to the IRQ 
quickly.  PCI is DESIGNED to share IRQs.  The trouble comes when vendors take 
old ISA hardware, port it to PCI, and then fail to ensure both that the 
hardware shares IRQs properly and that the drivers check whether their 
hardware caused the IRQ and react to IRQs quickly.

There is NOTHING inherently wrong with sharing IRQs.  Each IRQ handler needs 
to check whether it was its own hardware that generated the IRQ and get the 
hell out if not.  A lot of (poor) drivers do NOT do this.  The driver 
either assumes that the IRQ MUST have been generated by the hardware (which 
can cause a host of weird problems), or the check takes so long that it 
causes trouble for the card that DID generate the IRQ.
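
A rough sketch of what "playing nicely" on a shared line means, written as a 
toy user-space Python model rather than real kernel code.  The Device class, 
its irq_pending flag and dispatch_shared_irq() are all made up for 
illustration; a real driver would read a status register on the card instead.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical PCI device with a one-bit interrupt status flag."""
    name: str
    irq_pending: bool = False
    serviced: int = 0

    def handler(self) -> bool:
        """Well-behaved handler: check our own status before doing anything."""
        if not self.irq_pending:
            return False           # not ours: get out immediately
        self.irq_pending = False   # acknowledge the interrupt
        self.serviced += 1         # keep the work in IRQ context minimal
        return True


def dispatch_shared_irq(devices):
    """Call every handler registered on the shared line, in order."""
    return [dev.name for dev in devices if dev.handler()]


tdm = Device("tdm")
nic = Device("nic")
disk = Device("disk")
shared_line = [tdm, nic, disk]

nic.irq_pending = True             # only the network card raised the IRQ
claimed = dispatch_shared_irq(shared_line)
print(claimed)                     # ['nic']
```

The TDM and disk handlers return at once because the IRQ was not theirs.  A 
handler that skips the check would "service" hardware that never asked for it.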

Digium's hardware is more sensitive to IRQ sharing trouble than other hardware 
for two very simple reasons.

The first is that the TDM cards have no real buffering.  If the data is not 
taken from the register it will quickly be overwritten by the next block of 
data.  This is analogous to the old 16450 UARTs of yore.  They had a receiver 
shift register and a 1-byte receiver buffer.  If you didn't get the data out 
of the buffer before the next byte had shifted in, the new byte would be 
transferred to the buffer and you'd get an overrun error.  The 16550 replaced 
the 1-byte receive buffer with a 16-byte FIFO (IIRC) -- you could trigger an 
IRQ after the FIFO had filled 'x' bytes, and then service the IRQ, retrieving 
all bytes received in one fell swoop.  And if your IRQ service routine got a 
little delayed it was no big deal because there was room for another byte or 
two before you started losing data.  This allowed the IRQ volume on busy 
serial applications to be far lower (up to 16x lower) than before, which 
allowed for better system utilization.
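
To put rough numbers on the 16450 vs. 16550 comparison, here is a toy 
simulation.  It is only a sketch: one byte arrives per tick, the handler 
drains the whole buffer in one go, and the service latency of two byte times 
is an assumption picked to show the effect.  The trigger level of 14 is one 
of the levels the 16550 offers; simulate() and its parameters are 
hypothetical names.

```python
def simulate(depth, trigger, latency, n_bytes):
    """Return (irq_count, overruns) for a receiver with a `depth`-byte buffer.

    An IRQ is raised when the fill level reaches `trigger`; the handler runs
    `latency` ticks later and empties the buffer.  A byte that arrives while
    the buffer is full counts as an overrun (one byte of data is lost).
    """
    fill = 0
    irqs = 0
    overruns = 0
    service_at = None              # tick at which the pending IRQ is serviced
    for tick in range(n_bytes):
        if service_at is not None and tick >= service_at:
            fill = 0               # handler drains the buffer
            service_at = None
        if fill < depth:
            fill += 1              # byte stored
        else:
            overruns += 1          # no room: data lost
        if fill >= trigger and service_at is None:
            irqs += 1
            service_at = tick + latency
    return irqs, overruns


# 16450-style: 1-byte buffer, IRQ on every byte, slightly late handler.
irqs_16450, lost_16450 = simulate(depth=1, trigger=1, latency=2, n_bytes=1500)
# 16550-style: 16-byte FIFO, IRQ at 14 bytes, same late handler.
irqs_16550, lost_16550 = simulate(depth=16, trigger=14, latency=2, n_bytes=1500)

print("16450:", irqs_16450, "IRQs,", lost_16450, "bytes lost")
print("16550:", irqs_16550, "IRQs,", lost_16550, "bytes lost")
```

With the same late handler, the single-byte buffer drops data while the FIFO 
absorbs the delay and raises far fewer IRQs.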

Digium's hardware is like the old 16450.  There is no FIFO.  This was done 
consciously, and is not necessarily a bad design -- TDM is VERY sensitive to 
latencies.  The more delay you have, the worse things like echo become.  
Bringing TDM data into the PC is already pretty laggy.  Adding more delay 
with FIFOs isn't necessarily a good thing.  (I would argue that having a 16 
byte FIFO and triggering the IRQ on the first position would not be a bad 
thing nor would it introduce any latency, but that's me. I'd change a few 
things about Digium's hardware, but there is no arguing with their success.)

So back to the problem at hand: if there is significant delay between the IRQ 
and the IRQ service, you lose data.  This leads to chirping/clicking and in 
the case of T1, HDLC/framing errors, dropped links and bouncing D channels 
(for PRI).

The second reason is that Digium's drivers do a LOT of work in the IRQ 
handler.  Essentially they are "poor" PCI neighbours.  In the past (I have 
not checked this recently) all of the echo cancellation and "heavy lifting" 
was done right inside the IRQ handler, with interrupts disabled.  This caused 
their IRQ service time to be lengthy, and until interrupts are enabled again 
you essentially lock out any other driver from servicing its hardware.  
(Basically Digium's drivers do to other drivers what Digium's drivers can't 
stand to have done to them.)  Contrast this with Sangoma's drivers, which get 
the data into system RAM, set a flag (softIRQ?) and then get the hell out of 
the IRQ context as quickly as possible.  Then whenever the CPU gets time to 
do it, the driver takes the data and processes it OUTSIDE of the IRQ 
context.  Whether this is better or worse for performance is under debate, 
but there is absolutely no question that doing it this way makes their 
products better PCI neighbours.
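
The "grab the data, set a flag, get out" pattern looks roughly like this.  It 
is a user-space Python sketch of the idea, not anyone's actual driver code: 
DeferredDriver, irq_handler(), bottom_half() and expensive_processing() are 
hypothetical names, and expensive_processing() just stands in for the echo 
cancellation and other heavy lifting.

```python
from collections import deque


def expensive_processing(chunk):
    """Placeholder for the heavy lifting (echo cancellation and so on)."""
    return sum(chunk)


class DeferredDriver:
    def __init__(self):
        self.pending = deque()       # data copied out of the card
        self.work_scheduled = False  # the "flag" set by the IRQ handler
        self.irq_work_units = 0      # work done while other IRQs are held off
        self.processed = []

    def irq_handler(self, chunk):
        """IRQ context: copy the data to RAM, set the flag, return."""
        self.pending.append(bytes(chunk))
        self.work_scheduled = True
        self.irq_work_units += 1

    def bottom_half(self):
        """Runs later, outside IRQ context, when the CPU has time."""
        while self.pending:
            self.processed.append(expensive_processing(self.pending.popleft()))
        self.work_scheduled = False


drv = DeferredDriver()
drv.irq_handler(b"\x01\x02")         # two interrupts arrive back to back
drv.irq_handler(b"\x03")
queued_before = len(drv.pending)     # nothing has been processed yet
drv.bottom_half()                    # heavy work happens here instead
print(queued_before, drv.processed)
```

The time spent in irq_handler() stays constant no matter how expensive the 
processing is, which is what keeps the other devices on the line happy.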

This is a rather lengthy post, and I am sure that others will post 
contradictory or corrective responses, which I welcome.  The gist of the 
post, however, is that there are far more things that can cause audio 
problems than simple IRQ sharing.  I had a TDM400 (3 FXS) in a P3 system that 
shared its IRQ with the LAN card *AND* the disk controller.  This computer 
was also an NFS server for my media PC in the living room.  Every single call 
came in over the network card (I have no phone line), and even while watching 
movies (heavy network and disk use), I had absolutely NO issue with the 
TDM400.  No chirping, no echo trouble, nothing.  That system had a DAMN good 
PCI interface, and all the drivers coexisted peacefully.

By all anecdotal evidence and rules of thumb that system should have had 
TERRIBLE audio problems.  I was not only sharing the IRQ between 3 devices, 
but two of the three devices (TDM400 and network card) would ALWAYS be firing 
IRQs at the same time during a call.  I did not, however, have any trouble 
whatsoever.

-A.


