[Asterisk-Users] PCI Problems
Andrew Kohlsmith
akohlsmith-asterisk at benshaw.com
Fri May 26 07:37:17 MST 2006
On Thursday 25 May 2006 16:11, Sean Cook wrote:
> What could be the other causes? I have exhausted everything I know how
> to do. PCI sharing explains it (whether or not it is in fact the
> problem). This card shares the BIOS assigned interrupt with the network
> card...
Audio problems can have a variety of causes. These include (but are not
limited to) things such as
- IRQ sharing with another device with a shitty driver or poor hardware
- Poor/inconsistent PCI bus behaviour and timing
- overloaded CPU or poor kernel parameters which cause timing problems
- shitty hardware or drivers which can lock out IRQs for a long time
- buggy drivers for the TDM or ethernet hardware
- bad PCI tuning with setpci or kernel parameters, latency timers especially
- other hardware (PCI bus controller, north or south bridge) issues
- faulty hardware
- poor cabling (either TDM side or ethernet side)
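
Before blaming sharing, it helps to know which devices really sit on the same
line. Here is a rough sketch that picks the shared lines out of
/proc/interrupts-style text. The sample text and the helper name shared_irqs
are made up for illustration, and the parsing assumes driver names contain no
spaces, which is not guaranteed on every kernel.

```python
def shared_irqs(text):
    """Return {irq_label: [device names]} for lines that list several devices."""
    lines = text.splitlines()
    if not lines:
        return {}
    n_cpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split(None, n_cpus)   # per-CPU counts, then description
        if len(fields) <= n_cpus:
            continue                        # no description (e.g. ERR: 0)
        desc = fields[n_cpus]
        if "," not in desc:
            continue                        # only one device on this line
        parts = [p.strip() for p in desc.split(",")]
        # The first part is "<controller> <first device>"; keep the last word.
        parts[0] = parts[0].split()[-1]
        shared[label.strip()] = parts
    return shared


# Hypothetical sample, not taken from a real machine.
SAMPLE = """\
           CPU0
  0:   12345678          XT-PIC  timer
 10:     998877          XT-PIC  wctdm, eth0
 14:      55443          XT-PIC  ide0
ERR:          0
"""

result = shared_irqs(SAMPLE)
```

On a live system you would feed it open("/proc/interrupts").read() instead of
SAMPLE. Knowing what shares the line only tells you where to look; it does not
tell you that sharing is the cause.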
IRQ sharing is often blamed for audio problems but the fact of the matter is
that IRQ sharing is *NOT* an issue if the hardware that is sharing the IRQ
(and the drivers for that hardware) plays nicely and reacts to the IRQ
quickly. PCI is DESIGNED to share IRQs. The trouble comes when vendors take
old ISA hardware, port it to PCI, and fail to ensure that the hardware shares
IRQs properly, or that their drivers check that their own hardware caused the
IRQ and react to it quickly.
There is NOTHING inherently wrong with sharing IRQs. Each IRQ handler needs to
check its hardware to see whether that hardware generated the IRQ, and
get the hell out if not. A lot of (poor) drivers do NOT do this. The driver
either assumes that the IRQ MUST have been generated by its own hardware (which
can cause a host of weird problems), or the check takes so long that it
causes trouble for the card that DID generate the IRQ.
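
The "check and get out" rule can be sketched as a toy model. This is plain
Python, not kernel code; the Device class, the handler names and the dispatch
loop are all hypothetical stand-ins for what a real kernel and driver do.

```python
class Device:
    """Toy PCI device with a single interrupt-pending status bit."""

    def __init__(self, name):
        self.name = name
        self.irq_pending = False
        self.serviced = 0

    def raise_irq(self):
        self.irq_pending = True


def polite_handler(dev):
    """Check our own hardware first; return at once if it did not interrupt."""
    if not dev.irq_pending:
        return False            # not ours, let the next handler look
    dev.irq_pending = False     # acknowledge the interrupt
    dev.serviced += 1
    return True


def presumptuous_handler(dev):
    """Assumes every interrupt on the line is ours (the bug described above)."""
    dev.irq_pending = False
    dev.serviced += 1           # "services" hardware that never asked for it
    return True


def dispatch(shared_line):
    """Call every handler registered on the shared line, in order."""
    return [handler(dev) for dev, handler in shared_line]


tdm, nic, bad = Device("tdm"), Device("nic"), Device("bad")
line = [(tdm, polite_handler), (nic, polite_handler), (bad, presumptuous_handler)]

tdm.raise_irq()                 # only the TDM card interrupts
claimed = dispatch(line)
```

Only the TDM card raised the interrupt, yet the presumptuous handler also
claims it and touches its hardware. The polite network handler does one cheap
check and returns.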
Digium's hardware is more sensitive to IRQ sharing trouble than other hardware
for two very simple reasons.
The first is that the TDM cards have no real buffering. If the data is not
taken from the register it will quickly be overwritten by the next block of
data. This is analogous to the old 16450 UARTs of yore. They had a receiver
shift register and a 1-byte receiver buffer. If you didn't get the data out
of the buffer before the next byte had shifted in, the new byte would be
transferred to the buffer and you'd get an overrun error. The 16550 replaced
the 1-byte receive buffer with a 16-byte FIFO (IIRC) -- you could trigger an
IRQ after the FIFO had filled 'x' bytes, and then service the IRQ, retrieving
all bytes received in one fell swoop. And if your IRQ service routine got a
little delayed it was no big deal because there was room for another byte or
two before you started losing data. This allowed the IRQ volume on busy
serial applications to be far lower (up to 16x lower) than before, which
allowed for better system utilization.
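
To put rough numbers on the 16450 vs 16550 comparison, here is a small
simulation. It is a simplification under stated assumptions: one byte arrives
per tick, the service routine drains the whole buffer every service_every
ticks, and count_overruns is a name invented for this sketch. It models
neither real UART timing nor the FIFO trigger levels.

```python
from collections import deque


def count_overruns(fifo_depth, n_bytes, service_every):
    """Count bytes lost when the buffer is full at the time a byte arrives."""
    fifo = deque()
    lost = 0
    for tick in range(1, n_bytes + 1):
        if len(fifo) >= fifo_depth:
            # Overrun: one byte is lost. Which one (old or new) differs
            # between parts, but the count is the same.
            lost += 1
        else:
            fifo.append(tick)
        if tick % service_every == 0:
            fifo.clear()        # the service routine empties the buffer
    return lost


prompt_16450 = count_overruns(fifo_depth=1, n_bytes=16, service_every=1)
late_16450 = count_overruns(fifo_depth=1, n_bytes=16, service_every=2)
lazy_16550 = count_overruns(fifo_depth=16, n_bytes=16, service_every=16)
```

With a 1-byte buffer, servicing just one tick late loses every second byte.
With a 16-deep FIFO the same 16 bytes survive a single service pass.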
Digium's hardware is like the old 16450. There is no FIFO. This was done
consciously, and is not necessarily a bad design -- TDM is VERY sensitive to
latencies. The more delay you have, the worse things like echo become.
Bringing TDM data into the PC is already pretty laggy. Adding more delay
with FIFOs isn't necessarily a good thing. (I would argue that having a 16
byte FIFO and triggering the IRQ on the first position would not be a bad
thing nor would it introduce any latency, but that's me. I'd change a few
things about Digium's hardware, but there is no arguing with their success.)
So back to the problem at hand: if there is significant delay between the IRQ
and the IRQ service, you lose data. This leads to chirping/clicking and in
the case of T1, HDLC/framing errors, dropped links and bouncing D channels
(for PRI).
The second reason is that Digium's drivers do a LOT of work in the IRQ
handler. Essentially they are "poor" PCI neighbours. In the past (I have
not checked this recently) all of the echo cancellation and "heavy lifting"
was done right inside the IRQ handler, with interrupts disabled. This caused
their IRQ service time to be lengthy, and until interrupts are enabled again
you essentially lock out any other driver from servicing its hardware.
(Basically Digium's drivers do to other drivers what Digium's drivers can't
stand to have done to them.) Contrast this with Sangoma's drivers, which get
the data into system RAM, set a flag (softIRQ?) and then get the hell out of
the IRQ context as quickly as possible. Then whenever the CPU gets time to
do it, the driver takes the data and processes it OUTSIDE of the IRQ
context. Whether this is better or worse for performance is under debate,
but there is absolutely no question that doing it this way makes their
products better PCI neighbours.
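
The "grab the data, set a flag, get out" pattern looks roughly like this. It
is a user-space toy, not Sangoma's or Digium's actual code; DeferredDriver and
its method names are hypothetical, and the summing function stands in for the
real heavy lifting (echo cancellation and so on).

```python
from collections import deque


class DeferredDriver:
    """Toy driver that splits work into a short IRQ part and a deferred part."""

    def __init__(self, process):
        self.queue = deque()
        self.work_pending = False
        self.process = process      # the expensive per-block processing
        self.results = []

    def irq_handler(self, block):
        """IRQ context: copy the data, set a flag, return immediately."""
        self.queue.append(bytes(block))
        self.work_pending = True

    def run_deferred(self):
        """Runs later, outside IRQ context, whenever the CPU has time."""
        while self.queue:
            self.results.append(self.process(self.queue.popleft()))
        self.work_pending = False


drv = DeferredDriver(process=sum)   # sum() is a placeholder workload
drv.irq_handler([1, 2, 3])
drv.irq_handler([4, 5])
pending_before = drv.work_pending
results_before = list(drv.results)
drv.run_deferred()
```

Nothing expensive runs inside irq_handler, so in this model other devices on
the line are held up only for the length of a copy. The cost is that the
queued data waits until run_deferred gets scheduled.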
This is a rather lengthy post, and I am sure that others will post
contradictory or corrective responses, which I welcome. The gist of the
post, however, is that there are far more things that can cause audio
problems than simple IRQ sharing. I had a TDM400 (3 FXS) in a P3 system that
shared its IRQ with the LAN card *AND* the disk controller. This computer
was also an NFS server for my media PC in the living room. Every single call
came in over the network card (I have no phone line), and even while watching
movies (heavy network and disk use), I had absolutely NO issue with the
TDM400. No chirping, no echo trouble, nothing. That system had a DAMN good
PCI interface, and all the drivers coexisted peacefully.
By all anecdotal evidence and rules of thumb that system should have had
TERRIBLE audio problems. I was not only sharing the IRQ between 3 devices,
but two of the three devices (TDM400 and network card) would ALWAYS be firing
IRQs at the same time during a call. I did not, however, have any trouble
whatsoever.
-A.