[Asterisk-Users] PCI Problems
Rich Adamson
radamson at routers.com
Fri May 26 08:15:59 MST 2006
Andrew Kohlsmith wrote:
> On Thursday 25 May 2006 16:11, Sean Cook wrote:
>> What could be the other causes? I have exhausted everything I know how
>> to do. PCI sharing explains it (whether or not it is in fact the
>> problem). This card shares the BIOS-assigned interrupt with the network
>> card...
>
> Audio problems can have a variety of causes. These include (but are not
> limited to):
> - IRQ sharing with another device with a shitty driver or poor hardware
> - Poor/inconsistent PCI bus behaviour and timing
> - overloaded CPU or poor kernel parameters which cause timing problems
> - shitty hardware or drivers which can lock out IRQs for a long time
> - buggy drivers for the TDM or ethernet hardware
> - bad PCI tuning with setpci or kernel parameters, latency timers especially
> - other hardware (PCI bus controller, north or south bridge) issues
> - faulty hardware
> - poor cabling (either TDM side or ethernet side)
>
> IRQ sharing is often blamed for audio problems, but the fact of the matter is
> that IRQ sharing is *NOT* an issue if the hardware sharing the IRQ
> (and the drivers for that hardware) plays nicely and reacts to the IRQ
> quickly. PCI is DESIGNED to share IRQs. The trouble comes when vendors take
> old ISA hardware and port it to PCI without ensuring that it shares IRQs
> properly, or without ensuring that their drivers check whether their
> hardware caused the IRQ and react to IRQs quickly.
>
> There is NOTHING inherently wrong with sharing IRQs. The IRQ handler needs to
> check its hardware to see whether that hardware generated the IRQ, and
> get the hell out if not. A lot of (poor) drivers do NOT do this. The driver
> either assumes that the IRQ MUST have been generated by its hardware (which
> can cause a host of weird problems), or the check takes so long that it
> causes trouble for the card that DID generate the IRQ.
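The "check, then get out" rule can be illustrated with a small user-space simulation. This is a sketch under stated assumptions, not kernel code and not any vendor's driver: the `irq_pending` flag stands in for a device status register, and the names `polite_handler`, `greedy_handler` and `dispatch_shared_irq` are hypothetical. The return values are only loosely modelled on Linux's `IRQ_NONE`/`IRQ_HANDLED` convention, where every handler registered on a shared line is called in turn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes, loosely modelled on Linux's IRQ_NONE /
 * IRQ_HANDLED convention for handlers on a shared line. */
typedef enum { IRQ_NOT_MINE = 0, IRQ_SERVICED = 1 } irq_result;

/* Simulated device; irq_pending stands in for a hardware status bit. */
struct device {
    const char *name;
    bool irq_pending;
    int serviced;          /* times the handler actually did work */
};

typedef irq_result (*handler_fn)(struct device *);

/* Well-behaved: check our own status bit first, bail out if it is clear. */
static irq_result polite_handler(struct device *dev) {
    if (!dev->irq_pending)
        return IRQ_NOT_MINE;       /* not ours: get out immediately */
    dev->irq_pending = false;      /* acknowledge our interrupt */
    dev->serviced++;
    return IRQ_SERVICED;
}

/* Badly behaved: assumes every interrupt on the line is its own. */
static irq_result greedy_handler(struct device *dev) {
    dev->irq_pending = false;
    dev->serviced++;               /* does work even when it did not fire */
    return IRQ_SERVICED;
}

/* On a shared line every registered handler is called in turn;
 * returns how many handlers claimed the interrupt. */
static int dispatch_shared_irq(struct device *devs, handler_fn *handlers, int n) {
    assert(devs != NULL && handlers != NULL && n >= 0);
    int claimed = 0;
    for (int i = 0; i < n; i++)
        if (handlers[i](&devs[i]) == IRQ_SERVICED)
            claimed++;
    return claimed;
}
```

With two polite handlers, only the device that actually fired does any work. Swap the second one for `greedy_handler` and it claims, and spends time on, an interrupt it never raised, which is exactly the time the card that DID fire may not be able to spare.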
>
> Digium's hardware is more sensitive to IRQ sharing trouble than other hardware
> for two very simple reasons.
>
> The first is that the TDM cards have no real buffering. If the data is not
> taken from the register it will quickly be overwritten by the next block of
> data. This is analogous to the old 16450 UARTs of yore. They had a receiver
> shift register and a 1-byte receiver buffer. If you didn't get the data out
> of the buffer before the next byte had shifted in, the new byte would be
> transferred to the buffer and you'd get an overrun error. The 16550 replaced
> the 1-byte receive buffer with a 16-byte FIFO (IIRC) -- you could trigger an
> IRQ after the FIFO had filled 'x' bytes, and then service the IRQ, retrieving
> all bytes received in one fell swoop. And if your IRQ service routine got a
> little delayed it was no big deal because there was room for another byte or
> two before you started losing data. This allowed the IRQ volume on busy
> serial applications to be far lower (up to 16x lower) than before, which
> allowed for better system utilization.
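The 16450/16550 analogy can be made concrete with a toy timing model. This is a simplified sketch, not a register-accurate UART emulation: time is counted in "byte times" (ticks), one byte finishes shifting in per tick, and the service routine (hypothetically named `simulate_uart` here) drains the whole buffer `delay` ticks after the IRQ is raised. A 16450 is modelled as depth 1, trigger 1; a 16550 as depth 16 with a configurable trigger level.

```c
#include <assert.h>

/* Simulate n_bytes arriving one per "byte time" (tick) into a receive
 * buffer that holds `depth` bytes. An IRQ is raised once the buffer holds
 * `trigger` bytes; the service routine runs `delay` ticks after the IRQ,
 * just before the byte due at that tick lands, and empties the buffer.
 * Returns the number of bytes lost to overruns. */
static int simulate_uart(int depth, int trigger, int delay, int n_bytes) {
    assert(depth >= 1 && trigger >= 1 && trigger <= depth && delay >= 1);
    int count = 0;      /* bytes currently waiting in the buffer/FIFO */
    int lost = 0;       /* overrun count */
    int irq_at = -1;    /* tick the pending IRQ was raised, or -1 if none */

    for (int tick = 0; tick < n_bytes; tick++) {
        /* The service routine finally runs and drains everything. */
        if (irq_at >= 0 && tick - irq_at >= delay) {
            count = 0;
            irq_at = -1;
        }
        /* The next byte finishes shifting in. */
        if (count == depth)
            lost++;                 /* no room: overrun, a byte is lost */
        else
            count++;
        /* Raise the IRQ when the fill level reaches the trigger. */
        if (irq_at < 0 && count >= trigger)
            irq_at = tick;
    }
    return lost;
}
```

In this model a 16450 serviced within one byte time loses nothing, but if it is serviced just one byte time later it loses every other byte (`simulate_uart(1, 1, 2, 100)` loses 50 of 100), while a 16550 with a trigger level of 8 absorbs the same delay without loss.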
>
> Digium's hardware is like the old 16450. There is no FIFO. This was done
> consciously, and is not necessarily a bad design -- TDM is VERY sensitive to
> latencies. The more delay you have, the worse things like echo become.
> Bringing TDM data into the PC is already pretty laggy. Adding more delay
> with FIFOs isn't necessarily a good thing. (I would argue that having a
> 16-byte FIFO and triggering the IRQ on the first byte would not be a bad
> thing, nor would it introduce any latency, but that's me. I'd change a few
> things about Digium's hardware, but there is no arguing with their success.)
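The parenthetical argument about triggering on the first byte reduces to two numbers per FIFO configuration. Under the same simplified byte-time model as a UART timing sketch (an assumption, not a datasheet-exact analysis): the IRQ fires `trigger - 1` byte times after the first byte arrives, and the service routine can run up to `depth - trigger + 1` byte times after the IRQ before a byte is lost. The struct and function names below are hypothetical. For reference, the 16550's selectable receive trigger levels are 1, 4, 8 and 14 bytes.

```c
#include <assert.h>

/* Timing of a receive FIFO in "byte times". depth = FIFO size,
 * trigger = fill level that raises the IRQ. A 16450 behaves like
 * depth 1, trigger 1. Simplified model, not datasheet-exact. */
struct fifo_timing {
    int added_latency;  /* wait between first byte arriving and the IRQ */
    int slack;          /* how late service may be before a byte is lost */
};

static struct fifo_timing fifo_timing(int depth, int trigger) {
    assert(trigger >= 1 && trigger <= depth);
    struct fifo_timing t;
    t.added_latency = trigger - 1;       /* bytes that must pile up first */
    t.slack = depth - trigger + 1;       /* free slots left, plus the one in flight */
    return t;
}
```

In this model, a 16-byte FIFO with trigger level 1 has the same zero added latency as the 16450 but 16 byte times of slack instead of 1, which is the trade the parenthetical describes. Higher trigger levels buy fewer IRQs at the cost of latency.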
>
> So back to the problem at hand: if there is significant delay between the IRQ
> and the IRQ service, you lose data. This leads to chirping/clicking and in
> the case of T1, HDLC/framing errors, dropped links and bouncing D channels
> (for PRI).
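To put a rough number on "significant delay", here is a back-of-the-envelope sketch. It ASSUMES 8000 samples per second per channel delivered in chunks of 8 samples (the block size commonly associated with Zaptel-era cards; check your own driver), which gives one IRQ per millisecond. With no FIFO, each block has to be read before the next one overwrites it. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: 8 kHz sampling, 8-sample chunks, so one IRQ every 1000 us.
 * With no FIFO, a block must be read before the next block overwrites it. */
#define SAMPLE_RATE_HZ 8000
#define CHUNK_SAMPLES  8

static long irq_period_us(void) {
    return 1000000L * CHUNK_SAMPLES / SAMPLE_RATE_HZ;   /* 1000 us */
}

/* Given the IRQ-to-service latency of each block, count the blocks that
 * were serviced at or after the next block's arrival, i.e. overwritten. */
static size_t count_lost_blocks(const long *latency_us, size_t n) {
    long period = irq_period_us();
    size_t lost = 0;
    for (size_t i = 0; i < n; i++)
        if (latency_us[i] >= period)
            lost++;
    return lost;
}
```

Under these assumptions, any other driver that holds interrupts off for about a millisecond is enough to cost a block, which shows up as the clicks and framing errors described above.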
>
> The second reason is that Digium's drivers do a LOT of work in the IRQ
> handler. Essentially they are "poor" PCI neighbours. In the past (I have
> not checked this recently) all of the echo cancellation and "heavy lifting"
> was done right inside the IRQ handler, with interrupts disabled. This caused
> their IRQ service time to be lengthy, and until interrupts are enabled again
> you essentially lock out any other driver from servicing its hardware.
> (Basically, Digium's drivers do to other drivers what they can't
> stand to have done to themselves.) Contrast this with Sangoma's drivers, which get
> the data into system RAM, set a flag (softIRQ?) and then get the hell out of
> the IRQ context as quickly as possible. Then whenever the CPU gets time to
> do it, the driver takes the data and processes it OUTSIDE of the IRQ
> context. Whether this is better or worse for performance is under debate,
> but there is absolutely no question that doing it this way makes their
> products better PCI neighbours.
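The "copy, flag, get out" pattern can be sketched in user space as a top half and a bottom half sharing a small ring buffer. This is an illustration of the general technique only, not Sangoma's (or Digium's) code: `BLOCK_BYTES`, `RING_SLOTS` and the function names are hypothetical, `work_pending` stands in for scheduling a softirq or tasklet, and a checksum stands in for real work such as echo cancellation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_BYTES 8     /* hypothetical block size */
#define RING_SLOTS  4     /* hypothetical queue depth (power of two) */

struct ring {
    unsigned char slots[RING_SLOTS][BLOCK_BYTES];
    unsigned head, tail;  /* head: next write, tail: next read */
    bool work_pending;    /* stands in for "schedule the softirq/tasklet" */
    unsigned dropped;     /* blocks lost because the ring was full */
};

/* "Top half": runs in IRQ context. It only copies the block out of the
 * (simulated) hardware and flags that work is pending. No DSP here. */
static void irq_top_half(struct ring *r, const unsigned char *hw_block) {
    assert(r != NULL && hw_block != NULL);
    if (r->head - r->tail == RING_SLOTS) {   /* ring full */
        r->dropped++;
        return;
    }
    memcpy(r->slots[r->head % RING_SLOTS], hw_block, BLOCK_BYTES);
    r->head++;
    r->work_pending = true;
}

/* "Bottom half": runs later with interrupts enabled and does the heavy
 * lifting (a placeholder checksum here) for every queued block. */
static unsigned deferred_bottom_half(struct ring *r) {
    unsigned sum = 0;
    while (r->tail != r->head) {
        const unsigned char *blk = r->slots[r->tail % RING_SLOTS];
        for (int i = 0; i < BLOCK_BYTES; i++)
            sum += blk[i];
        r->tail++;
    }
    r->work_pending = false;
    return sum;
}
```

The design choice is the one described above: time spent with interrupts disabled is bounded by a `memcpy`, so neighbouring devices on the line get serviced promptly. The cost is a little queueing delay and the need to size the ring so the bottom half keeps up.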
>
> This is a rather lengthy post, and I am sure that others will post
> contradictory or corrective responses, which I welcome. The gist of the
> post, however, is that there are far more things that can cause audio
> problems than simple IRQ sharing. I had a TDM400 (3 FXS) in a P3 system that
> shared its IRQ with the LAN card *AND* the disk controller. This computer
> was also an NFS server for my media PC in the living room. Every single call
> came in over the network card (I have no phone line), and even while watching
> movies (heavy network and disk use), I had absolutely NO issue with the
> TDM400. No chirping, no echo trouble, nothing. That system had a DAMN good
> PCI interface, and all the drivers coexisted peacefully.
>
> By all anecdotal evidence and rules of thumb, that system should have had
> TERRIBLE audio problems. I was not only sharing the IRQ between 3 devices,
> but two of the three devices (TDM400 and network card) would ALWAYS be firing
> IRQs at the same time during a call. I did not, however, have any trouble
> whatsoever.
Andrew,
Have you dug into the TDM400 far enough to know whether the common
complaints stem from a hardware design issue, a TigerJet issue,
or the driver? (e.g., can any of the issues truly be addressed?)
Just curious...
R.