[asterisk-users] wct4xxp Excessive Interrupts Resulting in Unusable System or Card

Sun Jun 1 07:41:40 CDT 2014

Hello all-

I have a Digium TE410P in an HP DL145 G2 dual-processor server. In this server the card generates well over 100,000 interrupts per second (I have sometimes counted more than 160,000 per second). The result is generally either that the system becomes swamped and unusable, or that the kernel disables the IRQ the TE410P is on, which leaves the spans on that card unusable.

I have confirmed that the card is good by placing it in an IBM server running FreePBX Distro, where it generates only 1,000 interrupts per second and works properly.

This is on a system running 64-bit Ubuntu 14.04 LTS, with kernels 3.13.0-27-generic and 3.13.0-27-lowlatency. I have compiled and installed DAHDI from source, both 2.9.1.1 and 2.8.0, and I see the same result with the Ubuntu DAHDI package, which is based on 2.5.0. I have also gone into the BIOS, disabled every extra device I can, and reset the configuration data.

Most frequently the kernel disables the interrupt. Booting with the irqpoll option, as suggested by the error message, does not always solve the problem and introduces other problems. See dmesg below:

(the "Not prepped yet!" message repeats *many* times)
[   16.371739] wct4xxp 0000:81:01.0: Not prepped yet!
[   16.371743] wct4xxp 0000:81:01.0: Not prepped yet!
[   16.611991] irq 25: nobody cared (try booting with the "irqpoll" option)
[   16.615221] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GF          O 3.13.0-27-generic #50-Ubuntu
[   16.615224] Hardware name: HP ProLiant DL145 G2/K85NL, BIOS 2.14   10/20/2005
[   16.615227]  ffff880139ea6a9c ffff88013bc03e68 ffffffff817199c4 ffff880139ea6a00
[   16.615231]  ffff88013bc03e90 ffffffff810c19d2 ffff880139ea6a00 0000000000000019
[   16.615235]  0000000000000000 ffff88013bc03ed0 ffffffff810c1e6c 000000008101b763
[   16.615239] Call Trace:
[   16.615241]  <IRQ>  [<ffffffff817199c4>] dump_stack+0x45/0x56
[   16.615253]  [<ffffffff810c19d2>] __report_bad_irq+0x32/0xd0
[   16.615257]  [<ffffffff810c1e6c>] note_interrupt+0x1ac/0x200
[   16.615260]  [<ffffffff810bf749>] handle_irq_event_percpu+0xd9/0x1d0
[   16.615263]  [<ffffffff810bf87d>] handle_irq_event+0x3d/0x60
[   16.615267]  [<ffffffff810c29ea>] handle_fasteoi_irq+0x5a/0x100
[   16.615272]  [<ffffffff81015cde>] handle_irq+0x1e/0x30
[   16.615276]  [<ffffffff8172c6cd>] do_IRQ+0x4d/0xc0
[   16.615281]  [<ffffffff81721e6d>] common_interrupt+0x6d/0x6d
[   16.615283]  <EOI>  [<ffffffff810d63c1>] ? tick_nohz_idle_enter+0x41/0x70
[   16.615289]  [<ffffffff810d63bd>] ? tick_nohz_idle_enter+0x3d/0x70
[   16.615292]  [<ffffffff810beb48>] cpu_startup_entry+0x88/0x290
[   16.615297]  [<ffffffff81707e97>] rest_init+0x77/0x80
[   16.615302]  [<ffffffff81d35f70>] start_kernel+0x438/0x443
[   16.615305]  [<ffffffff81d35941>] ? repair_env_string+0x5c/0x5c
[   16.615308]  [<ffffffff81d35120>] ? early_idt_handlers+0x120/0x120
[   16.615312]  [<ffffffff81d355ee>] x86_64_start_reservations+0x2a/0x2c
[   16.615315]  [<ffffffff81d35733>] x86_64_start_kernel+0x143/0x152
[   16.615317] handlers:
[   16.615987] [<ffffffffa01d3420>] t4_interrupt_gen2 [wct4xxp]
[   16.615987] Disabling IRQ #25
[   17.607238] dahdi_echocan_mg2: Registered echo canceler 'MG2'
[   17.608276] wct4xxp 0000:81:01.0: Span 1 configured for ESF/B8ZS
[   17.608360] wct4xxp 0000:81:01.0: SPAN 1: Primary Sync Source
[   17.708056] wct4xxp 0000:81:01.0: RCLK source set to span 1
[   17.708065] wct4xxp 0000:81:01.0: Recovered timing mode, RCLK set to span 1
[   17.736138] wct4xxp 0000:81:01.0: Span 2 configured for ESF/B8ZS
[   17.808065] wct4xxp 0000:81:01.0: RCLK source set to span 1
[   17.808073] wct4xxp 0000:81:01.0: Recovered timing mode, RCLK set to span 1
[   17.864134] wct4xxp 0000:81:01.0: Span 3 configured for ESF/B8ZS
[   17.908049] wct4xxp 0000:81:01.0: RCLK source set to span 1
[   17.908058] wct4xxp 0000:81:01.0: Recovered timing mode, RCLK set to span 1
[   17.992139] wct4xxp 0000:81:01.0: Span 4 configured for ESF/B8ZS
[   18.008106] wct4xxp 0000:81:01.0: RCLK source set to span 1
[   18.008114] wct4xxp 0000:81:01.0: Recovered timing mode, RCLK set to span 1
[   20.208172] wct4xxp 0000:81:01.0: Setting yellow alarm span 1
[   20.208212] wct4xxp 0000:81:01.0: RCLK source set to span 2
[   20.208216] wct4xxp 0000:81:01.0: System timing mode, RCLK set to span 2
[   20.308149] wct4xxp 0000:81:01.0: Setting yellow alarm span 2
[   20.308180] wct4xxp 0000:81:01.0: RCLK source set to span 3
[   20.308184] wct4xxp 0000:81:01.0: System timing mode, RCLK set to span 3
[   20.408173] wct4xxp 0000:81:01.0: Setting yellow alarm span 3
[   20.408200] wct4xxp 0000:81:01.0: RCLK source set to span 4
[   20.408204] wct4xxp 0000:81:01.0: System timing mode, RCLK set to span 4
[   25.601523] wct4xxp 0000:81:01.0: Span 1 configured for ESF/B8ZS
[   25.601587] wct4xxp 0000:81:01.0: SPAN 1: Primary Sync Source
[   25.601673] wct4xxp 0000:81:01.0: Span 4 configured for ESF/B8ZS
[   25.608209] wct4xxp 0000:81:01.0: RCLK source set to span 4
[   25.608215] wct4xxp 0000:81:01.0: System timing mode, RCLK set to span 4

Checking /proc/interrupts shows that the card generated 100,000 interrupts without being serviced before the kernel disabled the IRQ (it also shows that the card is apparently on its own IRQ):

maintenance@sip:~$ cat /proc/interrupts
           CPU0       CPU1
  0:         46          0   IO-APIC-edge      timer
  1:         10          0   IO-APIC-edge      i8042
  7:          1          0   IO-APIC-edge
  8:          0          0   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          4          0   IO-APIC-edge      i8042
 14:          0          0   IO-APIC-edge      pata_amd
 15:          0          0   IO-APIC-edge      pata_amd
 16:        304          0   IO-APIC-fasteoi   nouveau
 19:       1221          0   IO-APIC-fasteoi   eth1
 21:       8681          0   IO-APIC-fasteoi   sata_nv
 22:          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
 23:          0          0   IO-APIC-fasteoi   ohci_hcd:usb2
 25:     100000          1   IO-APIC-fasteoi   wct4xxp
NMI:          1          1   Non-maskable interrupts
LOC:      17884      19728   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          1          1   Performance monitoring interrupts
IWI:       1554        815   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:       6566       8577   Rescheduling interrupts
CAL:        220       4521   Function call interrupts
TLB:        638        504   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          1          1   Machine check polls
ERR:          1
MIS:          0
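
For anyone who wants to reproduce a per-second figure, here is a minimal sketch of one way to measure it: read /proc/interrupts twice and divide the change in the counter for one IRQ by the time between the two reads. The function names and the one-second default interval are illustrative assumptions, and this is not necessarily how the numbers quoted above were obtained.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row lists CPU0, CPU1, ...
    table = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Per-CPU counters come first; rows such as ERR/MIS have fewer columns.
        counts = []
        for field in fields[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), description)
    return table


def rate_per_second(before, after, irq, seconds):
    """Interrupts per second for one IRQ label between two parsed snapshots."""
    return (after[irq][0] - before[irq][0]) / seconds


def sample_rate(irq, seconds=1.0, path="/proc/interrupts"):
    """Take two snapshots `seconds` apart and return the rate for `irq`."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return rate_per_second(before, after, irq, seconds)
```

On the system above, `sample_rate("25")` would report the rate for the wct4xxp line. The description returned by `parse_interrupts` lists the handler names registered on that IRQ, so it can also be used to check whether the card shares its IRQ with another device.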

Any ideas on how I can further diagnose and pursue this? Searching Google has not turned up much that is useful for this issue.

Thank you!

--
Scott L. Lykens
Keystone Medical Management Solutions, Inc.
+1 814 325-7500 x501 -- www.kmmsinc.com
