[asterisk-users] wct4xxp Excessive Interrupts Resulting in Unusable System or Card
Scott L. Lykens
scott.lykens at kmmsinc.com
Sun Jun 1 07:41:40 CDT 2014
Hello all-
I have a Digium TE410P in an HP DL145 G2 dual-processor server that generates well over 100,000 interrupts per second (I have sometimes counted 160,000+ per second). This generally results in either the system becoming swamped and unusable, or the kernel disabling the IRQ the TE410P is on, which leaves the spans on that card unusable.
I have confirmed that the card is good by placing it in an IBM server running FreePBX Distro and verifying that it generates only 1,000 interrupts per second, and works properly.
This is on a system running 64-bit Ubuntu 14.04 LTS, kernels 3.13.0-27-generic and 3.13.0-27-lowlatency. I have compiled and installed DAHDI from source, both 2.9.1.1 and 2.8.0, and see the same result with the Ubuntu DAHDI package which is based on 2.5.0. I have entered BIOS and disabled all extra devices I can and reset the configuration data.
Most frequently the kernel disables the interrupt. Booting with the irqpoll option, as suggested by the error message, does not always solve the problem and introduces other problems. See dmesg below:
(the "Not prepped yet!" message repeats *many* times)
[ 16.371739] wct4xxp 0000:81:01.0: Not prepped yet!
[ 16.371743] wct4xxp 0000:81:01.0: Not prepped yet!
[ 16.611991] irq 25: nobody cared (try booting with the "irqpoll" option)
[ 16.615221] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GF O 3.13.0-27-generic #50-Ubuntu
[ 16.615224] Hardware name: HP ProLiant DL145 G2/K85NL, BIOS 2.14 10/20/2005
[ 16.615227] ffff880139ea6a9c ffff88013bc03e68 ffffffff817199c4 ffff880139ea6a00
[ 16.615231] ffff88013bc03e90 ffffffff810c19d2 ffff880139ea6a00 0000000000000019
[ 16.615235] 0000000000000000 ffff88013bc03ed0 ffffffff810c1e6c 000000008101b763
[ 16.615239] Call Trace:
[ 16.615241] <IRQ> [<ffffffff817199c4>] dump_stack+0x45/0x56
[ 16.615253] [<ffffffff810c19d2>] __report_bad_irq+0x32/0xd0
[ 16.615257] [<ffffffff810c1e6c>] note_interrupt+0x1ac/0x200
[ 16.615260] [<ffffffff810bf749>] handle_irq_event_percpu+0xd9/0x1d0
[ 16.615263] [<ffffffff810bf87d>] handle_irq_event+0x3d/0x60
[ 16.615267] [<ffffffff810c29ea>] handle_fasteoi_irq+0x5a/0x100
[ 16.615272] [<ffffffff81015cde>] handle_irq+0x1e/0x30
[ 16.615276] [<ffffffff8172c6cd>] do_IRQ+0x4d/0xc0
[ 16.615281] [<ffffffff81721e6d>] common_interrupt+0x6d/0x6d
[ 16.615283] <EOI> [<ffffffff810d63c1>] ? tick_nohz_idle_enter+0x41/0x70
[ 16.615289] [<ffffffff810d63bd>] ? tick_nohz_idle_enter+0x3d/0x70
[ 16.615292] [<ffffffff810beb48>] cpu_startup_entry+0x88/0x290
[ 16.615297] [<ffffffff81707e97>] rest_init+0x77/0x80
[ 16.615302] [<ffffffff81d35f70>] start_kernel+0x438/0x443
[ 16.615305] [<ffffffff81d35941>] ? repair_env_string+0x5c/0x5c
[ 16.615308] [<ffffffff81d35120>] ? early_idt_handlers+0x120/0x120
[ 16.615312] [<ffffffff81d355ee>] x86_64_start_reservations+0x2a/0x2c
[ 16.615315] [<ffffffff81d35733>] x86_64_start_kernel+0x143/0x152
[ 16.615317] handlers:
[ 16.615987] [<ffffffffa01d3420>] t4_interrupt_gen2 [wct4xxp]
[ 16.615987] Disabling IRQ #25
[ 17.607238] dahdi_echocan_mg2: Registered echo canceler 'MG2'
[ 17.608276] wct4xxp 0000:81:01.0: Span 1 configured for ESF/B8ZS
[ 17.608360] wct4xxp 0000:81:01.0: SPAN 1: Primary Sync Source
[ 17.708056] wct4xxp 0000:81:01.0: RCLK source set to span 1
[ 17.708065] wct4xxp 0000:81:01.0: Recovered timing mode, RCLK set to span 1
[ 17.736138] wct4xxp 0000:81:01.0: Span 2 configured for ESF/B8ZS
[ 17.808065] wct4xxp 0000:81:01.0: RCLK source set to span 1
[ 17.808073] wct4xxp 0000:81:01.0: Recovered timing mode, RCLK set to span 1
[ 17.864134] wct4xxp 0000:81:01.0: Span 3 configured for ESF/B8ZS
[ 17.908049] wct4xxp 0000:81:01.0: RCLK source set to span 1
[ 17.908058] wct4xxp 0000:81:01.0: Recovered timing mode, RCLK set to span 1
[ 17.992139] wct4xxp 0000:81:01.0: Span 4 configured for ESF/B8ZS
[ 18.008106] wct4xxp 0000:81:01.0: RCLK source set to span 1
[ 18.008114] wct4xxp 0000:81:01.0: Recovered timing mode, RCLK set to span 1
[ 20.208172] wct4xxp 0000:81:01.0: Setting yellow alarm span 1
[ 20.208212] wct4xxp 0000:81:01.0: RCLK source set to span 2
[ 20.208216] wct4xxp 0000:81:01.0: System timing mode, RCLK set to span 2
[ 20.308149] wct4xxp 0000:81:01.0: Setting yellow alarm span 2
[ 20.308180] wct4xxp 0000:81:01.0: RCLK source set to span 3
[ 20.308184] wct4xxp 0000:81:01.0: System timing mode, RCLK set to span 3
[ 20.408173] wct4xxp 0000:81:01.0: Setting yellow alarm span 3
[ 20.408200] wct4xxp 0000:81:01.0: RCLK source set to span 4
[ 20.408204] wct4xxp 0000:81:01.0: System timing mode, RCLK set to span 4
[ 25.601523] wct4xxp 0000:81:01.0: Span 1 configured for ESF/B8ZS
[ 25.601587] wct4xxp 0000:81:01.0: SPAN 1: Primary Sync Source
[ 25.601673] wct4xxp 0000:81:01.0: Span 4 configured for ESF/B8ZS
[ 25.608209] wct4xxp 0000:81:01.0: RCLK source set to span 4
[ 25.608215] wct4xxp 0000:81:01.0: System timing mode, RCLK set to span 4
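For anyone reading the trace above: as far as I understand it, the "nobody cared" report comes from the kernel's spurious-interrupt check in kernel/irq/spurious.c, which evaluates each IRQ line in windows of 100,000 interrupts and disables the line if more than 99,900 of them went unhandled. The sketch below is only a simplified model of that rule in Python, not the kernel code itself. It leaves out the kernel's time-based reset of the unhandled counter, and the class and function names (IrqLineModel, interrupts_until_disabled) are made up for illustration.

```python
# Simplified model of the spurious-IRQ rule described above.
# Assumption: thresholds of 100,000 and 99,900 as in kernel/irq/spurious.c;
# all names here are hypothetical and exist only for this illustration.

WINDOW = 100_000           # interrupts per evaluation window
UNHANDLED_LIMIT = 99_900   # more unhandled than this in one window => disable


class IrqLineModel:
    """Tracks handled/unhandled interrupts for a single IRQ line."""

    def __init__(self):
        self.irq_count = 0
        self.irqs_unhandled = 0
        self.disabled = False

    def note_interrupt(self, handled):
        """Record one interrupt; return True once the line is disabled."""
        if self.disabled:
            return True
        if not handled:
            self.irqs_unhandled += 1
        self.irq_count += 1
        if self.irq_count < WINDOW:
            return False
        # End of a window: decide, then start counting afresh.
        if self.irqs_unhandled > UNHANDLED_LIMIT:
            self.disabled = True
        self.irq_count = 0
        self.irqs_unhandled = 0
        return self.disabled


def interrupts_until_disabled(handled_every, limit=1_000_000):
    """Feed interrupts where only every Nth one is handled (0 = none).

    Returns the interrupt number at which the line is disabled,
    or None if it survives `limit` interrupts.
    """
    line = IrqLineModel()
    for n in range(1, limit + 1):
        handled = handled_every > 0 and n % handled_every == 0
        if line.note_interrupt(handled):
            return n
    return None
```

Under this model, a line whose handler never claims an interrupt is shut off at exactly the 100,000th interrupt, which is consistent with the count later shown for IRQ 25.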
Checking /proc/interrupts reveals that the card generated 100,000 interrupts without being serviced and the kernel disabled it (and also reveals that the card is apparently on its own IRQ):
maintenance@sip:~$ cat /proc/interrupts
           CPU0       CPU1
  0:         46          0   IO-APIC-edge      timer
  1:         10          0   IO-APIC-edge      i8042
  7:          1          0   IO-APIC-edge
  8:          0          0   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          4          0   IO-APIC-edge      i8042
 14:          0          0   IO-APIC-edge      pata_amd
 15:          0          0   IO-APIC-edge      pata_amd
 16:        304          0   IO-APIC-fasteoi   nouveau
 19:       1221          0   IO-APIC-fasteoi   eth1
 21:       8681          0   IO-APIC-fasteoi   sata_nv
 22:          0          0   IO-APIC-fasteoi   ehci_hcd:usb1
 23:          0          0   IO-APIC-fasteoi   ohci_hcd:usb2
 25:     100000          1   IO-APIC-fasteoi   wct4xxp
NMI:          1          1   Non-maskable interrupts
LOC:      17884      19728   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          1          1   Performance monitoring interrupts
IWI:       1554        815   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:       6566       8577   Rescheduling interrupts
CAL:        220       4521   Function call interrupts
TLB:        638        504   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          1          1   Machine check polls
ERR:          1
MIS:          0
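Per-second figures like the ones quoted earlier can be obtained by reading /proc/interrupts twice and dividing the difference in each counter by the time between reads. The following is a minimal Python sketch of that approach, assuming the layout shown above (a header row of CPU names, then one row per IRQ with one count per CPU). The function names parse_interrupts, irq_rates and sample_rates are illustrative only and are not part of DAHDI or any other tool.

```python
import time


def parse_interrupts(text):
    """Return {irq_label: count summed over all CPUs} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an "IRQ: counts..." row
        counts = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the controller/device description
            counts.append(int(field))
        totals[label.strip()] = sum(counts)
    return totals


def irq_rates(before, after, seconds):
    """Interrupts per second for each IRQ between two parsed snapshots."""
    return {irq: (after[irq] - before.get(irq, 0)) / seconds for irq in after}


def sample_rates(interval=1.0, path="/proc/interrupts"):
    """Take two snapshots `interval` seconds apart and return the rates."""
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_interrupts(f.read())
    return irq_rates(first, second, interval)
```

On the affected machine, something like `sample_rates()["25"]` would give the rate for the wct4xxp line. Once the kernel has disabled the IRQ the counter stops advancing, so the rate is only meaningful while the line is still enabled.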
Any ideas on how I can further diagnose and pursue this? Google does not turn up much that is useful on this issue.
Thank you!
--
Scott L. Lykens
Keystone Medical Management Solutions, Inc.
+1 814 325-7500 x501 -- www.kmmsinc.com