I attempted an upgrade of our production system from Asterisk/Zaptel 1.2 to 1.4 this weekend. Initially everything looked like it was working properly, but some time in the day following the upgrade, the system died with a kernel panic. Unfortunately, I wasn't able to catch the entire kernel dump on the console.
<br><br>I attempted to isolate the panic, and found that whenever 'service zaptel stop' is run (specifically, when wct4xxp is unloaded) I consistently get this panic:<br><br><5> Not prepped yet! (<span style="font-weight: bold;">
repeated approx 250 times</span>)<br><5> Freed a Wildcard<br><5> Not prepped yet! (<span style="font-weight: bold;">repeated approx 550 times</span>)<br><5> Stopped TE4XXP, Turned off DMA<br><5> Not prepped yet! (
<span style="font-weight: bold;">repeated approx 11000 times</span>)<br><5> Unable to handle kernel paging request at ffffff0000034010 RIP:<br><5> <ffffffffa0163207>{:wct4xxp:t4_interrupt_gen2+63}<br><5> PML4 4ea063 PGD 1388067 PMD 1389067 PTE 0
<br><5> Oops: 0000 [1] SMP<br><5> CPU 1<br><5> Modules linked in: zttranscode(U) wct4xxp(U) zaptel(U) crc_ccitt netconsole md5 ipv6 dm_mirror dm_mod button battery ac joydev ehci_hcd uhci_hcd hw_random bnx2 ext3 jbd
<br>cciss sd_mod scsi_mod<br><5> Pid: 11053, comm: hotplug Not tainted 2.6.9-42.0.8.ELsmp<br><5> RIP: 0010:[<ffffffffa0163207>] <ffffffffa0163207>{:wct4xxp:t4_interrupt_gen2+63}<br><5> RSP: 0000:00000100013ebdb0 EFLAGS: 00010046
<br><5> RAX: ffffff0000034000 RBX: 0000010073478724 RCX: 0000000000000002<br><5> RDX: 0000010073478680 RSI: 0000000000000002 RDI: 0000010073478724<br><5> RBP: 0000010073478680 R08: 0000000000000008 R09: 0000000000000000
<br><5> R10: 0000000000000000 R11: 0000000000000002 R12: 00000000000000d1<br><5> R13: 00000100013ebec8 R14: 00000100013ebec8 R15: 0000010075e94978<br><5> FS: 0000002a955643e0(0000) GS:ffffffff804e5900(0000) knlGS:0000000000000000
<br><5> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b<br><5> CR2: ffffff0000034010 CR3: 00000000013d8000 CR4: 00000000000006e0<br><5> Process hotplug (pid: 11053, threadinfo 000001007413c000, task 000001007c445030)
<br><5> Stack: 00000000000000d1 000001007413dc98 ffffffff80138552 0000003000000008<br><5> 00000100013ebea8 00000100013ebde8 0000000000000001 00000000000000d1<br><5> 0000000000000012 0000010073478680
<br><5> Call Trace:<IRQ> <ffffffff80138552>{printk+141} <ffffffff80112f4a>{handle_IRQ_event+41}<br><5> <ffffffff801131c4>{do_IRQ+197} <ffffffff80110833>{ret_from_intr+0}<br>
<5> <ffffffff8013c731>{__do_softirq+77} <ffffffff8013c7e5>{do_softirq+49}<br><5> <ffffffff80110bf5>{apic_timer_interrupt+133} <EOI> <ffffffff8011c21a>{flush_tlb_page+44}
<br><5> <ffffffff80169106>{do_wp_page+1127} <ffffffff80123ed3>{do_page_fault+575}<br><5> <ffffffff80169ff2>{handle_mm_fault+1228} <ffffffff80123e9a>{do_page_fault+518}<br>
<5> <ffffffff8011026a>{system_call+126} <ffffffff80132bc6>{schedule_tail+202}<br><5> <ffffffff80110d91>{error_exit+0}<br><5><br><5> Code: 8b 40 10 89 44 24 58 e8 3d 80 1a e0 31 c0 f6 44 24 58 07 0f
<br><5> RIP <ffffffffa0163207>{:wct4xxp:t4_interrupt_gen2+63} RSP <00000100013ebdb0><br><5> CR2: ffffff0000034010<br><5> <0>Kernel panic - not syncing: Oops<br><5> Badness in panic at kernel/panic.c:118
<br><5><br><5> Call Trace:<IRQ> <ffffffff80137a8a>{panic+527} <ffffffff80110bf5>{apic_timer_interrupt+133}<br><5> <ffffffff80111aec>{oops_end+38} <ffffffff80111b07>{oops_end+65}
<br><5> <ffffffff80124148>{do_page_fault+1204} <ffffffffa0078f51>{:bnx2:bnx2_start_xmit+470}<br><5> <ffffffff802bb4cd>{netpoll_send_skb+257} <ffffffff80110d91>{error_exit+0}
<br><5> <ffffffffa0163207>{:wct4xxp:t4_interrupt_gen2+63} <ffffffff80138552>{printk+141}<br><5> <ffffffff80112f4a>{handle_IRQ_event+41} <ffffffff801131c4>{do_IRQ+197}<br><5> <ffffffff80110833>{ret_from_intr+0} <ffffffff8013c731>{__do_softirq+77}
<br><5> <ffffffff8013c7e5>{do_softirq+49} <ffffffff80110bf5>{apic_timer_interrupt+133}<br><5> <EOI> <ffffffff8011c21a>{flush_tlb_page+44} <ffffffff80169106>{do_wp_page+1127}
<br><5> <ffffffff80123ed3>{do_page_fault+575} <ffffffff80169ff2>{handle_mm_fault+1228}<br><5> <ffffffff80123e9a>{do_page_fault+518} <ffffffff8011026a>{system_call+126}<br>
<5> <ffffffff80132bc6>{schedule_tail+202} <ffffffff80110d91>{error_exit+0}<br><5><br><5> Badness in i8042_panic_blink at drivers/input/serio/i8042.c:987<br><5><br><5> Call Trace:<IRQ> <ffffffff8024219b>{i8042_panic_blink+238} <ffffffff80137a38>{panic+445}
<br><5> <ffffffff80110bf5>{apic_timer_interrupt+133} <ffffffff80111aec>{oops_end+38}<br><5> <ffffffff80111b07>{oops_end+65} <ffffffff80124148>{do_page_fault+1204}<br><5> <ffffffffa0078f51>{:bnx2:bnx2_start_xmit+470} <ffffffff802bb4cd>{netpoll_send_skb+257}
<br><5> <ffffffff80110d91>{error_exit+0} <ffffffffa0163207>{:wct4xxp:t4_interrupt_gen2+63}<br><5> <ffffffff80138552>{printk+141} <ffffffff80112f4a>{handle_IRQ_event+41}<br>
<5> <ffffffff801131c4>{do_IRQ+197} <ffffffff80110833>{ret_from_intr+0}<br><5> <ffffffff8013c731>{__do_softirq+77} <ffffffff8013c7e5>{do_softirq+49}<br><5> <ffffffff80110bf5>{apic_timer_interrupt+133} <EOI> <ffffffff8011c21a>{flush_tlb_page+44}
<br><5> <ffffffff80169106>{do_wp_page+1127} <ffffffff80123ed3>{do_page_fault+575}<br><5> <ffffffff80169ff2>{handle_mm_fault+1228} <ffffffff80123e9a>{do_page_fault+518}<br>
<5> <ffffffff8011026a>{system_call+126} <ffffffff80132bc6>{schedule_tail+202}<br><5> <ffffffff80110d91>{error_exit+0}<br><5> Badness in i8042_panic_blink at drivers/input/serio/i8042.c:990
<br><5><br><5> Call Trace:<IRQ> <ffffffff8024222d>{i8042_panic_blink+384} <ffffffff80137a38>{panic+445}<br><5> <ffffffff80110bf5>{apic_timer_interrupt+133} <ffffffff80111aec>{oops_end+38}
<br><5> <ffffffff80111b07>{oops_end+65} <ffffffff80124148>{do_page_fault+1204}<br><5> <ffffffffa0078f51>{:bnx2:bnx2_start_xmit+470} <ffffffff802bb4cd>{netpoll_send_skb+257}
<br><5> <ffffffff80110d91>{error_exit+0} <ffffffffa0163207>{:wct4xxp:t4_interrupt_gen2+63}<br><5> <ffffffff80138552>{printk+141} <ffffffff80112f4a>{handle_IRQ_event+41}<br>
<5> <ffffffff801131c4>{do_IRQ+197} <ffffffff80110833>{ret_from_intr+0}<br><5> <ffffffff8013c731>{__do_softirq+77} <ffffffff8013c7e5>{do_softirq+49}<br><5> <ffffffff80110bf5>{apic_timer_interrupt+133} <EOI> <ffffffff8011c21a>{flush_tlb_page+44}
<br><5> <ffffffff80169106>{do_wp_page+1127} <ffffffff80123ed3>{do_page_fault+575}<br><5> <ffffffff80169ff2>{handle_mm_fault+1228} <ffffffff80123e9a>{do_page_fault+518}<br>
<5> <ffffffff8011026a>{system_call+126} <ffffffff80132bc6>{schedule_tail+202}<br><5> <ffffffff80110d91>{error_exit+0}<br><5> Badness in i8042_panic_blink at drivers/input/serio/i8042.c:992
<br><5><br><5> Call Trace:<IRQ> <ffffffff80242292>{i8042_panic_blink+485} <ffffffff80137a38>{panic+445}<br><5> <ffffffff80110bf5>{apic_timer_interrupt+133} <ffffffff80111aec>{oops_end+38}
<br><5> <ffffffff80111b07>{oops_end+65} <ffffffff80124148>{do_page_fault+1204}<br><5> <ffffffffa0078f51>{:bnx2:bnx2_start_xmit+470} <ffffffff802bb4cd>{netpoll_send_skb+257}
<br><5> <ffffffff80110d91>{error_exit+0} <ffffffffa0163207>{:wct4xxp:t4_interrupt_gen2+63}<br><5> <ffffffff80138552>{printk+141} <ffffffff80112f4a>{handle_IRQ_event+41}<br>
<5> <ffffffff801131c4>{do_IRQ+197} <ffffffff80110833>{ret_from_intr+0}<br><5> <ffffffff8013c731>{__do_softirq+77} <ffffffff8013c7e5>{do_softirq+49}<br><5> <ffffffff80110bf5>{apic_timer_interrupt+133} <EOI> <ffffffff8011c21a>{flush_tlb_page+44}
<br><5> <ffffffff80169106>{do_wp_page+1127} <ffffffff80123ed3>{do_page_fault+575}<br><5> <ffffffff80169ff2>{handle_mm_fault+1228} <ffffffff80123e9a>{do_page_fault+518}<br>
<5> <ffffffff8011026a>{system_call+126} <ffffffff80132bc6>{schedule_tail+202}<br><5> <ffffffff80110d91>{error_exit+0}<br><br>System details:<br><br>HP DL380G5<br>CentOS x86_64 4.4
<br>uname: 2.6.9-42.0.8.ELsmp #1 SMP Tue Jan 30 12:18:01 EST 2007 x86_64 x86_64 x86_64 GNU/Linux<br>T412P quad-span card<br>TDM400 card with two FXS modules and two FXO modules<br><br>Original software:<br>Libpri 1.2.4 (RPM from atrpms)
<br>Zaptel 1.2.16 (RPM from atrpms)<br>Asterisk 1.2.17 (compiled from source)<br><br>Upgraded software:<br>Libpri 1.4.0 (compiled from source)<br>Zaptel 1.4.2.1 (compiled from source)<br>Asterisk
1.4.4 (compiled from source)<br><br>Before installing the 1.4 files, I removed the 1.2 RPMs using 'yum remove', so there should not have been any cruft left over. I did notice that the atrpms RPMs and the source tarballs put some files in different locations. The RPM for libpri puts the files in /usr/lib64, while my source tarball put them in /usr/lib. 'file' indicates that they are both 'ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV)' though, so I don't think that this is an issue of mixed 32-bit and 64-bit objects.
<br><br>I tried rebuilding all of the software from cleanly extracted tarballs to no avail, so I had to restore my backed-up 1.2 configuration. Things still aren't working properly: when I attempt to unload the wct4xxp module, the "Not prepped yet!" message floods the console and the 'rmmod' command hangs (unkillable), but the system does not panic. I still have to hard reboot to get the system into a state where I can bring up * again though.
<br><br>Since the panic seems to involve the T412P interrupts, here's the output of /proc/interrupts:<br>
<br>
[root@pbxtel-01 ~]# cat /proc/interrupts<br>
CPU0 CPU1<br>
0: 799008 804849 IO-APIC-edge timer<br>
1: 4 5 IO-APIC-edge i8042<br>
8: 5 1 IO-APIC-edge rtc<br>
9: 0 0 IO-APIC-level acpi<br>
74: 16224 3990 PCI-MSI-X cciss0<br>
90: 154441 0 PCI-MSI eth0<br>
169: 0 0 IO-APIC-level uhci_hcd, ehci_hcd<br>
177: 0 0 IO-APIC-level uhci_hcd<br>
185: 0 0 IO-APIC-level uhci_hcd<br>
193: 0 0 IO-APIC-level uhci_hcd<br>
201: 814161 720175 IO-APIC-level wctdm<br>
209: 720174 814169 IO-APIC-level wct4xxp<br>
233: 34 47 IO-APIC-level uhci_hcd<br>
NMI: 1603725 1603680<br>
LOC: 1603015 1603015<br>
ERR: 0<br>
MIS: 0<br>
[root@pbxtel-01 ~]#<br>
<br>
(this is with the system as it is running now, backed out to the 1.2 configuration)<br><br>Any thoughts?<br><br>Thanks<br><br>-- <br>j.