[asterisk-users] zaptel 1.4.10 regression with TE220B on Proliant DL380 G5 ?

Matthew Fredrickson creslin at digium.com
Wed Apr 16 09:26:32 CDT 2008


Ex Vito wrote:
>   Hi list,
> 
>   After a lot of testing and troubleshooting, I am observing what
>   looks to me like a regression in zaptel 1.4.10 (is it?).
>   I would like peer feedback before either asking Digium
>   install support or filing a bug.
> 
>   Thanks in advance!
> 
> 
>   System: HP Proliant DL380 G5 with 2x PCI-X + 1x PCIe riser card
>   OS: Centos 5
>   Kernel: 2.6.18-53.1.14.el5 (also tested under 2.6.18-53.el5)
>   HW: Digium TE220B, the one with HW echo cancellation
>          (configured as 2x E1 via jumpers)
> 
>   Context: Pre-site installation of system, no E1 connectivity
>                (loopbacks tested)
> 
> 
>   /etc/zaptel.conf:
>   span=1,1,0,ccs,hdb3,crc4
>   bchan=25-39,41-55
>   dchan=40
>   span=2,2,0,ccs,hdb3,crc4
>   bchan=56-70,72-86
>   dchan=71
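
The channel numbers in the configuration above follow the usual E1 layout: each span occupies 31 consecutive channels, and with CCS signalling the D-channel sits on timeslot 16 of the span. As a minimal sketch, the same lines can be regenerated from the first channel number. The function names are hypothetical, the starting channel 25 is taken from the post as-is, and using the span number as the timing priority simply mirrors the posted file.

```python
def e1_span_channels(first_channel):
    """Return (bchan_ranges, dchan) for one E1 span starting at first_channel."""
    # An E1 span exposes 31 usable timeslots as consecutive channels.
    last_channel = first_channel + 30
    # With CCS signalling, timeslot 16 of the span carries the D-channel.
    dchan = first_channel + 15
    bchan = f"{first_channel}-{dchan - 1},{dchan + 1}-{last_channel}"
    return bchan, dchan


def zaptel_conf(first_channel, spans):
    """Build span/bchan/dchan lines for consecutive E1 spans (CCS, HDB3, CRC4)."""
    lines = []
    for span in range(1, spans + 1):
        bchan, dchan = e1_span_channels(first_channel)
        # Assumption: timing priority equals the span number, as in the post.
        lines.append(f"span={span},{span},0,ccs,hdb3,crc4")
        lines.append(f"bchan={bchan}")
        lines.append(f"dchan={dchan}")
        first_channel += 31  # the next span starts right after this one
    return "\n".join(lines)


if __name__ == "__main__":
    print(zaptel_conf(25, 2))
```

Called with a first channel of 25 and two spans, this reproduces the six configuration lines quoted above.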
> 
> 
>   Under zaptel 1.4.10, when ztcfg runs, this gets logged in the kernel
>   log buffer:
> 
> About to enter spanconfig!
> Done with spanconfig!
> About to enter spanconfig!
> Done with spanconfig!
> About to enter startup!
> TE2XXP: Span 1 configured for CCS/HDB3/CRC4
> timing source auto card 0!
> wct2xxp: Setting yellow alarm on span 1
> timing source auto card 0!
> SPAN 1: Primary Sync Source
> VPM400: Not Present
> VPM450: echo cancellation for 64 channels
> BUG: soft lockup detected on CPU#0!
>  [<c044d448>] softlockup_tick+0x96/0xa4
>  [<c042ddc8>] update_process_times+0x39/0x5c
>  [<c04196f7>] smp_apic_timer_interrupt+0x5b/0x6c
>  [<c04059bf>] apic_timer_interrupt+0x1f/0x24
>  [<f89bc1e7>] init_vpm450m+0x32d/0x34a [wct4xxp]
>  [<f89a3b11>] t4_vpm450_init+0x18ce/0x198c [wct4xxp]
>  [<f89a7ee4>] t4_startup+0x4315/0x43c7 [wct4xxp]
>  [<c042621c>] release_console_sem+0x17e/0x1b8
>  [<c0407406>] do_IRQ+0xa5/0xae
>  [<f8994311>] t4_dacs+0x211/0x24b [wct4xxp]
>  [<f8a01f6a>] zt_ioctl+0x273/0x144f [zaptel]
>  [<c0457600>] mempool_alloc+0x28/0xc9
>  [<c04ddd33>] cfq_resort_rr_list+0x23/0x8b
>  [<c04deb6c>] cfq_add_crq_rb+0xba/0xc3
>  [<c04dec72>] cfq_insert_request+0x42/0x498
>  [<c04d5175>] elv_insert+0x10a/0x1ad
>  [<c04d908b>] __make_request+0x31d/0x366
>  [<c04de8b1>] cfq_dispatch_requests+0x26a/0x46b
>  [<c04dde27>] __cfq_slice_expired+0x8c/0xa5
>  [<c04de8b1>] cfq_dispatch_requests+0x26a/0x46b
>  [<c04d505d>] elv_next_request+0x15c/0x16a
>  [<f88bc101>] start_io+0x77/0xdc [cciss]
>  [<f88bf63e>] do_cciss_request+0x32c/0x337 [cciss]
>  [<f88ccff0>] __split_bio+0x408/0x418 [dm_mod]
>  [<f88cd6a6>] dm_request+0xce/0xd4 [dm_mod]
>  [<c04d6a81>] generic_make_request+0x248/0x258
>  [<c04d8734>] submit_bio+0xbf/0xc5
>  [<c04548e2>] find_get_page+0x18/0x38
>  [<c04719ad>] __find_get_block_slow+0xfb/0x105
>  [<c0471cea>] __find_get_block+0x15c/0x166
>  [<c0471cea>] __find_get_block+0x15c/0x166
>  [<c0471d24>] __getblk+0x30/0x270
>  [<f885a485>] journal_cancel_revoke+0x8a/0x96 [jbd]
>  [<f885a472>] journal_cancel_revoke+0x77/0x96 [jbd]
>  [<f885626f>] __journal_file_buffer+0x10e/0x1e3 [jbd]
>  [<c041f871>] __wake_up+0x2a/0x3d
>  [<f8856679>] journal_stop+0x1b0/0x1ba [jbd]
>  [<c042a209>] current_fs_time+0x4a/0x55
>  [<c048626d>] touch_atime+0x60/0x8f
>  [<c04552ee>] do_generic_mapping_read+0x421/0x468
>  [<c045478b>] file_read_actor+0x0/0xd1
>  [<c04548e2>] find_get_page+0x18/0x38
>  [<c0457319>] filemap_nopage+0x192/0x315
>  [<c046048f>] __handle_mm_fault+0x85e/0x87b
>  [<c047f46b>] do_ioctl+0x47/0x5d
>  [<c047f6cb>] vfs_ioctl+0x24a/0x25c
>  [<c047f725>] sys_ioctl+0x48/0x5f
>  [<c0404eff>] syscall_call+0x7/0xb
>  =======================
> VPM450: hardware DTMF disabled.
> VPM450: Present and operational servicing 2 span(s)
> Completed startup!
> About to enter startup!
> TE2XXP: Span 2 configured for CCS/HDB3/CRC4
> wct2xxp: Setting yellow alarm on span 2
> timing source auto card 0!
> SPAN 2: Secondary Sync Source
> Completed startup!
> 
> 
>   Soft lockup ?! Hmmm... I'm ignorant on this, but it smells fishy !
> 
>   For completeness' sake, the driver had previously loaded OK:
> 
> Zapata Telephony Interface Registered on major 196
> Zaptel Version: 1.4.10
> Zaptel Echo Canceller: MG2
> ACPI: PCI Interrupt 0000:18:08.0[A] -> GSI 19 (level, low) -> IRQ 98
> Found TE2XXP at base address fdff0000, remapped to f8854000
> TE2XXP version c01a016a, burst ON
> Octasic optimized!
> FALC version: 00000005, Board ID: 00
> Reg 0: 0x375a2400
> Reg 1: 0x375a2000
> Reg 2: 0xffffffff
> Reg 3: 0x00000000
> Reg 4: 0x00003101
> Reg 5: 0x00000000
> Reg 6: 0xc01a016a
> Reg 7: 0x00001300
> Reg 8: 0x00000000
> Reg 9: 0x00ff2031
> Reg 10: 0x0000004a
> TE2XXP: Launching card: 0
> TE2XXP: Setting up global serial parameters
> Found a Wildcard: Wildcard TE220 (4th Gen)
> 
> 
>   After trying lots of things (disabling iLO, disabling USB, trying a different
>   kernel, a different TE220B, etc.), I found that this "soft lockup" does not
>   show up under zaptel 1.4.9.2...
> 
>   In all honesty, I haven't got the faintest idea what kind of impact this
>   could have.
> 
>   When I tested zaptel 1.4.10 on a simpler system, an HP Proliant ML110
>   (nearly a PC), the error did not show up either.
> 
> 
>   I checked the zaptel 1.4.10 ChangeLog and there are some changes which
>   I'd suspect:
> 
> 2008-04-01 16:39 +0000 [r4122]  sruffell <sruffell at localhost>:
> 
>     * kernel/wct4xxp/base.c: Work around for host bridges that generate
>       fast back to back transactions which the current version of the
>       quad span cards do not advertise support for.
> 
> 2008-03-14 16:39 +0000 [r3983-3990]  Matthew Fredrickson <creslin at digium.com>
> 
>     * firmware/Makefile, kernel/wctdm24xxp/base.c,
>       kernel/wctdm24xxp/GpakApi.c, kernel/wctdm24xxp/GpakApi.h: Update
>       wctdm24xxp's VPMADT032 firmware to version 1.16
> 
>     * kernel/wct4xxp/base.c: When doing the ISR rewrite, forgot to
>       include the vpmdtmfcheck when doing DTMF polling causing it to
>       check for DTMF events even when it was told not to
> 
>     (+others)
> 
> 
>   I need to have this system running in about a week and a half.
>   What do you guys say ?

The softlockup indicator should be benign.  It is triggered while the
firmware for the part is being loaded, because the firmware image is so
large that it takes a long time to load.  However, I might have a fix for you.
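
One way to check whether a given soft lockup report matches this explanation is to look for the firmware-loading function in its backtrace. The following is a minimal sketch that scans saved kernel log text for such a trace. The frame format in the regular expression is assumed from the trace quoted above, the function names are hypothetical, and a match is only a triage hint, not a diagnosis.

```python
import re

# Matches frames such as: [<f89bc1e7>] init_vpm450m+0x32d/0x34a [wct4xxp]
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def lockup_traces(log_text):
    """Return one list of (symbol, module) frames per soft lockup report."""
    traces = []
    current = None
    for line in log_text.splitlines():
        if "soft lockup detected" in line:
            current = []          # start collecting a new backtrace
            traces.append(current)
            continue
        if current is None:
            continue              # not inside a backtrace
        match = FRAME.search(line)
        if match:
            current.append((match.group(1), match.group(2)))
        else:
            current = None        # first non-frame line ends the backtrace
    return traces


def in_vpm_firmware_load(trace):
    """True if the trace passes through wct4xxp's init_vpm450m."""
    return any(sym == "init_vpm450m" and mod == "wct4xxp" for sym, mod in trace)


if __name__ == "__main__":
    sample = """\
BUG: soft lockup detected on CPU#0!
 [<c044d448>] softlockup_tick+0x96/0xa4
 [<f89bc1e7>] init_vpm450m+0x32d/0x34a [wct4xxp]
 [<f89a3b11>] t4_vpm450_init+0x18ce/0x198c [wct4xxp]
 =======================
VPM450: hardware DTMF disabled.
"""
    for trace in lockup_traces(sample):
        print(in_vpm_firmware_load(trace))
```

A lockup report whose trace does not pass through init_vpm450m would not be covered by this explanation and deserves a closer look.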

Can you try my stack reduction branch at:

https://origsvn.digium.com/svn/zaptel/team/mattf/zaptel-1.4-stackcleanup

If that does not work, please contact me directly and I will work with 
you to get a resolution.

-- 
Matthew Fredrickson
Software/Firmware Engineer
Digium, Inc.


