[asterisk-dev] possible zaptel problem with SMP and RAID1

Matt Fredrickson creslin at digium.com
Mon Jul 9 18:15:33 CDT 2007


I have seen problems in my development in connection with RAID, SCSI, and SATA drives.  I suspect that some Linux kernel drivers disable interrupts for extended periods of time (bad drivers).  Perhaps it's related to when the buffer caches get bulk flushed to disk.  If that is indeed the case, and a buggy SCSI/RAID/SATA driver is disabling interrupts for that long, there isn't really anything you can do about it other than use a different piece of hardware or fix the offending driver.
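
For a rough sense of scale, the zttest figures quoted below can be converted into time.  This is only an illustrative sketch.  It assumes zttest scores a run as 100 - 100 * |samples read - interval| / samples read; that formula is an assumption rather than something taken from the zttest source, although it does reproduce the 99.975586% figure for a two-sample difference.  The function names are made up for this example.

```python
SAMPLE_RATE_HZ = 8000  # Zaptel audio runs at 8 kHz, i.e. 8 samples per millisecond


def zttest_score(samples_read, interval_samples):
    """Accuracy in percent, under the ASSUMED formula described above."""
    diff = abs(samples_read - interval_samples)
    return 100.0 - 100.0 * diff / samples_read


def drift_ms(samples_read, interval_samples):
    """Size of the discrepancy expressed in milliseconds of 8 kHz audio."""
    return abs(samples_read - interval_samples) * 1000.0 / SAMPLE_RATE_HZ


# A clean run, a two-sample difference, and the line quoted in the report.
for read, interval in [(8192, 8192), (8192, 8190), (8192, 7212)]:
    print("%d samples in %d sample interval: %.6f%% (%.2f ms off)"
          % (read, interval, zttest_score(read, interval),
             drift_ms(read, interval)))
```

On these assumptions, the "8192 samples in 7212 sample interval" line works out to a score of about 88% and a discrepancy of 980 samples, or 122.5 ms at 8 kHz.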

---
Matthew Fredrickson
Software/Firmware Engineer
Digium, Inc.

----- "François Delawarde" <fdelawarde at wirelessmundi.com> wrote:
> [sorry for previous mail in html]
> 
> Actually, we plan on using an external echo canceller, but testing with
> or without Octasic gave the same results. We didn't try HPEC, and
> reverting to ECHO_CAN_KB1 gives the same result (yes, I have tried lots
> of things... :-)).
> 
> François.
> 
> 
> Dimitri Prado wrote:
> > Hello,
> >
> > this is probably not related, but we had tons of interrupt problems
> > when we used zaptel compiled with ECHO_CAN_MG2. When we reverted to
> > ECHO_CAN_KB1 everything worked fine again. Both setups had no IDE,
> > framebuffer, shared irqs etc.
> >
> > regards
> > Dimitri
> >
> > On 7/9/07, François Delawarde <fdelawarde at wirelessmundi.com> wrote:
> >   
> >> Hi again,
> >>
> >> Tzafrir Cohen wrote:
> >>     
> >>> Hi
> >>>
> >>> On Mon, Jul 09, 2007 at 12:29:30PM +0200, François Delawarde wrote:
> >>>
> >>>       
> >>>> Hello,
> >>>>
> >>>> I thought this mail would be more appropriate on this mailing list; if
> >>>> not, sorry about that.
> >>>>
> >>>> I've been having interrupt problems since I started trying to use analog
> >>>> zaptel hardware (mainly OpenVox A400 and OPVXA1200) on two dual-core
> >>>> machines (AMD64 X2, different motherboards and network cards) with
> >>>> software RAID1 across two SATA drives. These problems didn't occur on my
> >>>> previous setups without any RAID.
> >>>>
> >>>>         
> >>> What version of Zaptel do you use?
> >>>
> >>> Is it patched in any way?
> >>>
> >>> OPVXA1200 uses its own driver, originally based on wctdm.
> >>>
> >>>       
> >> Zaptel 1.4.3 with the one-line hookstate patch from bug 0008290 (adapted
> >> from 1.2.10 to 1.4.3).
> >>
> >> I also tried non-patched Zaptels from 1.2 and 1.4 series.
> >>
> >>
> >>     
> >>>> The problem appears to happen randomly, a few times per minute (or
> >>>> sometimes a few times per 5 minutes): the zttest utility drops to
> >>>> 60-90%, reporting that I had too many interrupts (showing lines like
> >>>> "8192 samples in 7212 sample interval"). Along with that comes an
> >>>> audible beep and, on rare occasions, a small cut in the conversation or
> >>>> a small bit of echo for a very short time. I'll add that a higher disk
> >>>> load (running dbench) appears to slightly increase the frequency of
> >>>> those problems (but I'm not totally sure).
> >>>>
> >>>> zttool shows no missed interrupts with the watchdog option enabled before
> >>>> compilation. No shared interrupts. No IDE drives (so no IDE-related
> >>>> DMA problem). No frame buffer; console-only server. I tried all the
> >>>> PREEMPT kernel options and all HZ options, with and without IRQ balance,
> >>>> and tried SMP affinity to move interrupts to another core, all without
> >>>> result, except that the PREEMPT options make zttest constantly report
> >>>> 99.975586% instead of 100% when there are no problems.
> >>>>
> >>>> I'm no kernel expert, but since the only pattern I found in all tests
> >>>> seemed to be related to RAID, I was wondering whether spinlocks that
> >>>> disable interrupts, as the RAID drivers seem to use in SMP
> >>>> configurations, could be delaying zaptel interrupts and causing the
> >>>> kind of problems I have. Any idea on that?
> >>>>
> >>>>         
> >>> First off, better preemption should generally help you. You need timely
> >>> response (be that at the price of some throughput performance).
> >>>
> >>>       
> >> That's what I originally thought, so I tried those options to see if
> >> they could resolve my problem. Right now I'm running 2.6.21.6 with
> >> "Low latency Desktop" and HZ=1000, without success.
> >>
> >> Any idea?
> >>
> >>     
> >>>> For info, the problem occurred on these combinations of setups:
> >>>> - OS: Debian etch (tested on sarge)
> >>>> - Processors: two different AMD64 X2, one of which is in an AM2 socket.
> >>>> - Partitions: ext3 on RAID1 (tested with ext3 on LVM on RAID1 and ext3
> >>>> on Encrypted LVM on RAID1)
> >>>> - Custom kernel 2.6.21.6 with IMQ and Layer 7 (tested with 2.6.18 and
> >>>> with/without these two patches, also tested with XEN kernel with
> >>>> horrible, but expected results).
> >>>> - zaptel 1.4.3 (tried 1.2 series, and 1.4 since 1.4.1).
> >>>> - a few services: DNS, DHCP, Samba, PHP/MySQL interface, astmanproxy
> >>>> (tested without any).
> >>>>
> >>>> Worked well on:
> >>>> - OS: Debian etch
> >>>> - AMD64 Sempron
> >>>> - Kernel 2.6.18 with IMQ and Layer 7
> >>>> - No RAID
> >>>> - zaptel 1.4 series
> >>>> - same services as above
> >>>>
> >>>>         
> >> _______________________________________________
> >> --Bandwidth and Colocation Provided by http://www.api-digital.com--
> >>
> >> asterisk-dev mailing list
> >> To UNSUBSCRIBE or update options visit:
> >>    http://lists.digium.com/mailman/listinfo/asterisk-dev
> >>
> >>     
> >
