[asterisk-users] Dahdi interface flapping

Tue Jul 30 10:40:48 CDT 2013

On Tue, Jul 30, 2013 at 11:13:55AM -0400, Andre Goree wrote:
> On Tue, Jul 30, 2013 at 10:56 AM, Shaun Ruffell <sruffell at digium.com> wrote:
> > On Tue, Jul 30, 2013 at 10:36:58AM -0400, Andre Goree wrote:
> >
> > > I've posted the configs and the output of a 'pri debug' below.  Please
> > > let me know if I should include anything else to help troubleshoot.
> > > I've tried both a standalone configuration as well as the DAHDI module
> > > in FreePBX, both with the same error(s).
> > >
> > > /etc/dahdi/system.conf:
> > > span=1,0,0,ESF,B8ZS
> > > bchan=1-23
> > > dchan=24
> > > loadzone=us
> >
> > I think the span line above is wrong. I think you want:
> >
> > span=1,1,0,esf,b8zs
> >
> > The second 1 indicates that the span should recover the clock from
> > the remote side (which should be your provider). However, normally
> > when you have the timing misconfigured like this you'll get HDLC
> > aborts, and not just the PRI going up and down.
> >
> > So before looking into any more or contacting customer support, it
> > might be easy to change that one line and see if the behavior is
> > different.
> 
> Thanks for the suggestions.  I did have the following before, with
> similar errors:
> 
> [root at asterisk-master dahdi]# cat system.conf.ag
> loadzone = us
> defaultzone=us
> span=1,1,0,esf,b8zs
> bchan=1-23
> dchan=24
> 
> From everything I've read, what you say makes complete sense and in
> fact I'm surprised that's changed (FreePBX's DAHDI module created the
> current config -- i.e. the one I posted in my original email).  I'll
> change that back and see if that makes a difference, but I'm pretty
> sure I used the above configuration in a previous attempt with the
> same results.
> 
> Also, I have indeed received the following messages prior to the
> interface going down and these errors are what I was initially
> researching:
> 
> [root at asterisk-master dahdi]# grep HDLC /var/log/asterisk/full
> ...
> [2013-07-30 08:09:03] NOTICE[3621] chan_dahdi.c: PRI got event: HDLC Bad FCS (8) on D-channel of span 1
> [2013-07-30 08:09:03] NOTICE[3621] chan_dahdi.c: PRI got event: HDLC Bad FCS (8) on D-channel of span 1
> [2013-07-30 08:24:05] NOTICE[3621] chan_dahdi.c: PRI got event: HDLC Bad FCS (8) on D-channel of span 1
> [2013-07-30 08:24:05] NOTICE[3621] chan_dahdi.c: PRI got event: HDLC Bad FCS (8) on D-channel of span 1
> [2013-07-30 08:29:47] NOTICE[3621] chan_dahdi.c: PRI got event: HDLC Bad FCS (8) on D-channel of span 1
> [2013-07-30 08:29:47] NOTICE[3621] chan_dahdi.c: PRI got event: HDLC Bad FCS (8) on D-channel of span 1
> [2013-07-30 08:30:52] NOTICE[3621] chan_dahdi.c: PRI got event: HDLC Abort (6) on D-channel of span 1
> [2013-07-30 08:30:52] NOTICE[3621] chan_dahdi.c: PRI got event: HDLC Abort (6) on D-channel of span 1
> [2013-07-30 08:36:01] NOTICE[3621] chan_dahdi.c: PRI got event: HDLC Bad FCS (8) on D-channel of span 1
> [2013-07-30 08:36:01] NOTICE[3621] chan_dahdi.c: PRI got event: HDLC Bad FCS (8) on D-channel of span 1
> [2013-07-30 08:39:54] NOTICE[3621] chan_dahdi.c: PRI got event: HDLC Abort (6) on D-channel of span 1
> [2013-07-30 08:39:54] NOTICE[3621] chan_dahdi.c: PRI got event: HDLC Abort (6) on D-channel of span 1
> 
> A lot of info I found while researching the above error mentioned
> IRQs, etc., which was one reason I posted the output of
> /proc/interrupts, heh...but I'm pretty sure my issue is not one
> that has to do with the IRQs.
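
To see at a glance how often each event type shows up, the grep
output can be tallied. This is only a rough sketch: the function name
count_hdlc_events is made up for illustration, and it assumes the log
lines look like the ones quoted above. Note that each event is logged
twice in that output, so the counts are raw line counts.

```shell
# Hypothetical helper: tally "PRI got event: HDLC ..." lines per span
# and event type. Reads Asterisk log text on stdin.
count_hdlc_events() {
    awk '
        /PRI got event: HDLC/ {
            line = $0
            # Drop everything up to and including "PRI got event: ".
            sub(/.*PRI got event: /, "", line)
            # Event name is the text before " (<code>)".
            name = line
            sub(/ \(.*/, "", name)
            # Span number is the text after the last "span ".
            span = line
            sub(/.*span /, "", span)
            count["span " span " " name]++
        }
        END { for (k in count) print k " = " count[k] }
    ' | sort
}

# Usage (path taken from the grep command above):
#   count_hdlc_events < /var/log/asterisk/full
```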

Ok, that makes more sense. One more thing that is quick and easy to
rule out, in case there are interrupt handling issues, is to check
whether there are any error messages in dmesg related to the
driver.

  $ dmesg | grep wcte13xp

If something on the system is preventing the wcte13xp's interrupt
handler from running in a timely manner you'll see messages like:
"Underrun detected by hardware.  Latency bumped to: <x>ms"

Typically I've seen this with systems that are configured to use
framebuffers, disks that are operating in combined mode, systems
with consoles on slow serial ports, or ill-behaved system management
interrupts.

A few of those messages about latency bumps are not a problem; they
are the driver's way of accommodating systems with less than ideal
real-time performance. However, if you see any messages like "Tried
to increase latency past buffer size" then the driver will not be
able to accommodate the host system without some changes (if you're
lucky).
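
As a rough sketch (not an official Digium tool), the two kinds of
message can be counted separately with a small shell function. The
name summarize_latency is made up for illustration, and it matches
only on the message fragments quoted above, on the assumption that
the rest of each log line may vary between systems.

```shell
# Hypothetical helper: count wcte13xp latency messages in kernel log
# text read from stdin.
summarize_latency() {
    awk '
        # Harmless in small numbers: the driver adapting to the host.
        /Latency bumped to/ { bumps++ }
        # The driver could not raise latency any further.
        /Tried to increase latency past buffer size/ { exceeded++ }
        END {
            printf "latency_bumps=%d buffer_exceeded=%d\n", bumps, exceeded
        }
    '
}

# Usage:
#   dmesg | grep wcte13xp | summarize_latency
```

A non-zero buffer_exceeded count is the case where the host system
itself would need changes.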

But...still...Digium's tech support I'm sure would be more than
happy to help you troubleshoot.

-- 
Shaun Ruffell
Digium, Inc. | Linux Kernel Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: www.digium.com & www.asterisk.org