[asterisk-users] T1, FoneBRIDGE, and dropped D-Channel

Gleim, Jason jgleim at atsautomation.com
Wed Feb 4 10:09:04 CST 2009


I hope someone can help me out with this issue. It has been dogging me
for months and I can't seem to get it to go away.

I have a Rhino Ceros box running Asterisk 1.4.21.2 connected via eth0
(nVidia MCP61 Ethernet) to a RedFone FoneBRIDGE2 dual-port with EC. The
FB has the latest hardware rev and the latest firmware. I'm running the
latest fonulator version and Zap-1.4.11 sourced from RedFone.

Nothing else is on eth0. It is currently connected through a dedicated
switch to the FB and the secondary server, although I've also observed
this problem when connected directly. There are two other eth cards, one
for the internal network and one for the DMZ.

My problem is that every now and then the D-Channel drops, which
terminates all calls in progress. The D-channel comes back up
immediately (usually within a second), but that doesn't do any good
because the calls are gone by then and users are mad. The log entries
for one of these events look like this:

[Feb  3 08:14:02] ERROR[26063] chan_zap.c: Write to 65 failed: Unknown error 500
[Feb  3 08:14:02] ERROR[26063] chan_zap.c: Short write: 0/15 (Unknown error 500)
[Feb  3 08:14:02] WARNING[26063] chan_zap.c: Detected alarm on channel 1: Yellow Alarm
.... (same message for other 22 channels)
[Feb  3 08:14:02] NOTICE[2660] chan_zap.c: PRI got event: Alarm (4) on Primary D-channel of span 1
[Feb  3 08:14:02] WARNING[2660] chan_zap.c: No D-channels available! Using Primary channel 24 as D-channel anyway!
[Feb  3 08:14:02] NOTICE[2662] chan_zap.c: Alarm cleared on channel 1
.... (same message for other 22 channels)
[Feb  3 08:14:02] NOTICE[2660] chan_zap.c: PRI got event: No more alarm (5) on Primary D-channel of span 1
[Feb  3 08:14:02] NOTICE[2660] chan_zap.c: PRI got event: HDLC Abort (6) on Primary D-channel of span 1
[Feb  3 08:14:02] NOTICE[2660] chan_zap.c: PRI got event: HDLC Bad FCS (8) on Primary D-channel of span 1

You'll notice the timestamps are all within the same 1-second interval,
which makes me think it is literally missing one packet and that is
causing the drop. I'm sure the configs are fine, as they've been
reviewed by about 20 people and the system works most of the time.
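
To track how often these drops occur, the log can be scanned for the
D-channel alarm event shown above. The following is only a minimal
sketch: it assumes each entry sits on a single line, as in the log file
itself, and that the "PRI got event: Alarm (4)" line marks one drop. The
function names find_dchannel_drops and drops_per_day are made up for
illustration.

```python
import re
from collections import Counter

# Matches entries such as:
# [Feb  3 08:14:02] NOTICE[2660] chan_zap.c: PRI got event: Alarm (4) on Primary D-channel of span 1
# Assumption: this NOTICE line is what marks a D-channel drop.
DROP_RE = re.compile(
    r"^\[(?P<month>\w{3})\s+(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2})\] "
    r"NOTICE\[\d+\] chan_zap\.c: PRI got event: Alarm \(4\) "
    r"on Primary D-channel of span (?P<span>\d+)"
)


def find_dchannel_drops(lines):
    """Return one (month, day, time, span) tuple per D-channel alarm line."""
    drops = []
    for line in lines:
        m = DROP_RE.match(line)
        if m:
            drops.append((m["month"], int(m["day"]), m["time"], int(m["span"])))
    return drops


def drops_per_day(drops):
    """Count drops per (month, day) so a rising trend is easy to spot."""
    return Counter((month, day) for month, day, _time, _span in drops)
```

Feeding it the lines of the Asterisk log (for example from an open file
object) and printing drops_per_day would show whether the rate really
climbs with uptime.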

If the machine has been freshly started up, this happens about once
every other day. The machine has currently been running for over 36 days
and I'm seeing several drops per day now. AT&T has run a stress test on
the line from the CO to the smartjack and found no problems. The cable
from the smartjack to the FoneBRIDGE is about 18" long, and I've tried a
couple of cables with no difference.

I'm convinced this is interrupt related. When I initially commissioned
this machine, the FB was connected to eth2 and I couldn't get it to link
up with the CO at all. The D-Channel was flapping like crazy. I switched
it to eth0 and it worked. You can see from my interrupts that the
on-board and the add-in cards are clearly on different buses.

           CPU0
  0: 3266196236    IO-APIC-edge  timer
  1:          2    IO-APIC-edge  i8042
  8:          3    IO-APIC-edge  rtc
  9:          0   IO-APIC-level  acpi
169:  207230361   IO-APIC-level  ohci_hcd:usb1
177:    5080313   IO-APIC-level  sata_nv
185:          0   IO-APIC-level  sata_nv
193:    1632824   IO-APIC-level  eth1
201:   39823124   IO-APIC-level  eth2
225: 2565938694         PCI-MSI  eth0
NMI:          0
LOC: 3266207768
ERR:          1
MIS:          0

So the fact that I couldn't link up on one card and could on another
(with no config changes... other than re-directing ztdynamic) leads me
directly to this interrupt issue. Can anyone shed some light here? Has
someone seen this before? If so, how did you solve it?
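
One way to test the interrupt theory would be to take two snapshots of
/proc/interrupts a known number of seconds apart and compare the
per-device rates around a drop. The sketch below is a rough
illustration, not a finished tool: the function names are hypothetical,
the last column is taken as the device name (so a shared IRQ line is
keyed only by its last device), and the counters are assumed to be
32-bit and to wrap at most once between snapshots.

```python
def parse_interrupts(text):
    """Map each device (or row label) to its interrupt count, summed over CPUs."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        irq, _, rest = line.partition(":")
        fields = rest.split()
        nums = [int(f) for f in fields[:ncpus] if f.isdigit()]
        # Rows such as NMI/LOC/ERR carry no device name; key them by label.
        name = fields[-1] if len(fields) > ncpus else irq.strip()
        counts[name] = sum(nums)
    return counts


def rate_per_second(before, after, name, seconds, wrap=2**32):
    """Interrupts per second for `name` between two parsed snapshots.

    The modulo tolerates a single counter wrap; the 32-bit width is an
    assumption about the kernel in use.
    """
    return ((after[name] - before[name]) % wrap) / seconds
```

Comparing the eth0 rate during normal operation with the rate in the
seconds around a drop would show whether interrupts are actually being
missed on that interface.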

Thanks!
Jason

