[Asterisk-Users] Red alarm on T1 PRI but not on zttool
Frank Cofer
fcofer at lifelabs.net
Sun Jun 13 16:11:25 MST 2004
SYNOPSIS
Asterisk erratically reports a red alarm on a T1 PRI, but zttool,
running concurrently during the alarm, shows no errors, IRQ misses, or
alarms on any span.
Using asterisk and quad Digium T405P, configured as follows:
Span 1 connects to ISDN PRI (fractional 8 B channels, D channel 24).
Span 2 connects to T1 Mux and analog stations.
Span 3 connects to ISDN PRI Nortel BCM hybrid key system digital trunk.
Span 4 is not configured.
Host system is Athlon > 1.8GHz, 512MB running Gentoo Linux 2.4.20
vanilla kernel, RAID 1 (software Linux), 2 x IDE 80GB. T1 card shares
no interrupt.
After 12 hours to several days, asterisk detects a red alarm on the
configured channels 1 through 8 of span 1 (ISDN PRI). zttool, run
concurrently, shows no alarms, no irq misses, and no errors on any span.
When the alarm is first detected, it "bounces" several times, then quits
for 6 hours or so, then recurs. Suspecting a timing problem, the
configuration has been repeatedly checked for proper clocking (to CO
span 1) and this is also verified by zttool which shows sync source as
"Card 0, span1" on all configured spans. The original Digium TE410P
Quad card was replaced with a Digium T405P with no improvement. Telco
has replaced both Adtran HDSL2 smartjack and repeater cards. Problem is
independent of traffic and occurs late night, early morning, weekends or
during load. Simulated load of CPU does not produce any failure and CPU
load is otherwise negligible. The utility zttest has been run on the
new T405P card with no errors. Zttool shows red alarm if a span is
disconnected, LB when under loopback test, but otherwise shows no errors
on any span.
Span 1 connects to telco Adtran HDSL2-R. Numerous loop around tests and
30m span pattern tests were performed over several days, and the span
tested clear in both directions through to CSU with no errors.
Span 2 connects to a Zhone T1 mux and shows no alarms or errors, either
from asterisk console or messages (zap channels 25-48).
Span 3 connects to Nortel BCM PRI (fractional 8 channels, plus D on
24; channels 9-23 are deprovisioned); it likewise shows no errors.
Span 4 is not connected and not configured. It has been swapped with
Span 1 on occasion with no improvement.
Upon occurrence of a red alarm condition on Span 1, all calls are
dropped. Successive call attempts during the red alarm condition
encounter 120 IPM congestion tone. However, calls can still be made
between the BCM (Span 3) and the T1 mux (Span 2) while the red alarm is
reported by the asterisk console on Span 1. That is, during a red alarm,
only service through Span 1 is affected.
Here is an example from the messages log (grepped for "channel 1: Red"):
Jun 10 04:02:29 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:02:17 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:05:12 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:08:23 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:47:01 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:58:15 WARNING[163851]: Detected alarm on channel 1: Red Alarm
... the alarm continues bouncing, then eventually abates...
Jun 13 01:44:57 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 01:52:19 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:15:15 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:20:32 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:25:45 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:40:02 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:57:49 WARNING[163851]: Detected alarm on channel 1: Red Alarm
... it then subsides for several hours, e.g., none further as of Sun Jun
13 17:39:39 EDT 2004.
Debug output is apparently suppressed during the alarm. (This has been
reported separately in a bug report.)
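To make the "bounce, then quiet" pattern in the log excerpt easier to
see, here is a rough sketch that groups the alarm timestamps into
bursts. The 2-hour gap threshold, the function names, and the year
argument are my own assumptions for illustration (the log lines carry
no year), and the excerpt itself is elided, so the counts cover only
the lines quoted above.

```python
from datetime import datetime, timedelta

# Alarm lines copied from the excerpt above (elided entries are absent).
LOG_LINES = """\
Jun 10 04:02:29 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:02:17 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:05:12 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:08:23 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:47:01 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:58:15 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 01:44:57 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 01:52:19 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:15:15 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:20:32 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:25:45 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:40:02 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:57:49 WARNING[163851]: Detected alarm on channel 1: Red Alarm
""".splitlines()


def parse_alarm_times(lines, year=2004):
    """Return a datetime for every 'Red Alarm' line (year is assumed)."""
    times = []
    for line in lines:
        if "Red Alarm" not in line:
            continue
        stamp = " ".join(line.split()[:3])  # e.g. "Jun 12 21:02:17"
        times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def group_bursts(times, gap=timedelta(hours=2)):
    """Group sorted timestamps; a new burst starts after a pause > gap."""
    bursts = []
    for t in times:
        if bursts and t - bursts[-1][-1] <= gap:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts


if __name__ == "__main__":
    for burst in group_bursts(parse_alarm_times(LOG_LINES)):
        print(burst[0], "->", burst[-1], f"({len(burst)} alarms)")
```

Running the same grouping over the full messages log, rather than this
excerpt, might show whether the bursts line up with a time of day or
with some other periodic event.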
Configuration (/etc/zapata.conf)
span=1,1,3,esf,b8zs
span=2,0,0,esf,b8zs
span=3,0,0,esf,b8zs
#span=4,1,3,esf,b8zs
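For anyone checking the clocking, here is a small sketch that reads
the span lines above and lists which spans are offered as sync
sources. It assumes the field order is span number, timing priority,
line build-out, framing, coding, and that a timing value of 0 means
the span is not used as a sync source; that is my understanding of the
span= syntax, not something verified against the driver. The Span type
and parse_spans name are made up for this example.

```python
from typing import NamedTuple


class Span(NamedTuple):
    number: int   # span number on the card
    timing: int   # assumed: 0 = not a sync source, 1 = primary, 2 = secondary...
    lbo: int      # line build-out setting
    framing: str  # e.g. "esf"
    coding: str   # e.g. "b8zs"


def parse_spans(text):
    """Parse 'span=' lines, skipping blanks and '#' comments."""
    spans = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        if key.strip() != "span":
            continue
        fields = [f.strip() for f in value.split(",")]
        number, timing, lbo = (int(f) for f in fields[:3])
        spans.append(Span(number, timing, lbo, fields[3], fields[4]))
    return spans


CONFIG = """\
span=1,1,3,esf,b8zs
span=2,0,0,esf,b8zs
span=3,0,0,esf,b8zs
#span=4,1,3,esf,b8zs
"""

spans = parse_spans(CONFIG)
sync_sources = [s.number for s in spans if s.timing > 0]

if __name__ == "__main__":
    print("sync source spans:", sync_sources)
```

Under those assumptions, only span 1 is a sync source, which agrees
with zttool reporting "Card 0, span1" as the sync source on all
configured spans.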
NOTES
All cables, protectors, and the like, have been verified, reterminated
or swapped. However, these types of errors should have been discovered
via a short-term error test.
The server and all connected telephony equipment are powered by two
separate UPSs, which report no downtime during the red alarm events.
Except during artificial load tests, the server's CPU usage has never
risen above 1%.
Error #500 messages occur sporadically, but without any clear
relationship to the red alarms. UDMA was reduced from UDMA 5 to UDMA 3,
but this failed to correct the error #500 problem.
Debug and warning message logs have been retained. Inexplicably, the red
alarms appear to occur with no external stimulus.
QUESTIONS
1. How can the asterisk messages log show a red alarm, yet the zttool
utility (running concurrently and watched during alarm transition) shows
no red alarm?
2. What are the conditions that asterisk uses to declare red alarms? How
does this differ from the zttool utility?
3. Any other ideas?