[Asterisk-Users] Red alarm on T1 PRI but not on zttool

Frank Cofer fcofer at lifelabs.net
Sun Jun 13 16:11:25 MST 2004


SYNOPSIS
Erratic red alarm on a T1 PRI reported by asterisk, but zttool, running 
concurrently during the alarm, shows no errors, IRQ misses, or alarms on 
any span.

Using asterisk and quad Digium T405P, configured as follows:

Span 1 connects to ISDN PRI (fractional 8 B channels, D channel 24).
Span 2 connects to T1 Mux and analog stations.
Span 3 connects to ISDN PRI Nortel BCM hybrid key system digital trunk.
Span 4 is not configured.

Host system is Athlon > 1.8GHz, 512MB running Gentoo Linux 2.4.20 
vanilla kernel, RAID 1 (software Linux), 2 x IDE 80GB.  T1 card shares 
no interrupt.

After 12 hours to several days, asterisk detects a red alarm on the 
configured channels 1 through 8 of span 1 (ISDN PRI).  zttool, run 
concurrently, shows no alarms, no IRQ misses, and no errors on any span.

When the alarm is first detected, it "bounces" several times, then quits 
for 6 hours or so, then recurs.  Suspecting a timing problem, the 
configuration has been repeatedly checked for proper clocking (to CO 
span 1) and this is also verified by zttool which shows sync source as 
"Card 0, span1" on all configured spans.  The original Digium TE410P 
Quad card was replaced with a Digium T405P with no improvement.  Telco 
has replaced both Adtran HDSL2 smartjack and repeater cards.  Problem is 
independent of traffic and occurs late night, early morning, weekends or 
during load.  Simulated load of CPU does not produce any failure and CPU 
load is otherwise negligible.  The utility zttest has been run on the 
new T405P card with no errors.  Zttool shows red alarm if a span is 
disconnected, LB when under loopback test, but otherwise shows no errors 
on any span.

Span 1 connects to telco Adtran HDSL2-R.  Numerous loop around tests and 
30m span pattern tests were performed over several days, and the span 
tested clear in both directions through to CSU with no errors.  

Span 2 connects to a Zhone T1 mux and shows no alarms or errors, either 
from asterisk console or messages (zap channels 25-48).

Span 3 connects to Nortel BCM PRI (fractional 8 channels, plus D on 24; 
channels 9-23 are deprovisioned); it likewise shows no errors.

Span 4 is not connected and not configured.  It has been swapped with 
Span 1 on occasion with no improvement.

Upon occurrence of a red alarm condition on Span 1, all calls on that 
span are dropped. Successive call attempts during the red alarm 
condition encounter 120 IPM congestion tone. However, calls can still be 
made between the BCM (Span 3) and the T1 mux (Span 2) while the red 
alarm is reported by the asterisk console on Span 1. That is, during a 
red alarm, only service through Span 1 is affected.

Here is an example from the messages log (grepped for "channel 1: Red"):

Jun 10 04:02:29 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:02:17 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:05:12 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:08:23 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:47:01 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:58:15 WARNING[163851]: Detected alarm on channel 1: Red Alarm

... the alarm continues bouncing, then eventually abates...

Jun 13 01:44:57 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 01:52:19 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:15:15 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:20:32 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:25:45 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:40:02 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:57:49 WARNING[163851]: Detected alarm on channel 1: Red Alarm

... it then subsides for several hours, e.g., none further as of Sun Jun 
13 17:39:39 EDT 2004.
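
To make the "bouncing" pattern in the excerpt above easier to see, the 
timestamps can be grouped into bursts. The following is only a minimal 
Python sketch, not a tool that ships with asterisk or zaptel. The 
function names, the one-hour gap threshold, and the year 2004 (taken 
from the date of this post, since the log lines carry no year) are 
assumptions made for illustration. The sample contains only the lines 
quoted above; the alarms elided from the excerpt are not included.

```python
from datetime import datetime, timedelta

# Only the lines quoted in the post; elided alarms are NOT reconstructed.
SAMPLE = """\
Jun 10 04:02:29 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:02:17 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:05:12 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:08:23 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:47:01 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 12 21:58:15 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 01:44:57 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 01:52:19 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:15:15 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:20:32 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:25:45 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:40:02 WARNING[163851]: Detected alarm on channel 1: Red Alarm
Jun 13 12:57:49 WARNING[163851]: Detected alarm on channel 1: Red Alarm
"""


def parse_alarm_times(text, year=2004):
    """Return a datetime for each 'Red Alarm' line (year is assumed)."""
    times = []
    for line in text.splitlines():
        if "Red Alarm" not in line:
            continue
        # The first three whitespace-separated fields are month, day, time.
        stamp = " ".join(line.split()[:3])
        times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def group_bursts(times, max_gap=timedelta(hours=1)):
    """Group alarms separated by at most max_gap (an arbitrary threshold)."""
    bursts = []
    for t in sorted(times):
        if bursts and t - bursts[-1][-1] <= max_gap:
            bursts[-1].append(t)   # continues the current burst
        else:
            bursts.append([t])     # gap too long: start a new burst
    return bursts


if __name__ == "__main__":
    for burst in group_bursts(parse_alarm_times(SAMPLE)):
        print(f"{burst[0]} .. {burst[-1]}  alarms={len(burst)}")
```

Run against the full messages log rather than this excerpt, the same 
grouping could be compared with telco maintenance windows or cron 
schedules to look for an external trigger.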

Debug output is apparently suppressed during the alarm. (This has been 
reported separately in a bug report.)

Configuration (/etc/zaptel.conf)
span=1,1,3,esf,b8zs   
span=2,0,0,esf,b8zs
span=3,0,0,esf,b8zs
#span=4,1,3,esf,b8zs
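
For readers less familiar with the span= syntax, the fields in the lines 
above are, as far as I understand the Zaptel driver configuration 
format: span number, timing source priority (0 means the card does not 
take its clock from that span), line build-out, framing, and coding. The 
Python sketch below is illustrative only; the names Span, parse_spans 
and zap_channels are hypothetical, and the 24-channels-per-span mapping 
is the T1 assumption already implied by "zap channels 25-48" for span 2.

```python
from typing import NamedTuple

# The span lines exactly as quoted in the post.
CONFIG = """\
span=1,1,3,esf,b8zs
span=2,0,0,esf,b8zs
span=3,0,0,esf,b8zs
#span=4,1,3,esf,b8zs
"""


class Span(NamedTuple):
    number: int    # span number on the card
    timing: int    # sync source priority; 0 = not used as a clock source
    lbo: int       # line build-out setting
    framing: str   # e.g. esf
    coding: str    # e.g. b8zs


def parse_spans(text):
    """Parse span= lines, ignoring '#' comments and unrelated keys."""
    spans = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # strip comments
        key, sep, value = line.partition("=")
        if not sep or key.strip() != "span":
            continue
        fields = [f.strip() for f in value.split(",")]
        number, timing, lbo = (int(f) for f in fields[:3])
        spans.append(Span(number, timing, lbo, fields[3], fields[4]))
    return spans


def zap_channels(span_number, per_span=24):
    """First and last zap channel of a T1 span (24 channels assumed)."""
    first = (span_number - 1) * per_span + 1
    return first, first + per_span - 1


if __name__ == "__main__":
    for s in parse_spans(CONFIG):
        lo, hi = zap_channels(s.number)
        role = "clock recovered from this span" if s.timing else "card provides clock"
        print(f"span {s.number}: channels {lo}-{hi}, {role}")
```

Read this way, only span 1 is a sync source, which matches the intent 
described earlier of clocking from the CO on span 1.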

NOTES
All cables, protectors, and the like, have been verified, reterminated 
or swapped. However, these types of errors should have been discovered 
via a short-term error test.

The server and all connected telephony equipment are powered by two 
separate UPSs, which report no downtime during the red alarm events.

Except during artificial load tests, the server's CPU usage has never 
risen above 1%.

Error #500 errors occur sporadically, but without any clear relationship 
to the red alarms. UDMA was reduced from UDMA 5 to UDMA 3, but this 
failed to correct the error #500 problem.

Debug and warning message logs have been retained. Inexplicably, the red 
alarms appear to occur with no external stimulus.

QUESTIONS
1. How can the asterisk messages log show a red alarm, yet the zttool 
utility (running concurrently and watched during alarm transition) shows 
no red alarm?

2. What are the conditions that asterisk uses to declare red alarms? How 
does this differ from the zttool utility?

3. Any other ideas?





