[Asterisk-Users] ZAP libpri issue crashes PRI?

steve at daviesfam.org steve at daviesfam.org
Mon Feb 21 02:30:50 MST 2005


Hi,

I have a problem that is biting at all my customer sites where they have 
PRIs taking heavy load.

This happens both with the stable code stream and with the current CVS.


What happens is that after some running, Asterisk starts reporting strange 
errors on the PRI, eventually calling the PRI down

Starts with this sort of thing:

Feb 21 09:39:23 DEBUG[18095]: Didn't get a frame from channel: Zap/68-1
Feb 21 09:39:23 DEBUG[18095]: Bridge stops bridging channels Zap/68-1 and Zap/32-1
Feb 21 09:39:23 DEBUG[22374]: Manager received command 'Command'
Feb 21 09:39:25 DEBUG[18103]: Launching 'SetVar'
Feb 21 09:39:25 DEBUG[18104]: Launching 'SetVar'
Feb 21 09:39:33 DEBUG[22374]: Manager received command 'Command'
Feb 21 09:39:43 DEBUG[22374]: Manager received command 'Command'
Feb 21 09:39:51 DEBUG[18105]: Launching 'SetVar'
Feb 21 09:39:53 DEBUG[22374]: Manager received command 'Command'
Feb 21 09:39:56 DEBUG[18052]: Write returned -1 (Resource temporarily unavailable) on channel 31
Feb 21 09:40:03 DEBUG[22374]: Manager received command 'Command'
Feb 21 09:40:06 DEBUG[18052]: Write returned -1 (Resource temporarily unavailable) on channel 31
Feb 21 09:40:06 DEBUG[17864]: Got RTCP report of 8 bytes
Feb 21 09:40:06 DEBUG[17864]: Got RTCP report of 8 bytes
Feb 21 09:40:13 DEBUG[22374]: Manager received command 'Command'
Feb 21 09:40:23 DEBUG[22374]: Manager received command 'Command'
Feb 21 09:40:33 DEBUG[22374]: Manager received command 'Command'
Feb 21 09:40:34 DEBUG[18030]: Write returned -1 (Unknown error 500) on channel 45
Feb 21 09:40:34 DEBUG[18030]: Exception on 56, channel 45
Feb 21 09:40:34 DEBUG[18030]: Got event Alarm(4) on channel 45 (index 0)
Feb 21 09:40:34 DEBUG[18099]: Exception on 44, channel 33
Feb 21 09:40:34 DEBUG[18099]: Got event Alarm(4) on channel 33 (index 0)
Feb 21 09:40:34 DEBUG[18011]: Exception on 55, channel 44
Feb 21 09:40:34 DEBUG[18011]: Got event Alarm(4) on channel 44 (index 0)
Feb 21 09:40:34 DEBUG[18063]: Exception on 57, channel 46
Feb 21 09:40:34 DEBUG[18063]: Got event Alarm(4) on channel 46 (index 0)
...etc...

It complains about a bunch (though not all) channels.

Then, shortly afterwards:

Feb 21 09:40:34 DEBUG[22364]: Monitor doohicky got event Alarm on channel 34
Feb 21 09:40:34 WARNING[22364]: Detected alarm on channel 34: Red Alarm
Feb 21 09:40:34 WARNING[22364]: Unable to disable echo cancellation on channel 34
Feb 21 09:40:34 DEBUG[22364]: Monitor doohicky got event Alarm on channel 35
Feb 21 09:40:34 WARNING[22364]: Detected alarm on channel 35: Red Alarm
Feb 21 09:40:34 WARNING[22364]: Unable to disable echo cancellation on channel 35
Feb 21 09:40:34 DEBUG[22364]: Monitor doohicky got event Alarm on channel 36
Feb 21 09:40:34 WARNING[22364]: Detected alarm on channel 36: Red Alarm
Feb 21 09:40:34 WARNING[22364]: Unable to disable echo cancellation on channel 36
...etc...

And the other PRI:

Feb 21 09:41:01 DEBUG[22364]: Monitor doohicky got event Alarm on channel 1
Feb 21 09:41:01 WARNING[22364]: Detected alarm on channel 1: Red Alarm
Feb 21 09:41:01 WARNING[22364]: Unable to disable echo cancellation on channel 1
Feb 21 09:41:01 DEBUG[22364]: Monitor doohicky got event Alarm on channel 5
Feb 21 09:41:01 WARNING[22364]: Detected alarm on channel 5: Red Alarm
Feb 21 09:41:01 WARNING[22364]: Unable to disable echo cancellation on channel 5
Feb 21 09:41:01 DEBUG[22364]: Monitor doohicky got event Alarm on channel 6
Feb 21 09:41:01 WARNING[22364]: Detected alarm on channel 6: Red Alarm
Feb 21 09:41:01 WARNING[22364]: Unable to disable echo cancellation on channel 6
Feb 21 09:41:01 DEBUG[22364]: Monitor doohicky got event Alarm on channel 7
Feb 21 09:41:01 WARNING[22364]: Detected alarm on channel 7: Red Alarm
Feb 21 09:41:01 WARNING[22364]: Unable to disable echo cancellation on channel 7

Rapidly followed by the PRI coming back up:

Feb 21 09:41:07 DEBUG[22364]: Monitor doohicky got event No more alarm on channel 34
Feb 21 09:41:07 NOTICE[22364]: Alarm cleared on channel 34
Feb 21 09:41:07 DEBUG[22364]: Monitor doohicky got event No more alarm on channel 35
Feb 21 09:41:07 NOTICE[22364]: Alarm cleared on channel 35
..etc..

and:

Feb 21 09:41:35 DEBUG[22364]: Monitor doohicky got event No more alarm on channel 1
Feb 21 09:41:35 NOTICE[22364]: Alarm cleared on channel 1
Feb 21 09:41:35 DEBUG[22364]: Monitor doohicky got event No more alarm on channel 5
Feb 21 09:41:35 NOTICE[22364]: Alarm cleared on channel 5
Feb 21 09:41:35 DEBUG[22364]: Monitor doohicky got event No more alarm on channel 6
Feb 21 09:41:35 NOTICE[22364]: Alarm cleared on channel 6
..etc..

What I find in the one case is that the PRI keeps getting reported RED 
alarm and recovering, ad infinitum.  

In another case, I saw millions of lines logged in this vein:

Feb 21 09:46:28 DEBUG[22364]: Monitor doohicky got event Event -1 on channel 12

across all the channels that alarmed and recovered.

In either case whe whole system is now toast until it is restarted.

At a particularly busy customer, with 4 machines, each machine usually
gets into this state once a day or so.  Because of the application, we can
get away with automatically detecting and restarting the failed machine
without much business impact.

At another site this bites once a week or two; that's a standard PBX setup
where the impact of the failure is obvious and severe, at this stage 
it seems quite likely that they will kick out the Asterisk system.

I'm unable to tell whether the PRI alarm is a symptom or a cause.  I 
think its a symptom of a bug, because once down it doesn't recover, but 
restart Asterisk and it comes up immediately.

Can anyone help shed some light on the problem?

Thanks,
Steve




More information about the asterisk-users mailing list