[asterisk-ss7] Strange interrupt issue with zaptel and chan_ss7

Shane Burrell shaneb at metrostat.net
Fri Jul 7 08:07:51 MST 2006


I hate to say it, but you might want to consider a Sangoma card. Problems
seem to disappear with the A104D.

 

  _____  

From: asterisk-ss7-bounces at lists.digium.com
[mailto:asterisk-ss7-bounces at lists.digium.com] On Behalf Of Christopher
Bergström
Sent: Friday, July 07, 2006 11:03 AM
To: asterisk-ss7 at lists.digium.com; km at westend.com
Subject: Re: [asterisk-ss7] Strange interrupt issue with zaptel and chan_ss7

 

Kai Militzer wrote: 

Hello everyone, 

I have a very strange issue with a TE205P, zaptel and chan_ss7 regarding
interrupts. As I think it originates somewhere in the zaptel driver, I am
cross-posting this to -dev; I hope that's OK. 

As chan_ss7 does not need a D-channel, the configuration in /etc/zaptel.conf
is as follows: 

bchan=1-31 
bchan=32-62 

That's the first difference from a chan_zap configuration, where D-channels
need to be configured. 

I came across the issue when I started to get a lot of CRC16 errors on the
MTP2 part of chan_ss7, eventually resulting in the complete SS7 signaling
flapping every few minutes. Together with these CRC16 errors I got messages
that chan_ss7 ran into an "Excessive poll delay" and that the Zaptel input
buffer went full, directly followed by an empty Zaptel output buffer. What
was (and is) strange is that I have two machines configured identically,
differing only in the DPC and OPC codes in chan_ss7 and in their CPUs. The
machine working without problems runs on an AMD Duron at 1300 MHz; the one
with the CRC errors runs on a 3 GHz P4. 

My first step in finding the source of the problem was to put the card into
a different system, also running a P4, but this time at only 1.8 GHz. That
resulted in the same errors; the SS7 part wouldn't even start on that one.


So I started to dig deeper. I made a crosslink cable, connected two E1 ports
with it, started two instances of Asterisk with chan_ss7 and experienced the
same problem (which at least proved that the telco's switch was not at
fault). My next step was to start the two instances with chan_zap instead
of chan_ss7, and everything was fine. No errors of any kind. 

As I knew that the card worked in my other system with an AMD CPU, I changed
hardware again, this time putting the card in an AMD Athlon XP 3000+,
crosslinking the two E1s again and starting two instances of Asterisk
running chan_ss7. And voilà, no problem. At least at first it looked that
way. I let it run overnight (only Asterisk running, no traffic or anything
else that might distract the system), and when I came back this morning I
had CRC errors plus the other error messages on the screen. So now I
suspected the card (which I still do, a bit). To test whether the TE205P
works as it should, I made a crossover (loopback) plug (pins 1-4 and 2-5)
and ran patlooptest, a loop test with zttool and zttest, and also
uncommented #define CONFIG_ZAPTEL_WATCHDOG in zconfig.h. Everything looks
fine, no errors whatsoever. The card has its own interrupt, not shared with
any other device. All looked good. 

Then I finally came across the behavior that puzzles me. Asterisk was
running with two instances over the crosslink and the console screen was
blanked. I wanted to press Shift to unblank it, but accidentally pressed the
Caps Lock key. When the screen was unblanked and I started to type, I
realized that Caps Lock was on. I pressed it again, and at that moment I got
a CRC16 error. I thought that was strange and pressed it twice more, and
there it was again: packets from the zaptel driver to chan_ss7 got lost. The
same happens when I press Scroll Lock or Num Lock. The keyboard runs on
IRQ 1 and the TE205P on IRQ 11, so there shouldn't be any impact when the
keyboard uses its interrupt. 

As you can see, I am stuck now: why does this happen, and why only with
chan_ss7? I cannot say whether I can reproduce the errors on my "running
system" (the one without the errors), as it is located elsewhere without a
keyboard connected. Any ideas on how to solve this are greatly appreciated,
as I need to get the system back to work. 

Here is the bottom-line problem in chan_ss7 right now:

Snippet from man 2 write:

ERRORS
       EAGAIN Non-blocking I/O has been selected using O_NONBLOCK and the write would block.
       EINTR  The call was interrupted by a signal before any data was written.

As best I can tell, it basically comes down to these write errors not being
handled properly. Make a patch that tests for these errors and handles them
as it should. When I have more time I'll poke around in how ast_frame is
being passed, as the asterisk-devs think there may also be a problem there.
I can only confirm differences in implementation between the different
channel stacks I've seen.
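A minimal sketch of the kind of patch suggested above, assuming the zaptel channel is written through a file descriptor opened with O_NONBLOCK. The names write_all_nb and set_nonblocking and the per-wait timeout_ms parameter are hypothetical, not chan_ss7 code. On EINTR nothing was written, so the write is simply retried; on EAGAIN the driver buffer is full, so the sketch waits for POLLOUT with poll() instead of silently dropping the MTP2 data.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: put a descriptor into O_NONBLOCK mode. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper (not chan_ss7 code): write all of buf to a
 * non-blocking fd.
 *   EINTR  -> nothing was written; retry the write()
 *   EAGAIN -> buffer full; wait up to timeout_ms for POLLOUT
 * Returns the number of bytes written (short only if a wait timed out,
 * in which case errno == EAGAIN), or -1 on any other error. */
static ssize_t write_all_nb(int fd, const void *buf, size_t len, int timeout_ms)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n > 0) {
            done += (size_t)n;       /* partial writes are legal; keep going */
            continue;
        }
        if (n == 0)
            return (ssize_t)done;    /* no progress; let the caller decide */
        if (errno == EINTR)
            continue;                /* interrupted before writing: retry */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;               /* real error, e.g. EBADF or EIO */

        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int r = poll(&pfd, 1, timeout_ms);
        if (r < 0 && errno == EINTR)
            continue;
        if (r < 0)
            return -1;
        if (r == 0) {
            errno = EAGAIN;          /* timed out: buffer never drained */
            return (ssize_t)done;
        }
        /* r > 0: writable now (or an error the next write() will report) */
    }
    return (ssize_t)done;
}
```

A short return with errno == EAGAIN means the buffer stayed full for the whole wait. How MTP2 should react then (drop, queue, or resynchronize) is left to the caller here, and the timeout should stay small because the calling thread blocks while it waits.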

Cheers,

C.
