[Asterisk-Users] why a perfectly fine iax2 host becomes UNREACHABLE?

Andrew Kohlsmith akohlsmith-asterisk at benshaw.com
Thu May 4 14:44:56 MST 2006


On Thursday 04 May 2006 17:24, Tom Engleward wrote:
> You said I should help rather than whine. Am I not
> helping by announcing that I, too, am experiencing a
> problem which somebody else has mentioned, thus
> providing verification that the problem isn't just an
> isolated quirk caused by somebody's particular
> incorrect configuration?

That was helpful.  The rant definitely was not.

> Besides all this, I still think my suggestion is valid
> that consideration be given to adding a cron job to
> asterisk to periodically place an incoming call and
> issue a "reload" if the call fails. It's standard in
> the software industry to have such watchdog timers on
> mission-critical software.

I disagree -- I design mission-critical hardware and software for a living 
(industrial motion controllers: soft starters, variable-frequency drives, 
etc.) -- YES, there are watchdog timers there for various things, but if one 
ever trips, the system is torn apart to discover WHY and to make sure it does 
not happen again if the cause was avoidable.  Restarting a PBX is not 
something I would ever consider acceptable.  It's like that "wctdm is going 
squirrely, I just reboot it once a night to keep it from happening" -- no, 
that's not acceptable.  We need the problem labbed up and tested to 
determine the root cause and eliminate it, not the old Windows trick of 
"reboot and see if that fixes it" -- when you reboot (or restart, in this 
case) you waste time and flush away any data you could have gathered.

My suggestion for you, if possible, is to run ethereal or tcpdump on one 
of the afflicted machines and let it capture during the quiet times, or during 
the times the problem is most likely to happen (if you can stand the size of 
the data dumps), and check whether one box simply stops replying to IAX2 PING 
messages.
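
If the dump gets big, a small filter can pick the relevant frames out of it. A capture along the lines of `tcpdump -n -s 0 -w iax2.pcap udp port 4569` keeps whole packets on the default IAX2 port. Below is a minimal C sketch, assuming the IAX2 full-frame layout later written up in RFC 5456 (12-byte header, frame type 0x06 for IAX control, subclasses PING 0x02, PONG 0x03, POKE 0x1E) and assuming the Ethernet/IP/UDP headers have already been stripped from each packet. As far as I know the qualify check behind UNREACHABLE is driven by POKE frames answered with PONG, so POKE is classified too. The function and constant names are made up for illustration; they are not taken from chan_iax2.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; header layout assumed per RFC 5456 (IAX2). */
enum iax2_kind {
    IAX2_NOT_FULL_FRAME, /* too short, or a mini frame (F bit clear) */
    IAX2_PING,
    IAX2_PONG,
    IAX2_POKE,
    IAX2_OTHER           /* any other full frame */
};

#define IAX2_FULL_HDR_LEN   12
#define IAX2_FRAMETYPE_IAX  0x06 /* IAX control frame */
#define IAX2_SUB_PING       0x02
#define IAX2_SUB_PONG       0x03
#define IAX2_SUB_POKE       0x1E

/* Classify one UDP payload taken from the capture. */
enum iax2_kind iax2_classify(const uint8_t *p, size_t len)
{
    assert(p != NULL || len == 0);
    /* Full frames have the F bit (top bit of byte 0) set. */
    if (len < IAX2_FULL_HDR_LEN || !(p[0] & 0x80))
        return IAX2_NOT_FULL_FRAME;
    /* Byte 10 is the frame type; byte 11 is the C bit + 7-bit subclass. */
    if (p[10] != IAX2_FRAMETYPE_IAX || (p[11] & 0x80))
        return IAX2_OTHER;
    switch (p[11]) {
    case IAX2_SUB_PING: return IAX2_PING;
    case IAX2_SUB_PONG: return IAX2_PONG;
    case IAX2_SUB_POKE: return IAX2_POKE;
    default:            return IAX2_OTHER;
    }
}

/* Source call number: low 15 bits of bytes 0..1. */
unsigned iax2_src_callno(const uint8_t *p)
{
    return ((unsigned)(p[0] & 0x7F) << 8) | p[1];
}

/* 32-bit timestamp in bytes 4..7, network byte order. */
uint32_t iax2_timestamp(const uint8_t *p)
{
    return ((uint32_t)p[4] << 24) | ((uint32_t)p[5] << 16) |
           ((uint32_t)p[6] << 8) | (uint32_t)p[7];
}
```

Tallying outgoing PING/POKE frames against returning PONG frames, grouped by peer address and call number, is one way to spot the box that goes quiet.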

If you can't handle the data dump, add some debugging messages to both boxes' 
chan_iax2.c that print a timestamp along with "SENDING PING", "RECEIVED PING" 
and "SENDING PONG".  Maybe it's chan_iax2, maybe it's the kernel dropping 
the packet (doubtful) and maybe it's the network dropping the packet (again 
doubtful on a LAN)...  but without data it's impossible to tell and 
impossible to fix.
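
A standalone sketch of such a timestamped debug line, kept independent of Asterisk's own logging so it compiles on its own. Inside chan_iax2.c you would more likely route the text through ast_log(); the helper names, the field layout and the "peer=... callno=..." format here are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical format: "<sec>.<usec> <EVENT> peer=<peer> callno=<n>". */
int iax2_debug_format(char *buf, size_t n, const struct timeval *tv,
                      const char *event, const char *peer, unsigned callno)
{
    assert(buf != NULL && tv != NULL && event != NULL && peer != NULL);
    /* Microsecond resolution so PING/PONG pairs can be lined up. */
    return snprintf(buf, n, "%ld.%06ld %s peer=%s callno=%u",
                    (long)tv->tv_sec, (long)tv->tv_usec, event, peer, callno);
}

/* Stamp with the current wall-clock time and write the line to stderr. */
void iax2_debug_log(const char *event, const char *peer, unsigned callno)
{
    struct timeval tv;
    char line[256];

    gettimeofday(&tv, NULL);
    iax2_debug_format(line, sizeof line, &tv, event, peer, callno);
    fprintf(stderr, "%s\n", line);
}
```

The idea is to call something like iax2_debug_log("SENDING PING", ...) wherever the PING goes out and the matching RECEIVED/SENDING calls on the other side. Timestamps from two boxes are only comparable if their clocks are synchronized (e.g. with NTP).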

> If you're inclined to respond "ok, then just add this
> feature yourself," please say so only if you happen to
> agree that such a feature would actually be a good
> idea. In other words, I'm including a question in this

No, I do not believe a nightly reboot is EVER a good answer, and it's never 
ever a solution.  It's a band-aid.  It's like taking Tylenol every time you 
have a headache and never trying to figure out why you have a headache every 
single day.

> message: "would it be a good idea to add a watchdog to
> reload iax2 when it fails?"

IMO, no, for the reasons above.

-A.


