[asterisk-users] What would cause a drop between two asterisk systems?

Tue Mar 5 13:32:58 CST 2013

We have an asterisk frontend that all our SIP phones terminate to, and an
asterisk backend with a wildcard PRI card in it connecting to the PSTN.
The frontend handles 99% of dialplan logic and just hands off anything
outgoing to the backend via IAX2, which dials out on one of the open
channels.

Lately we've been getting disconnected calls. Keeping the consoles
running, it doesn't seem to be the PRI initiating the hangups, as this is
what I see when hangups are initiated on the backend / PRI side:

  -- Span 2: Channel 0/21 got hangup request, cause 16

Instead, I'm seeing:

 == Spawn extension (outbound, (dialed #), 3) exited non-zero on 'IAX2/asterisk-frontend2-603'
    -- Hungup 'IAX2/asterisk-frontend2-603'

Which indicates the frontend initiated a hangup. But on the frontend I'm
seeing auto fallthroughs to the h extension, which only happens if the
hangup is initiated from the backend:

    -- Auto fallthrough, channel 'SIP/phone1-00000167' status is 'ANSWER'
    (h extension stuff follows)

If the frontend was initiating the hangup, I'd just see a jump to the h
extension, with no auto fallthrough. So it looks like there may be a
communication interruption between the front and backends.
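
Since watching consoles by hand doesn't scale, one option is to scan saved
console or log output for the three line types quoted above. What follows is
only a minimal Python sketch: the function names (classify, count_events) are
made up for illustration, the patterns are built solely from the sample lines
in this message, and real log output may carry prefixes or variations they
don't account for.

```python
import re

# Patterns derived only from the console lines quoted in this message;
# treat them as assumptions about the general format.
PATTERNS = {
    # Hangup arriving from the PRI side on the backend
    "pri_hangup": re.compile(
        r"Span \d+: Channel \d+/\d+ got hangup request, cause (\d+)"
    ),
    # Backend dialplan exiting on the IAX2 leg
    "iax_exit": re.compile(
        r"Spawn extension \(.*\) exited non-zero on '(IAX2/[^']+)'"
    ),
    # Frontend falling through to the h extension
    "auto_fallthrough": re.compile(
        r"Auto fallthrough, channel '([^']+)' status is '([^']+)'"
    ),
}


def classify(line):
    """Return (event_name, captured_groups) for a known line, else None."""
    for name, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            return name, match.groups()
    return None


def count_events(lines):
    """Count how many times each known event type appears in lines."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        hit = classify(line)
        if hit:
            counts[hit[0]] += 1
    return counts
```

Running count_events over the saved output from each box would show whether
dropped calls line up with PRI hangups or only with the IAX2 exit and auto
fallthrough pair.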

The problem is this happens intermittently, so I can't reproduce it
reliably. I've held a call open for 30+ minutes without running into the
problem, while someone else was only 7 minutes into a call when it happened.
It doesn't seem feasible to constantly run IAX2 debugs from the console
on any open call - does anyone have suggestions on how to troubleshoot
this? Weirdly enough, this only seems to happen when users dial into
conference bridges (not local) such as WebEx and GoToMeeting, but that
might just be because of the length of those calls. 

Will tweaking things like the IAX2 jitter buffer help? The two systems 
are barely four hops apart with an average of 0.2 ms ping times between
them on a very resilient network (two of those hops are through core
transports). I've never seen ping loss between them, even when running
ping tests for hours during heavy call volume periods. The loads on the
machines are minimal - I've never seen the load go above 0.10 during normal
operation. But it does seem like something between them is making them
drop calls. 

hose