[asterisk-users] Call drop weirdness

Chris Nighswonger cnighswonger at foundations.edu
Sat Nov 10 07:40:44 CST 2012


On Wed, Oct 31, 2012 at 10:31 AM, Chris Nighswonger
<cnighswonger at foundations.edu> wrote:

>> I'm running Asterisk 10.7.0 with three sip trunks to my call termination
>> provider. For the most part everything works great.
>>
>> However, at apparently random times and usually about 20 mins or so into
>> the call, the outbound audio stream dies.
>> The call stays connected and the inbound audio works fine.
>
> So I've been watching this problem and was finally able to get a pcap
> while it happened.


<snip>


> Any thoughts on what might be going wrong? Do I need to post more
> info? Or am I on the wrong track altogether?


After lots of grinding through traces and data dumps both on my end
and my provider's, it turned out I was on the wrong track altogether.

I finally threw together a script to log counter stats from the
switchport into which our pbx is plugged, even though I had not
noticed any counter activity before. From this I found that the port
was accumulating align errors at a very slow rate, in small bursts.
So I wrote a script to log this counter to an RRD and added it to a
graph of traffic control rates. This allowed me to associate the
bursts of align errors with RTP data flow.
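For anyone who wants to do something similar without an RRD, here is
a rough sketch in Python of the burst detection part only. It is not
the script I used. It assumes you already have (timestamp, value)
samples of the port's alignment error counter from somewhere (SNMP,
CLI scraping, whatever your switch offers), that the counter is 32
bits wide, and that a "burst" is a run of increases no more than
max_gap seconds apart. The function names and the max_gap parameter
are made up for illustration.

```python
# Assumption: the switch exposes alignment errors as a 32-bit counter
# that only increases and wraps back to zero at 2**32.
COUNTER_MODULUS = 2 ** 32


def counter_deltas(samples):
    """Turn (timestamp, counter_value) samples into (timestamp, increase) pairs.

    samples must be sorted by timestamp. A decrease is treated as a
    counter wrap; a counter reset (e.g. switch reboot) would be
    misread as a large increase, so sanity-check big values.
    """
    deltas = []
    for (_, previous), (timestamp, current) in zip(samples, samples[1:]):
        increase = current - previous
        if increase < 0:
            increase += COUNTER_MODULUS
        deltas.append((timestamp, increase))
    return deltas


def find_bursts(deltas, max_gap):
    """Group non-zero increases into bursts.

    Two increases belong to the same burst when their timestamps are
    at most max_gap seconds apart. Returns a list of
    (start_timestamp, end_timestamp, total_errors) tuples.
    """
    bursts = []
    for timestamp, increase in deltas:
        if increase == 0:
            continue
        if bursts and timestamp - bursts[-1][1] <= max_gap:
            start, _, total = bursts[-1]
            bursts[-1] = (start, timestamp, total + increase)
        else:
            bursts.append((timestamp, timestamp, increase))
    return bursts
```

With a 10 second poll interval, calling
find_bursts(counter_deltas(samples), max_gap=10) gives a list of time
windows that can then be lined up against call records or RTP
captures.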

The graph here (http://www.screencast.com/t/vMsi3gVke4) contains two
bursts of align errors. The first walked all over a call resulting in
the outbound RTP stream dropping. As soon as the errors stopped the
audio picked back up. The second burst correlates with log entries
like this (no calls were placed or received during this burst):

[2012-11-09 14:23:11] NOTICE[4199] chan_sip.c: Peer
'didforsale_outbound' is now UNREACHABLE!  Last qualify: 84
[2012-11-09 14:23:13] NOTICE[4199] chan_sip.c: Peer 'didforsale_did'
is now UNREACHABLE!  Last qualify: 84
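If you would rather match these log entries against an error burst
by script than by eye, something along these lines should work. It is
only a sketch: the regex is written against the two NOTICE lines
quoted above (unwrapped onto one line each, as they appear in the
actual log file) and may need adjusting for other Asterisk versions
or logger settings. The function names are hypothetical, and the
window start and end are whatever you read off your own graph.

```python
import re
from datetime import datetime

# Matches lines shaped like the chan_sip NOTICE entries quoted above.
UNREACHABLE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s+"
    r"NOTICE\[\d+\]\s+chan_sip\.c:\s+"
    r"Peer '(?P<peer>[^']+)' is now UNREACHABLE!"
)


def unreachable_events(lines):
    """Return (datetime, peer_name) for every UNREACHABLE notice in lines."""
    events = []
    for line in lines:
        match = UNREACHABLE_RE.match(line)
        if match:
            when = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            events.append((when, match.group("peer")))
    return events


def events_in_window(events, start, end):
    """Keep only the events whose timestamp falls within [start, end]."""
    return [event for event in events if start <= event[0] <= end]
```

Feed unreachable_events() the lines of the Asterisk log file, then
pass the result to events_in_window() with the start and end of a
burst as datetime objects.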

Interestingly enough, long ping sequences with large packet payloads
do not seem to trigger any errors.

Having changed cables and ports, and having checked for duplex and
speed mismatches, the only remaining hardware to be checked is the
NIC, which I suspect is bad. So I'm going to switch over to our backup
pbx and test that theory.

I apologize for the lengthy explanation, but perhaps it will help some
other person with a similarly maddening problem.

Kind Regards,
Chris


