[asterisk-bugs] [JIRA] (ASTERISK-21387) Asterisk Deadlocks at least once during the day

Duane Larson (JIRA) noreply at issues.asterisk.org
Sat Apr 6 15:43:01 CDT 2013


Duane Larson created ASTERISK-21387:
---------------------------------------

             Summary: Asterisk Deadlocks at least once during the day
                 Key: ASTERISK-21387
                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-21387
             Project: Asterisk
          Issue Type: Bug
      Security Level: None
          Components: Channels/chan_sip/General
    Affects Versions: 11.3.0, 11.2.1, 11.0.1
         Environment: Debian 6.0.4
            Reporter: Duane Larson


I have tested three different versions of Asterisk and see the same issue on each.  It happens once or twice a day, apparently when a VoIP phone tries to make a call (SIP INVITE).  I have two Asterisk servers, both behind an OpenSIPS server that load balances between them.  Sometimes a SIP INVITE deadlocks only one server and the call can go out through the second, but most of the time both Asterisk servers deadlock and no calls can be made.  When I am notified of a problem, I log in to the box, run "netstat -nap |grep 5060", and see that the "Recv-Q" column value is building up.
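
The manual Recv-Q check described above could be automated so that a growing queue is noticed before users report failed calls. The following is only a minimal sketch, assuming the usual Linux net-tools column layout for "netstat -nap" (Proto, Recv-Q, Send-Q, Local Address, ...). The function names and the threshold parameter are hypothetical and not part of Asterisk or netstat.

```python
import subprocess


def recv_q_for_port(netstat_output, port=5060):
    """Return (proto, local_address, recv_q) for each socket bound to `port`.

    Assumes net-tools style rows: Proto Recv-Q Send-Q Local-Address ...
    """
    suffix = ":" + str(port)
    rows = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a udp/tcp socket row.
        if len(fields) < 4 or not fields[0].startswith(("udp", "tcp")):
            continue
        if not fields[1].isdigit():
            continue
        local_address = fields[3]
        # Match the exact port, so that e.g. :15060 is not counted.
        if local_address.endswith(suffix):
            rows.append((fields[0], local_address, int(fields[1])))
    return rows


def backlogged(netstat_output, port=5060, threshold=0):
    """Return only the sockets whose Recv-Q exceeds `threshold` bytes."""
    return [r for r in recv_q_for_port(netstat_output, port) if r[2] > threshold]


def check_live(port=5060, threshold=0):
    """Run netstat on this host and report backlogged sockets (needs net-tools)."""
    out = subprocess.run(
        ["netstat", "-nap"], capture_output=True, text=True
    ).stdout
    return backlogged(out, port, threshold)
```

Calling check_live() periodically (for example from cron) and logging any non-empty result with a timestamp would make it possible to line up the start of the Recv-Q growth with the WARNING and DEBUG messages in the Asterisk logs.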

I posted on the mailing list and was told that version 11.3 fixed a lot of deadlock issues, but the upgrade did not help.
See the mailing list email here:
http://lists.digium.com/pipermail/asterisk-users/2013-April/278436.html

It usually takes about 10 minutes before Asterisk becomes responsive again.  If I restart Asterisk before the 10 minutes are up, everything returns to normal.

I see the following errors in the message logs:

On the 11.0.1 Asterisk server
WARNING[23723][C-00000010] chan_sip.c: Unable to cancel schedule ID 11473.  This is probably a bug (chan_sip.c: update_provisional_keepalive, line 4406).

On the 11.2.1 Asterisk server
WARNING[3493][C-0000001f] chan_sip.c: Unable to cancel schedule ID 30810.  This is probably a bug (chan_sip.c: update_provisional_keepalive, line 4683).


When I look in chan_sip.c on both servers, I see that the two line numbers point to the same line of code:

AST_SCHED_DEL_UNREF(sched, pvt->provisional_keepalive_sched_id, dialog_unref(pvt, "when you delete the provisional_keepalive_sched_id, you should dec the refcount for the stored dialog ptr"));


I have also seen the following debug logs:

Asterisk version 11.0.1
[Apr  3 21:39:42] DEBUG[12984] res_timing_timerfd.c: Expected to acknowledge 1 ticks but got 11805 instead

Asterisk version 11.2.1
[Apr  3 21:39:50] DEBUG[1854] res_timing_timerfd.c: Expected to acknowledge 1 ticks but got 12423 instead


Here is the gdb output I captured when the issue occurred with version 11.3.0:

http://pastebin.com/gd291Bqz

I am thinking this is a bug since it happens randomly.  Any help is appreciated.



