[asterisk-users] Asterisk SIP deadlocks - update_provisional_keepalive

Duane Larson duane.larson at gmail.com
Wed Apr 3 22:15:02 CDT 2013


It just happened again, on both machines at the same time, and this time I
was running debug on both servers.  I am running OpenSIPS and load balancing
between the two servers, so my guess is that the INVITE went to the first
server, which was frozen for some reason, and OpenSIPS then sent the INVITE
to the second server, which also froze/deadlocked because of the SIP
message.  On both servers, the last line logged before Asterisk deadlocked
was the following:


Asterisk version 11.0.1
[Apr  3 21:39:42] DEBUG[12984] res_timing_timerfd.c: Expected to acknowledge 1 ticks but got 11805 instead

Asterisk version 11.2.1
[Apr  3 21:39:50] DEBUG[1854] res_timing_timerfd.c: Expected to acknowledge 1 ticks but got 12423 instead


In my last email I posted the debug output from the Asterisk server running
11.0.1.  Here is the debug output from the Asterisk server running
11.2.1:

http://pastebin.com/mbjSSAWM


This has to be a bug, right?  I am thinking of opening an issue on the
Asterisk JIRA system.



On Wed, Apr 3, 2013 at 4:45 PM, Duane Larson <duane.larson at gmail.com> wrote:

> It just happened again on the 11.0.1 box and I was able to grab a debug.
>  I am hoping someone can tell me if this is a bug or something wrong with
> my config.
>
> gdb asterisk-bin/sbin/asterisk 29048
>
> Go here for the debug output
> http://pastebin.com/DGXx0BSk
>
>
> On Tue, Apr 2, 2013 at 7:42 PM, Duane Larson <duane.larson at gmail.com> wrote:
>
>> I am currently running two different versions of Asterisk
>>
>> 11.0.1
>> 11.2.1
>>
>> I have noticed the bug occur on both servers.
>>
>> The issue is that when I try to dial a phone number, sometimes the call
>> never goes out.  When I check the Asterisk server with ngrep, I see
>> that the SIP messages are reaching Asterisk but Asterisk isn't
>> responding.
>>
>> When I run "netstat -nap | grep 5060" I see that Asterisk has a large
>> value under the "Recv-Q" column.
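
The Recv-Q figure that netstat prints can also be read straight from /proc/net/udp, which is handy for logging it over time. Here is a rough sketch in C, assuming SIP over UDP on IPv4 (the transport is not stated anywhere in this thread). The helper names `udp_line_recv_queue` and `udp_file_recv_queue` are hypothetical. The first parses one line of /proc/net/udp and returns the rx_queue value in bytes when the local port matches:

```c
#include <stdio.h>

/* Hypothetical helper: parse one line of /proc/net/udp (IPv4 only).
 * Returns the receive-queue size in bytes if the socket's local port
 * equals `port`, or -1 if the line does not match or cannot be parsed. */
static long udp_line_recv_queue(const char *line, unsigned int port)
{
    unsigned int local_port = 0;
    unsigned int rx_queue = 0;

    /* Layout: "sl: local_addr:port remote_addr:port st tx_queue:rx_queue ..."
     * Every numeric field except "sl" is hexadecimal. */
    if (sscanf(line, " %*d: %*x:%x %*x:%*x %*x %*x:%x",
               &local_port, &rx_queue) != 2) {
        return -1; /* header line or unexpected format */
    }
    return local_port == port ? (long)rx_queue : -1;
}

/* Scan a /proc/net/udp style file and return the largest receive queue
 * found for `port`, or -1 if the file cannot be read or no socket is
 * bound to that port. */
static long udp_file_recv_queue(const char *path, unsigned int port)
{
    char line[512];
    long worst = -1;
    FILE *fp = fopen(path, "r");

    if (!fp) {
        return -1;
    }
    while (fgets(line, sizeof(line), fp)) {
        long queued = udp_line_recv_queue(line, port);
        if (queued > worst) {
            worst = queued;
        }
    }
    fclose(fp);
    return worst;
}
```

Calling `udp_file_recv_queue("/proc/net/udp", 5060)` once a second and logging the result would show whether the queue starts growing at the moment calls stop going out.
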
>>
>> It usually takes about 10 minutes before Asterisk becomes responsive
>> again.  If I restart Asterisk before the 10 minutes are up, everything
>> goes back to normal.
>>
>> I see the following warnings in the message logs:
>>
>> On the 11.0.1 Asterisk server
>> WARNING[23723][C-00000010] chan_sip.c: Unable to cancel schedule ID 11473.  This is probably a bug (chan_sip.c: update_provisional_keepalive, line 4406).
>>
>> On the 11.2.1 Asterisk server
>> WARNING[3493][C-0000001f] chan_sip.c: Unable to cancel schedule ID 30810.  This is probably a bug (chan_sip.c: update_provisional_keepalive, line 4683).
>>
>>
>> When I look in chan_sip.c on both servers, I see that both line numbers
>> point to the same line of code:
>>
>> AST_SCHED_DEL_UNREF(sched, pvt->provisional_keepalive_sched_id,
>>     dialog_unref(pvt, "when you delete the provisional_keepalive_sched_id, you should dec the refcount for the stored dialog ptr"));
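
As far as I can tell from include/asterisk/sched.h, AST_SCHED_DEL_UNREF retries ast_sched_del() a fixed number of times with a very short sleep in between. It logs this warning when every attempt fails, for example because the scheduled callback is running at that moment. The sketch below is a simplified model of that retry pattern, not the Asterisk source: the `demo_` names, the limit of 10, and the stub delete functions are assumptions made for illustration.

```c
#include <stdio.h>
#include <unistd.h>

#define DEMO_MAX_TRIES 10 /* assumed retry limit for this sketch */

/* Stand-in for a scheduler delete call: 0 means the entry was removed,
 * non-zero means it could not be removed right now. */
typedef int (*demo_del_fn)(int id);

/* Stub that always succeeds, like deleting an idle scheduler entry. */
static int demo_del_ok(int id)
{
    (void)id;
    return 0;
}

/* Stub that always fails, like an entry whose callback never finishes. */
static int demo_del_busy(int id)
{
    (void)id;
    return -1;
}

/* Try to cancel `id`, retrying while the delete call fails.
 * Returns the number of failed attempts; DEMO_MAX_TRIES means we gave
 * up and printed the warning. Negative ids are treated as "not scheduled". */
static int demo_cancel(demo_del_fn del, int id)
{
    int failures = 0;

    while (id > -1 && del(id) != 0 && ++failures < DEMO_MAX_TRIES) {
        usleep(1); /* brief pause before the next attempt */
    }
    if (failures == DEMO_MAX_TRIES) {
        fprintf(stderr,
                "Unable to cancel schedule ID %d.  This is probably a bug\n",
                id);
    }
    return failures;
}
```

With `demo_del_busy` the loop gives up after ten failed attempts and prints the warning; with `demo_del_ok` it returns immediately. If this model is right, the warning in my logs means the keepalive entry could not be removed during that whole retry window. That would point at the scheduler or the callback being blocked rather than at my config.
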
>>
>>
>>
>> What could be causing this?  It seems to happen at least once a day.
>>
>
>
>
> --
> --
> *--*--*--*--*--*
> Duane
> *--*--*--*--*--*
> --
>



-- 
--
*--*--*--*--*--*
Duane
*--*--*--*--*--*
--