<div dir="ltr">It just happened again on both machines at the same time, and this time I was running debug on both servers. I use OpenSIPS to load balance between the two servers, so my guess is that the INVITE went to the first server, which froze for some reason, and then OpenSIPS sent the INVITE to the second server, which also froze/deadlocked on the same SIP message. On both servers, the last log line written before Asterisk deadlocked was the following:<div>
<br></div><div><br></div><div style>Asterisk version 11.0.1</div><div>[Apr 3 21:39:42] DEBUG[12984] res_timing_timerfd.c: Expected to acknowledge 1 ticks but got 11805 instead<br></div><div><br></div><div style>Asterisk version 11.2.1</div>
<div>[Apr 3 21:39:50] DEBUG[1854] res_timing_timerfd.c: Expected to acknowledge 1 ticks but got 12423 instead<br></div><div><br></div><div><br></div><div style>In my last email I posted the debug output from the Asterisk 11.0.1 server. Here is the debug output from the Asterisk 11.2.1 server:</div>
<div style><br></div><div style><a href="http://pastebin.com/mbjSSAWM">http://pastebin.com/mbjSSAWM</a><br></div><div style><br></div><div style><br></div><div style>This has to be a bug, right? I am thinking of opening an issue in the Asterisk JIRA issue tracker.</div>
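<div>About the res_timing_timerfd.c messages above: on Linux, a timerfd keeps counting expirations while nobody reads it. A single read() returns the accumulated count as a 64-bit integer. Getting 11805 or 12423 when 1 was expected therefore suggests that the thread servicing the timer had not read it for a long stretch. That fits a blocked thread, but it does not by itself explain why the thread blocked. The standalone C sketch below demonstrates that kernel behavior. It is not Asterisk code, and the helper name ticks_after_stall() is hypothetical.</div>

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/timerfd.h>

/* Hypothetical demo helper (not part of Asterisk): arm a periodic
 * timerfd, "stall" for stall_ms without reading it, then read once and
 * return how many expirations the kernel accumulated meanwhile.
 * Returns 0 on error or on invalid arguments. */
uint64_t ticks_after_stall(long interval_ms, long stall_ms)
{
    /* An all-zero it_value would disarm the timer and read() would block. */
    if (interval_ms <= 0 || stall_ms < 0)
        return 0;

    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return 0;

    struct itimerspec its = {0};
    its.it_interval.tv_sec = interval_ms / 1000;
    its.it_interval.tv_nsec = (interval_ms % 1000) * 1000000L;
    its.it_value = its.it_interval; /* first expiry after one interval */

    if (timerfd_settime(fd, 0, &its, NULL) < 0) {
        close(fd);
        return 0;
    }

    /* Simulate a blocked thread: time passes while the fd goes unread. */
    struct timespec stall = { stall_ms / 1000, (stall_ms % 1000) * 1000000L };
    nanosleep(&stall, NULL);

    /* One read() returns the total number of expirations since the timer
     * was armed (or last read), as an 8-byte unsigned integer. */
    uint64_t expirations = 0;
    ssize_t n = read(fd, &expirations, sizeof expirations);
    close(fd);
    return n == (ssize_t)sizeof expirations ? expirations : 0;
}
```

<div>For example, ticks_after_stall(10, 200) should return at least 20 rather than 1. One expiration is counted for every 10 ms interval that elapsed while the fd went unread.</div>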
<div><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Wed, Apr 3, 2013 at 4:45 PM, Duane Larson <span dir="ltr">&lt;<a href="mailto:duane.larson@gmail.com" target="_blank">duane.larson@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="im"><span style="font-family:arial,sans-serif;font-size:13px">It just happened again on the 11.0.1 box, and I was able to attach gdb and grab debug output. I am hoping someone can tell me whether this is a bug or a problem with my config.</span><div style="font-family:arial,sans-serif;font-size:13px">
<br></div><div style="font-family:arial,sans-serif;font-size:13px">gdb asterisk-bin/sbin/asterisk 29048</div><div style="font-family:arial,sans-serif;font-size:13px"><br></div></div><div>Go here for the debug output</div>
<div><a href="http://pastebin.com/DGXx0BSk" target="_blank">http://pastebin.com/DGXx0BSk</a><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote"><div class="im">On Tue, Apr 2, 2013 at 7:42 PM, Duane Larson <span dir="ltr">&lt;<a href="mailto:duane.larson@gmail.com" target="_blank">duane.larson@gmail.com</a>&gt;</span> wrote:<br>
</div><div><div class="h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I am currently running two different versions of Asterisk<div><br></div><div>
11.0.1</div><div>11.2.1</div>
<div><br></div><div>I have seen the bug occur on both servers.</div><div><br></div><div>
The issue is that when I dial a phone number, sometimes the call never goes out. When I check the Asterisk server with ngrep, I can see that the SIP messages are reaching Asterisk, but Asterisk isn't responding. </div>
<div><br></div><div>When I run "netstat -nap |grep 5060", I see a large value in the "Recv-Q" column for Asterisk's socket.</div><div><br></div><div>It usually takes about 10 minutes for Asterisk to become responsive again. Alternatively, if I restart Asterisk before the 10 minutes are up, everything goes back to normal.</div>
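<div>Some context on the Recv-Q symptom: for a UDP socket on Linux, netstat takes Recv-Q from the rx_queue field of /proc/net/udp. That field is the receive-buffer memory, in bytes, held for datagrams the application has not read yet. A large value on port 5060 means SIP packets are arriving but nothing in Asterisk is pulling them off the socket, which matches what ngrep shows. The C sketch below reads that value directly. It assumes SIP over IPv4 UDP; IPv6 sockets are listed in /proc/net/udp6 instead. The names parse_udp_line() and max_rx_queue_on_port() are hypothetical helpers.</div>

```c
#include <stdio.h>

/* Hypothetical helper: parse one data line of Linux /proc/net/udp (IPv4).
 * Line layout: "sl: local_addr:port rem_addr:port st tx_queue:rx_queue ..."
 * with addresses, ports and queue sizes in hex. On success stores the
 * local port and rx_queue (what netstat prints as Recv-Q) and returns 1;
 * returns 0 for the header line or anything it cannot parse. */
int parse_udp_line(const char *line, unsigned int *port, unsigned long *rx_queue)
{
    int n = sscanf(line, " %*u: %*x:%x %*x:%*x %*x %*x:%lx", port, rx_queue);
    return n == 2;
}

/* Hypothetical helper: scan /proc/net/udp and return the largest rx_queue
 * among sockets bound to want_port. Returns 0 if none is queued or the
 * port is not found, and -1 if the file cannot be opened. */
long max_rx_queue_on_port(unsigned int want_port)
{
    FILE *f = fopen("/proc/net/udp", "r");
    if (!f)
        return -1;

    char buf[512];
    long best = 0;
    while (fgets(buf, sizeof buf, f)) {
        unsigned int port;
        unsigned long rxq;
        if (parse_udp_line(buf, &port, &rxq) && port == want_port && (long)rxq > best)
            best = (long)rxq;
    }
    fclose(f);
    return best;
}
```

<div>For example, polling max_rx_queue_on_port(5060) while a call hangs should show the same growing number that netstat reports under Recv-Q.</div>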
<div><br></div><div>I see the following warnings in the messages log:</div><div><br></div><div>On the 11.0.1 Asterisk server:</div><div>WARNING[23723][C-00000010] chan_sip.c: Unable to cancel schedule ID 11473. This is probably a bug (chan_sip.c: update_provisional_keepalive, line 4406).<br>
</div><div><br></div><div>On the 11.2.1 Asterisk server:</div><div>WARNING[3493][C-0000001f] chan_sip.c: Unable to cancel schedule ID 30810. This is probably a bug (chan_sip.c: update_provisional_keepalive, line 4683).<br>
</div><div><br></div><div><br></div><div>When I look in chan_sip.c on both servers, both warnings point to the same line of code:</div><div><br></div><div>AST_SCHED_DEL_UNREF(sched, pvt->provisional_keepalive_sched_id, dialog_unref(pvt, "when you delete the provisional_keepalive_sched_id, you should dec the refcount for the stored dialog ptr"));<br>
</div><div><br></div><div><br></div><div><br></div><div>What could be causing this? It seems to happen at least once a day.</div></div>
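<div>As far as I can tell from include/asterisk/sched.h (worth double-checking against the 11.x source), AST_SCHED_DEL_UNREF retries ast_sched_del() a few times. It logs "Unable to cancel schedule ID" only if every attempt fails, meaning the scheduler no longer had a removable entry for that ID; presumably the callback had already fired or was running. The toy C model below illustrates that retry, warn, then drop-the-reference pattern using made-up names: struct toy_sched, toy_sched_del() and cancel_and_unref(). It is not Asterisk's implementation. Its choice to drop the reference only after a successful delete is this sketch's own design, which the real macro may not share.</div>

```c
#include <stdio.h>

#define TOY_MAX_ENTRIES 8
#define TOY_MAX_TRIES 10

/* Hypothetical toy scheduler: a fixed table of pending entry IDs.
 * Not Asterisk's scheduler context. */
struct toy_sched {
    int ids[TOY_MAX_ENTRIES]; /* -1 marks an empty slot */
};

/* Remove a pending entry. Returns 0 on success, -1 if the ID is not
 * pending (already fired, currently running, or never scheduled). */
int toy_sched_del(struct toy_sched *s, int id)
{
    for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
        if (s->ids[i] == id) {
            s->ids[i] = -1;
            return 0;
        }
    }
    return -1;
}

/* Hypothetical "delete, then drop the reference" helper. Retries the
 * delete up to TOY_MAX_TRIES times; in a real multithreaded scheduler the
 * entry's state can change between attempts, while in this single-threaded
 * toy a missing ID simply fails every time. Warns if the delete never
 * succeeds and clears the caller's stored ID either way. Decrements
 * *refcount only on success. Returns 1 if the entry was cancelled. */
int cancel_and_unref(struct toy_sched *s, int *id, int *refcount)
{
    if (*id < 0)
        return 0; /* nothing scheduled */

    int tries = 0;
    while (toy_sched_del(s, *id) != 0) {
        if (++tries == TOY_MAX_TRIES) {
            fprintf(stderr, "Unable to cancel schedule ID %d\n", *id);
            *id = -1;
            return 0;
        }
    }
    (*refcount)--; /* release the reference held for the scheduled entry */
    *id = -1;
    return 1;
}
```

<div>In this model, cancelling a pending ID returns 1 and decrements the reference count once. Cancelling an ID that is no longer pending prints the warning, clears the stored ID, and leaves the count untouched.</div>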
</blockquote></div></div></div><br><br clear="all"><div class="im"><div><br></div>-- <br>--<br>*--*--*--*--*--*<br>Duane<br>*--*--*--*--*--*<br>--
</div></div>
</blockquote></div>
</div>