<div dir="ltr"><div class="gmail_extra">Hi,</div><div class="gmail_extra"><br></div><div class="gmail_extra">Thanks for looking at this.<br><br><div class="gmail_quote">On 29 January 2014 16:34, Matthew Jordan <span dir="ltr"><<a href="mailto:mjordan@digium.com" target="_blank">mjordan@digium.com</a>></span> wrote:<br>
<div>[snip] </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><div class="h5"></div></div>
You'll note that the refcall is only called if ast_sched_del<br>
(eventually) returns a valid ID that it deleted.</blockquote><div><br></div><div>Agreed. In my proposed "failure" example, ast_sched_del is called first, and is called in many places in chan_sip with an unref function provided. As a result, ast_sched_del will indeed decrease the refcount. The <span style="font-family:arial,sans-serif;font-size:13px">ASTERISK-22079 backtrace refers to a keepalive timer that will be cancelled more often than it is needed in most environments, so this cancel code should execute regularly.</span></div>
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> In general, chan_sip<br>
doesn't let the scheduler re-schedule events itself, which means that<br>
if the scheduler fires the event, then only the event callback should<br>
un-ref the object (thus consuming the reference of the ao2 object when<br>
it was put on the scheduler). If the scheduler doesn't fire the event<br>
and the delete call wins the lock contention, then the delete call<br>
should unref the object, and the event should never fire.<br></blockquote><div><br></div><div>If the scheduler wins, it does not know whether to delete the scheduled job from the queue until execution ends because the re-scheduling is determined by the return value from the event-call. If the event call is blocked waiting for a lock, then there could be plenty of time for ast_sched_del to get called from elsewhere between the start and end of the event execution.</div>
<div><br></div><div>The specific example in <span style="font-family:arial,sans-serif;font-size:13px">ASTERISK-22079 might be fixed by checking whether the pvt->keepaliveschedid (variable name is from memory) has changed either side of grabbing the pvt and chan locks in the event. If it has changed, bail. If that change is right, it should also allow for the big "#if 0 ..TODO" block in the keepalive code to be reactivated. Perhaps I'll try that and put some highly verbose logging in there to indicate what is happening in real life?</span></div>
<div><br></div><div>Thanks again,</div><div>Steve</div></div></div></div>