[asterisk-dev] iax2_sched_replace: Unable to cancel schedule

Timo Teräs timo.teras at iki.fi
Fri May 2 02:57:44 CDT 2008


Tilghman Lesher wrote:
> On Friday 02 May 2008 00:07:39 Timo Teräs wrote:
>> Timo Teräs wrote:
>>> I'm getting log messages such as:
>>>
>>> [Apr 30 11:29:48] WARNING[15015]: chan_iax2.c:1103 iax2_sched_replace:
>>> Unable to cancel schedule ID 8973.  This is probably a bug (chan_iax2.c:
>>> iax2_sched_replace, line 1103).
>>> [Apr 30 11:29:55] WARNING[15015]: chan_iax2.c:1103 iax2_sched_replace:
>>> Unable to cancel schedule ID 9336.  This is probably a bug (chan_iax2.c:
>>> iax2_sched_replace, line 1103).
>>> [Apr 30 11:29:56] WARNING[15015]: chan_iax2.c:1103 iax2_sched_replace:
>>> Unable to cancel schedule ID 9487.  This is probably a bug (chan_iax2.c:
>>> iax2_sched_replace, line 1103).
>>>
>>> on Asterisk 1.6.0-beta8 (with connectedline patches).
>>>
>>> Any suggestions how to debug those?
>> Ok. I think I got it. I get these on a slow machine, so I think the problem
>> is the usage of AST_SCHED_REPLACE() in iax2_sched_replace(). Looking at the
>> comments in sched.h, AST_SCHED_REPLACE() should be used only for entities
>> whose scheduler callback returns non-zero to get reinjected into the
>> scheduler queue with the same ID. It looks to me like all the scheduler
>> callbacks return zero, so usage of this macro is incorrect (the macro
>> assumes that the ID is always valid, and that if it's not, the scheduler
>> callback is being run in another thread and will be reinjected soon, so it
>> loops a couple of times trying to delete it again). Doing
>> s/AST_SCHED_REPLACE/ast_sched_replace/ in chan_iax2.c cured the problem for
>> me. Now I'm wondering why ast_sched_replace is marked deprecated in
>> sched.h? Clearly it has valid usages where AST_SCHED_REPLACE is not
>> appropriate.
> 
> That comment is in the process of being revised, because it was based on a
> fundamental misunderstanding of how the code works.  There is indeed a bug
> in chan_iax2 where a scheduled process either does not reset its ID upon death
> or it's hanging out for WAY too long.
> 
> Setting the routine back to the function instead of the macro doesn't "fix"
> anything; it merely hides the problem.

Right. Okay. Any suggestions for debugging? I'm a competent programmer and
familiar with gdb (though my test box has uclibc+linuxthreads+some hardened
stuff, so not all gdb features work properly), so just give me some kind of
list of things I should check and I can find them out. My test platform is an
oldish PC, so that's probably triggering the "WAY too long" thingy. I did find
out that the related scheduled function belongs to the jitter buffer. Turning
the jitterbuffer off also makes the warnings go away.
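
To make the "does not reset its ID upon death" case concrete, here is a toy
model of it. This is only a sketch under stated assumptions: the toy_* names,
the fixed-size queue and struct toy_call are all hypothetical and are not the
real sched.c or chan_iax2.c code. It only shows how a stale stored ID makes a
later delete-then-add fail.

```c
#define TOY_MAX_SCHED 16

typedef int (*toy_sched_cb)(void *data);

struct toy_sched_entry {
    int in_use;
    toy_sched_cb cb;
    void *data;
};

/* Hypothetical owner object that remembers the ID of its scheduled job. */
struct toy_call {
    int jbid;
};

static struct toy_sched_entry toy_queue[TOY_MAX_SCHED];

/* Schedule a callback; returns its ID, or -1 if the queue is full. */
static int toy_sched_add(toy_sched_cb cb, void *data)
{
    for (int id = 0; id < TOY_MAX_SCHED; id++) {
        if (!toy_queue[id].in_use) {
            toy_queue[id].in_use = 1;
            toy_queue[id].cb = cb;
            toy_queue[id].data = data;
            return id;
        }
    }
    return -1;
}

/* Cancel an entry; returns -1 if the ID is not in the queue. */
static int toy_sched_del(int id)
{
    if (id < 0 || id >= TOY_MAX_SCHED || !toy_queue[id].in_use)
        return -1;
    toy_queue[id].in_use = 0;
    return 0;
}

/* Run one entry: it leaves the queue, and comes back only if the
 * callback returns non-zero. */
static void toy_sched_run(int id)
{
    if (id < 0 || id >= TOY_MAX_SCHED || !toy_queue[id].in_use)
        return;
    struct toy_sched_entry *e = &toy_queue[id];
    e->in_use = 0;
    if (e->cb(e->data) != 0)
        e->in_use = 1;
}

/* Callback that dies (returns 0) but leaves the owner's ID stale. */
static int toy_cb_forgets(void *data)
{
    (void)data;
    return 0;
}

/* Callback that dies and resets the owner's ID to -1 first. */
static int toy_cb_resets(void *data)
{
    ((struct toy_call *)data)->jbid = -1;
    return 0;
}

/* Delete-then-add; returns 1 where a warning would be logged. */
static int toy_replace(int *id, toy_sched_cb cb, void *data)
{
    int failed = (*id > -1 && toy_sched_del(*id) != 0);
    *id = toy_sched_add(cb, data);
    return failed;
}
```

In this model, a call whose callback is toy_cb_forgets trips the failure path
in toy_replace() once the callback has run, while one using toy_cb_resets does
not.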

It might be related to the fact that handling incoming media frames needs to
take way too many mutex locks in __find_callno(), as I see lots of threads in
that function. Why isn't there an iaxd[IAX_MAX_CALLS] indexing the calls by the
other system's call number (the entries would have to be lists, as there might
be multiple remotes with the same call number), or a hash keyed by
(address/port/remote callno)? A lookup like that would be way faster than the
current implementation.
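
Roughly what I have in mind for the hashed variant, as a minimal sketch. All
names here (peer_key, peer_entry, peer_insert, peer_lookup, PEER_BUCKETS) are
made up for illustration and are not existing chan_iax2.c identifiers. It also
assumes IPv4 only and leaves out locking and entry removal.

```c
#include <stdint.h>
#include <stdlib.h>

#define PEER_BUCKETS 1024   /* hypothetical table size, must be a power of two */

/* Hypothetical key: remote address, remote port, remote call number. */
struct peer_key {
    uint32_t addr;           /* IPv4 address, any byte order if used consistently */
    uint16_t port;
    uint16_t remote_callno;
};

struct peer_entry {
    struct peer_key key;
    int local_callno;          /* index into the local call table */
    struct peer_entry *next;   /* chain of entries sharing a bucket */
};

static struct peer_entry *peer_buckets[PEER_BUCKETS];

/* Mix the three key fields into a bucket index. */
static unsigned peer_hash(const struct peer_key *k)
{
    uint32_t h = k->addr;
    h ^= ((uint32_t)k->port << 16) | k->remote_callno;
    h *= 2654435761u;          /* multiplicative hashing constant */
    return (h >> 16) & (PEER_BUCKETS - 1);
}

static int peer_key_equal(const struct peer_key *a, const struct peer_key *b)
{
    return a->addr == b->addr &&
           a->port == b->port &&
           a->remote_callno == b->remote_callno;
}

/* Register a call; returns 0 on success, -1 if allocation fails. */
static int peer_insert(const struct peer_key *k, int local_callno)
{
    struct peer_entry *e = malloc(sizeof(*e));
    if (!e)
        return -1;
    unsigned b = peer_hash(k);
    e->key = *k;
    e->local_callno = local_callno;
    e->next = peer_buckets[b];
    peer_buckets[b] = e;
    return 0;
}

/* Return the local call number for a key, or -1 if there is none. */
static int peer_lookup(const struct peer_key *k)
{
    for (struct peer_entry *e = peer_buckets[peer_hash(k)]; e; e = e->next) {
        if (peer_key_equal(&e->key, k))
            return e->local_callno;
    }
    return -1;
}
```

A lookup then walks only the one chain the key hashes to instead of scanning
the whole call table, and two remotes that use the same call number stay
distinct because the address and port are part of the key.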

Cheers,
  Timo


