Fw: [asterisk-dev] 1.4/trunk chan_iax2.c stability/deadlocks

Tim Robbins tjrobbins at gmail.com
Fri May 4 16:17:44 MST 2007


See message below - sending from gmail since my original post didn't make it
through (spam filter ate it?).

---------- Forwarded message ----------
From: Tim Robbins <tim at tjr.id.au>
Date: May 5, 2007 9:14 AM
Subject: Fw: [asterisk-dev] 1.4/trunk chan_iax2.c stability/deadlocks
To: tjrobbins at gmail.com


----- Original Message -----
From: "Tim Robbins" <tim at tjr.id.au>
To: "Asterisk Developers Mailing List" <asterisk-dev at lists.digium.com>
Sent: Friday, May 04, 2007 6:54 PM
Subject: Re: [asterisk-dev] 1.4/trunk chan_iax2.c stability/deadlocks


> We recently attempted the jump from 1.0 to 1.4 on our production systems
> and were seeing IAX2-related crashes every couple of minutes until we
> rolled back. After some debugging efforts and fairly brutal load tests,
> we're hopefully ready to try again some time soon.
>
> The issues we found and fixed were:
> - Null pointer dereferences in a few places when racing with another thread
> that is destroying the tech-private data. This happens when the
> iaxsl[] mutex is released then reacquired without checking that the call
> is still there (iaxs[x] != NULL). We only saw it happen in iax2_indicate()
> and function_iaxpeer() ("CURRENTCHANNEL" case), but there are theoretical
> problems in iax2_bridge() and socket_process() (IAX_COMMAND_NEW "TBD"
> case). Everywhere else in chan_iax2.c either (a) checks that the pvt is
> non-null after acquiring the lock, or (b) is fairly safe since it's
> working on a newly-allocated call. This is part of Mantis bug #9084.
> - ast_channel_free() tries to free the tech_pvt pointer in some error cases,
> but tech_pvt is not a heap pointer for IAX2 channels (it's the callno cast
> to a void *). I don't know exactly what triggers these error cases. This
> is Mantis bug #9103.
> - Apparent use-after-free bug in prune_peers(): destroy_peer() and
> AST_LIST_REMOVE_CURRENT() are called in the wrong order. Only likely to
> happen with rtautoclear=yes.
> - rtautoclear=yes only removes peers, not users. This might result in the
> users list growing too large.
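The first item in the list above (re-checking iaxs[x] after the iaxsl[] mutex has been dropped and reacquired) can be sketched roughly as follows. This is not chan_iax2.c code: calls[], call_locks[], struct call_pvt and the call_*() functions are hypothetical stand-ins for iaxs[], iaxsl[] and the real pvt handling, and the unlocked_work callback only simulates whatever happens while the lock is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_CALLS 2

struct call_pvt { int indicated; };              /* hypothetical stand-in for the IAX2 pvt */

static struct call_pvt *calls[MAX_CALLS];        /* plays the role of iaxs[] */
static pthread_mutex_t call_locks[MAX_CALLS] = { /* plays the role of iaxsl[] */
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};

/* Allocate a call in slot callno if the slot is empty. */
static void call_create(int callno)
{
    pthread_mutex_lock(&call_locks[callno]);
    if (!calls[callno])
        calls[callno] = calloc(1, sizeof(*calls[callno]));
    pthread_mutex_unlock(&call_locks[callno]);
}

/* Destroy the call in slot callno; harmless if it is already gone. */
static void call_destroy(int callno)
{
    pthread_mutex_lock(&call_locks[callno]);
    free(calls[callno]);
    calls[callno] = NULL;
    pthread_mutex_unlock(&call_locks[callno]);
}

/*
 * Release the per-call lock around unlocked_work(), then reacquire it and
 * check that the call still exists before touching it.
 * Returns 0 on success, -1 if the call is (or has become) NULL.
 */
static int call_indicate(int callno, void (*unlocked_work)(int))
{
    int res = -1;

    assert(callno >= 0 && callno < MAX_CALLS);
    pthread_mutex_lock(&call_locks[callno]);
    if (calls[callno]) {
        pthread_mutex_unlock(&call_locks[callno]);
        if (unlocked_work)
            unlocked_work(callno);   /* another thread could destroy the call here */
        pthread_mutex_lock(&call_locks[callno]);
        if (calls[callno]) {         /* the re-check that was missing */
            calls[callno]->indicated = 1;
            res = 0;
        }
    }
    pthread_mutex_unlock(&call_locks[callno]);
    return res;
}
```

Passing call_destroy as unlocked_work simulates the race deterministically: the call disappears while the lock is released, and call_indicate() returns -1 instead of dereferencing a NULL pointer.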
>
> Other things we noticed:
> - find_callno() is really slow, especially when Realtime is being used
> with a relatively slow backend (e.g. MySQL). If one of the IAX worker
> threads sleeps with an iaxsl[] mutex held while waiting for the backend to
> respond, the other threads get stuck waiting to acquire that mutex.
> Russell's hashing code will mitigate this once the issues are sorted out.
> It's likely that the hashing code itself is perfectly fine but exposes race
> conditions that weren't possible while threads were being effectively
> serialized by find_callno().
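As a generic illustration of why a hashed lookup helps here (this is not Russell's code, and every name below is hypothetical), a call can be found by hashing the remote address, port and peer call number into a bucket rather than scanning every call slot. The sketch assumes IPv4 and leaves out the locking a real table would need.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 64

struct call_key {
    uint32_t addr;        /* remote IPv4 address (assumed key field) */
    uint16_t port;        /* remote UDP port */
    uint16_t peercallno;  /* call number chosen by the remote side */
};

struct call_entry {
    struct call_key key;
    int callno;                 /* local call number */
    struct call_entry *next;    /* chain of entries in the same bucket */
};

static struct call_entry *buckets[NBUCKETS];

/* Map a key to a bucket index with a simple multiplicative hash. */
static unsigned hash_key(const struct call_key *k)
{
    uint32_t h;

    assert(k != NULL);
    h = k->addr;
    h = h * 31u + k->port;
    h = h * 31u + k->peercallno;
    return h % NBUCKETS;
}

static int key_equal(const struct call_key *a, const struct call_key *b)
{
    return a->addr == b->addr && a->port == b->port &&
           a->peercallno == b->peercallno;
}

/* Register a call under its key. Returns 0 on success, -1 on allocation failure. */
static int call_insert(const struct call_key *k, int callno)
{
    struct call_entry *e = malloc(sizeof(*e));
    unsigned b = hash_key(k);

    if (!e)
        return -1;
    e->key = *k;
    e->callno = callno;
    e->next = buckets[b];
    buckets[b] = e;
    return 0;
}

/* Return the local call number for a key, or -1 if there is no such call. */
static int call_lookup(const struct call_key *k)
{
    struct call_entry *e;

    for (e = buckets[hash_key(k)]; e; e = e->next)
        if (key_equal(&e->key, k))
            return e->callno;
    return -1;
}
```

Only the entries in one bucket are examined per lookup, so a thread looking for one call does not have to visit (or wait on) every other call along the way.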
>
> Unfortunately I'm not able to contribute patches at this stage, but these
> issues are fairly easy to fix.
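For the prune_peers() item above, the fix amounts to unlinking a node from the list before freeing it. A minimal sketch with a plain singly linked list follows; it does not use the AST_LIST macros, and struct peer, add_peer() and the destroyed counter are simplified, hypothetical stand-ins rather than the real chan_iax2.c definitions.

```c
#include <assert.h>
#include <stdlib.h>

struct peer {
    int expired;          /* stand-in for "should be pruned" */
    struct peer *next;
};

static int destroyed;     /* counts freed peers, for demonstration only */

/* Free a peer. The caller must already have unlinked it from the list. */
static void destroy_peer(struct peer *p)
{
    assert(p != NULL);
    destroyed++;
    free(p);
}

/* Prepend a new peer and return the new list head. */
static struct peer *add_peer(struct peer *head, int expired)
{
    struct peer *p = malloc(sizeof(*p));

    if (!p)
        return head;
    p->expired = expired;
    p->next = head;
    return p;
}

/* Remove and destroy every expired peer; returns the new list head. */
static struct peer *prune_peers(struct peer *head)
{
    struct peer **link = &head;

    while (*link) {
        struct peer *cur = *link;

        if (cur->expired) {
            *link = cur->next;   /* 1. unlink while cur is still valid */
            destroy_peer(cur);   /* 2. only then free it */
        } else {
            link = &cur->next;
        }
    }
    return head;
}
```

Doing the two steps in the opposite order would read cur->next from memory that has already been freed, which is the use-after-free pattern described for prune_peers().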
>
> Tim
>
> ----- Original Message -----
> From: "Stephen Davies" <stephen.l.davies at gmail.com>
> To: "Asterisk Developers Mailing List" <asterisk-dev at lists.digium.com>
> Sent: Friday, May 04, 2007 5:59 AM
> Subject: [asterisk-dev] 1.4/trunk chan_iax2.c stability/deadlocks
>
>
>> Hi,
>>
>> I recently moved our IAX service servers on to SVN trunk.
>>
>> Seems to me that a lot of people are still on 1.2, and so I thought I
>> should do my bit and put the trunk code into production and see what
>> happens and fix whatever comes my way.  Cos we need 1.4 to be stable.
>>
>> So what has happened is segfaults and deadlocks in chan_iax2.
>> Probably on average once a day.  Of course this is to do with the new
>> multi-threaded stuff in there.
>>
>> Is my experience the norm for those using iax2 on 1.4/trunk?
>>
>> So I've been working on my coredumps and fixing the issues - I'll
>> upload onto Mantis once I've seen whether my fixes are proving
>> effective.
>>
>> Are others running 1.4 in iax intensive environments?  Are there
>> others prepared to take some pain to try to chase down these issues?
>>
>> Thanks,
>> Steve
>