See the message below. I'm sending it from Gmail because my original post didn't make it through (perhaps the spam filter ate it?).<br><br>---------- Forwarded message ----------<br><span class="gmail_quote">From: <b class="gmail_sendername">
Tim Robbins</b> <<a href="mailto:tim@tjr.id.au">tim@tjr.id.au</a>><br>Date: May 5, 2007 9:14 AM<br>Subject: Fw: [asterisk-dev] 1.4/trunk chan_iax2.c stability/deadlocks<br>To: <a href="mailto:tjrobbins@gmail.com">tjrobbins@gmail.com
</a><br><br></span><br>----- Original Message -----<br>From: "Tim Robbins" <<a href="mailto:tim@tjr.id.au">tim@tjr.id.au</a>><br>To: "Asterisk Developers Mailing List" <<a href="mailto:asterisk-dev@lists.digium.com">
asterisk-dev@lists.digium.com</a>><br>Sent: Friday, May 04, 2007 6:54 PM<br>Subject: Re: [asterisk-dev] 1.4/trunk chan_iax2.c stability/deadlocks<br><br><br>> We recently attempted the jump from 1.0 to 1.4 on our production systems
<br>> and were seeing IAX2-related crashes every couple of minutes until we<br>> rolled back. After some debugging efforts and fairly brutal load tests,<br>> we're hopefully ready to try again some time soon.
<br>><br>> The issues we found and fixed were:<br>> - Null pointer dereferences in a few places, caused by racing with another<br>> thread that is destroying the tech-private data. This happens when the<br>> iaxsl[] mutex is released and then reacquired without checking that the call
<br>> is still there (iaxs[x] != NULL). We only saw it happen in iax2_indicate()<br>> and function_iaxpeer() ("CURRENTCHANNEL" case), but there are theoretical<br>> problems in iax2_bridge() and socket_process() (IAX_COMMAND_NEW "TBD"
<br>> case). Everywhere else in chan_iax2.c either (a) checks that the pvt is<br>> non-null after acquiring the lock, or (b) is fairly safe since it's<br>> working on a newly-allocated call. This is part of Mantis bug #9084.
<br>> - ast_channel_free() tries to free the tech_pvt pointer in some error cases,<br>> but tech_pvt is not a heap pointer for IAX2 channels (it's the callno cast<br>> to a void *). I don't know exactly what triggers these error cases. This
<br>> is Mantis bug #9103.<br>> - Apparent use-after-free bug in prune_peers(): destroy_peer() and<br>> AST_LIST_REMOVE_CURRENT() are called in the wrong order. This is only likely to happen with<br>> rtautoclear=yes.<br>> - rtautoclear=yes only removes peers, not users. This might result in the
<br>> users list growing too large.<br>><br>> Other things we noticed:<br>> - find_callno() is really slow, especially when Realtime is being used<br>> with a relatively slow backend (e.g. MySQL). If one of the IAX worker
<br>> threads sleeps with an iaxsl[] mutex held while waiting for the backend to<br>> respond, the other threads get stuck waiting to acquire that mutex.<br>> Russell's hashing code will mitigate this once the issues are sorted out.
<br>> It's likely that the hashing code itself is perfectly fine, but it exposes race conditions<br>> that weren't possible while find_callno() was effectively serializing<br>> the threads.<br>><br>> Unfortunately I'm not able to contribute patches at this stage, but these
<br>> issues are fairly easy to fix.<br>><br>> Tim<br>><br>> ----- Original Message -----<br>> From: "Stephen Davies" <<a href="mailto:stephen.l.davies@gmail.com">stephen.l.davies@gmail.com</a>
><br>> To: "Asterisk Developers Mailing List" <<a href="mailto:asterisk-dev@lists.digium.com">asterisk-dev@lists.digium.com</a>><br>> Sent: Friday, May 04, 2007 5:59 AM<br>> Subject: [asterisk-dev]
1.4/trunk chan_iax2.c stability/deadlocks<br>><br>><br>>> Hi,<br>>><br>>> I recently moved our IAX service servers onto SVN trunk.<br>>><br>>> It seems to me that a lot of people are still on
1.2, so I thought I<br>>> should do my bit: put the trunk code into production, see what<br>>> happens, and fix whatever comes my way, because we need 1.4 to be stable.<br>>><br>>> What I've seen is segfaults and deadlocks in chan_iax2,
<br>>> probably once a day on average. Of course, this is related to the new<br>>> multi-threaded code in there.<br>>><br>>> Is my experience the norm for those using IAX2 on 1.4/trunk?<br>>>
<br>>> So I've been working through my core dumps and fixing the issues; I'll<br>>> upload the fixes to Mantis once I've seen whether they prove<br>>> effective.<br>>><br>>> Are others running
1.4 in IAX-intensive environments? Are there<br>>> others prepared to take some pain to help chase down these issues?<br>>><br>>> Thanks,<br>>> Steve
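The first issue Tim lists (Mantis bug #9084) is a re-check-after-relock race: code releases the iaxsl[] mutex, reacquires it, and then uses iaxs[x] without confirming that it is still non-NULL. Below is a minimal sketch of the safe pattern. It assumes hypothetical calls[] and call_locks[] arrays standing in for iaxs[] and iaxsl[], plus a hypothetical callback standing in for whatever work runs while the lock is dropped. It is an illustration, not the actual chan_iax2 patch.

```c
#include <pthread.h>
#include <stdlib.h>

#define MAX_CALLS 16

/* Hypothetical, simplified stand-ins for chan_iax2's iaxs[] and iaxsl[];
 * these are NOT the real chan_iax2 structures. */
struct call_pvt {
    int updates;
};

static struct call_pvt *calls[MAX_CALLS];
static pthread_mutex_t call_locks[MAX_CALLS];

void calls_init(void)
{
    for (int i = 0; i < MAX_CALLS; i++) {
        pthread_mutex_init(&call_locks[i], NULL);
        calls[i] = NULL;
    }
}

int create_call(int callno)
{
    struct call_pvt *p = calloc(1, sizeof(*p));
    if (p == NULL)
        return -1;
    pthread_mutex_lock(&call_locks[callno]);
    calls[callno] = p;
    pthread_mutex_unlock(&call_locks[callno]);
    return 0;
}

/* Any thread that takes the lock may free the pvt and clear the slot. */
void destroy_call(int callno)
{
    pthread_mutex_lock(&call_locks[callno]);
    free(calls[callno]);
    calls[callno] = NULL;
    pthread_mutex_unlock(&call_locks[callno]);
}

/* Hypothetical hook for work done with the lock dropped; in a real server
 * another thread could run destroy_call() during this window. */
typedef void (*unlocked_work_fn)(int callno);

/* Returns 0 on success, or -1 if the call is gone, either before the lock
 * was dropped or after it was reacquired. */
int update_call(int callno, unlocked_work_fn work)
{
    pthread_mutex_lock(&call_locks[callno]);
    if (calls[callno] == NULL) {
        pthread_mutex_unlock(&call_locks[callno]);
        return -1;
    }
    pthread_mutex_unlock(&call_locks[callno]);

    if (work != NULL)
        work(callno);

    pthread_mutex_lock(&call_locks[callno]);
    /* The re-check: without it, calls[callno] may be NULL here. */
    if (calls[callno] == NULL) {
        pthread_mutex_unlock(&call_locks[callno]);
        return -1;
    }
    calls[callno]->updates++;
    pthread_mutex_unlock(&call_locks[callno]);
    return 0;
}
```

Passing destroy_call as the callback reproduces the race deterministically in a single thread: update_call() then returns -1 instead of dereferencing a NULL pointer.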
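Mantis bug #9103 arises because IAX2 stores the call number itself in tech_pvt, cast to a void *, so passing that value to free() is undefined behaviour. The sketch below shows the encoding with uintptr_t, and one possible way a generic teardown routine can avoid the problem: the owner of tech_pvt supplies the destructor. struct chan, chan_free() and the pvt_destroy field are hypothetical and illustrative only. They are not Asterisk's API and do not describe how ast_channel_free() is written.

```c
#include <stdint.h>
#include <stdlib.h>

/* Encode a small call number in a void * and decode it again.
 * No memory is allocated, so the result must never be passed to free(). */
static inline void *callno_to_pvt(unsigned short callno)
{
    return (void *)(uintptr_t)callno;
}

static inline unsigned short pvt_to_callno(const void *pvt)
{
    return (unsigned short)(uintptr_t)pvt;
}

/* Hypothetical channel object: whoever sets tech_pvt also says how to
 * release it, so generic teardown never frees memory it did not allocate. */
struct chan {
    void *tech_pvt;
    void (*pvt_destroy)(void *pvt);   /* NULL when tech_pvt is not heap memory */
};

void chan_free(struct chan *c)
{
    if (c->tech_pvt != NULL && c->pvt_destroy != NULL)
        c->pvt_destroy(c->tech_pvt);
    c->tech_pvt = NULL;
    free(c);                          /* frees only the channel struct itself */
}
```

Note that call number 0 encodes to a null pointer, so this sketch treats it as "no private data".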
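For the prune_peers() use-after-free, the general rule is to unlink a node from the list before destroying it, so that the traversal never reads a next pointer out of freed memory. Here is a minimal sketch on a plain singly linked list. struct peer, free_peer(), push_peer() and prune_realtime_peers() are hypothetical simplifications. The is_realtime flag stands in for whatever condition rtautoclear checks, and the code does not use Asterisk's AST_LIST macros.

```c
#include <stdlib.h>

/* Hypothetical, simplified peer list node. */
struct peer {
    int is_realtime;
    struct peer *next;
};

static void free_peer(struct peer *p)
{
    free(p);
}

struct peer *push_peer(struct peer *head, int is_realtime)
{
    struct peer *p = malloc(sizeof(*p));
    if (p == NULL)
        return head;
    p->is_realtime = is_realtime;
    p->next = head;
    return p;
}

/* Removes and frees every realtime peer; returns how many were pruned. */
int prune_realtime_peers(struct peer **head)
{
    int pruned = 0;
    struct peer **link = head;        /* points at the pointer to the current node */

    while (*link != NULL) {
        struct peer *cur = *link;
        if (cur->is_realtime) {
            *link = cur->next;        /* 1. unlink while cur is still valid */
            free_peer(cur);           /* 2. only then destroy it */
            pruned++;
        } else {
            link = &cur->next;
        }
    }
    return pruned;
}
```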
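On find_callno(): if a lookup walks the call slots and locks each one in turn, it gets slower as the table grows, and it stalls whenever it reaches a slot whose iaxsl[] mutex is held by a thread sleeping on the Realtime backend, which is the behaviour the email describes. A hash table keyed on the remote address, port and remote call number narrows each lookup to one bucket. The following is a generic chained hash table under those assumptions. It is not Russell's hashing code, the key layout is hypothetical, and locking is deliberately omitted.

```c
#include <stdint.h>
#include <stdlib.h>

#define NUM_BUCKETS 64

/* Hypothetical lookup key: where a frame came from plus the remote call number. */
struct call_key {
    uint32_t addr;           /* IPv4 address, host byte order */
    uint16_t port;
    uint16_t peer_callno;
};

struct call_entry {
    struct call_key key;
    int local_callno;
    struct call_entry *next; /* chaining within a bucket */
};

static struct call_entry *buckets[NUM_BUCKETS];

static unsigned bucket_of(const struct call_key *k)
{
    uint32_t h = k->addr ^ ((uint32_t)k->port << 16) ^ k->peer_callno;
    h ^= h >> 16;            /* mix high bits into the low bits */
    h *= 0x45d9f3bU;
    h ^= h >> 16;
    return h % NUM_BUCKETS;
}

static int key_equal(const struct call_key *a, const struct call_key *b)
{
    return a->addr == b->addr && a->port == b->port &&
           a->peer_callno == b->peer_callno;
}

int call_insert(struct call_key key, int local_callno)
{
    struct call_entry *e = malloc(sizeof(*e));
    if (e == NULL)
        return -1;
    unsigned b = bucket_of(&key);
    e->key = key;
    e->local_callno = local_callno;
    e->next = buckets[b];
    buckets[b] = e;
    return 0;
}

/* Returns the local call number, or -1 if nothing matches. Only one
 * bucket's chain is walked, instead of every call slot. */
int call_lookup(struct call_key key)
{
    for (struct call_entry *e = buckets[bucket_of(&key)]; e != NULL; e = e->next)
        if (key_equal(&e->key, &key))
            return e->local_callno;
    return -1;
}
```

In a multi-threaded server each bucket (or the whole table) would still need its own lock, but that lock is held only for a short chain walk, never across a backend query.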