[asterisk-dev] 1.4/trunk chan_iax2.c stability/deadlocks
Mihai Balea
mihai at hates.ms
Fri May 4 09:09:35 MST 2007
Hi
I have just reported a race condition on Mantis; here's the link:
http://bugs.digium.com/view.php?id=9666
I'm attaching the description of the problem below, copied from the
bug report
Mihai
We have experienced a series of random crashes on our production
systems, especially when operating under load or under poor network
conditions. By investigating the core dumps, we found that all
crashes were caused by a segfault in chan_iax2.c, in function
__attempt_transmit(), line 1845 (official asterisk-1.4.4 tarball).
The relevant code looks like this:
    /* Hangup the fd */
    fr.frametype = AST_FRAME_CONTROL;
    fr.subclass = AST_CONTROL_HANGUP;
    iax2_queue_frame(callno, &fr);
    /* Remember, owner could disappear */
    if (iaxs[callno]->owner)
        iaxs[callno]->owner->hangupcause =
            AST_CAUSE_DESTINATION_OUT_OF_ORDER;
This code is supposed to be executed with the call mutex
(iaxsl[callno]) locked. However, you will notice that two lines before
the if, there's a call to iax2_queue_frame(). This function releases
the lock for a short period of time in an attempt to prevent a
deadlock. If another thread grabs the lock during that window, it can
call iax2_destroy(), thus NULLing the entry in the iaxs array, so the
iaxs[callno]->owner check that follows dereferences a NULL pointer.
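
To illustrate the window described above, here is a minimal, self-contained
sketch. It is not the chan_iax2.c code and not the patch mentioned in this
report: calls[], call_locks[], destroy_call(), queue_frame_sim() and
hangup_after_queue() are hypothetical stand-ins for iaxs[], iaxsl[],
iax2_destroy(), iax2_queue_frame() and the code in __attempt_transmit(). The
destroy_in_window flag simulates another thread winning the lock while it is
released. The only point being shown is that the array entry has to be
re-checked after any call that may drop the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_CALLS 2
#define CAUSE_OUT_OF_ORDER 27   /* illustrative hangup cause value */

struct owner { int hangupcause; };
struct pvt   { struct owner *owner; };

/* Hypothetical stand-ins for iaxs[] and iaxsl[] */
static struct pvt *calls[MAX_CALLS];
static pthread_mutex_t call_locks[MAX_CALLS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};

/* Create a call entry with an owner attached. */
static void new_call(int callno)
{
    assert(callno >= 0 && callno < MAX_CALLS);
    calls[callno] = calloc(1, sizeof(*calls[callno]));
    assert(calls[callno] != NULL);
    calls[callno]->owner = calloc(1, sizeof(*calls[callno]->owner));
    assert(calls[callno]->owner != NULL);
}

/* Stand-in for iax2_destroy(): caller must hold call_locks[callno]. */
static void destroy_call(int callno)
{
    if (calls[callno]) {
        free(calls[callno]->owner);
        free(calls[callno]);
        calls[callno] = NULL;
    }
}

/*
 * Stand-in for iax2_queue_frame(): entered with the lock held, releases it
 * briefly, and returns with it held again. If destroy_in_window is nonzero,
 * we simulate another thread taking the lock and destroying the call.
 */
static void queue_frame_sim(int callno, int destroy_in_window)
{
    pthread_mutex_unlock(&call_locks[callno]);
    if (destroy_in_window) {
        pthread_mutex_lock(&call_locks[callno]);
        destroy_call(callno);
        pthread_mutex_unlock(&call_locks[callno]);
    }
    pthread_mutex_lock(&call_locks[callno]);
}

/* Returns 1 if the hangup cause was recorded, 0 if the call went away. */
static int hangup_after_queue(int callno, int cause, int destroy_in_window)
{
    int updated = 0;

    pthread_mutex_lock(&call_locks[callno]);
    if (calls[callno]) {
        queue_frame_sim(callno, destroy_in_window);
        /* The lock was dropped: re-check the entry itself, not only owner. */
        if (calls[callno] && calls[callno]->owner) {
            calls[callno]->owner->hangupcause = cause;
            updated = 1;
        }
    }
    pthread_mutex_unlock(&call_locks[callno]);
    return updated;
}
```

Without the calls[callno] half of the re-check, the second case (entry
destroyed inside the window) would dereference NULL, which matches the
segfault described here. Like the patch mentioned below, this guards against
the effect rather than removing the unlock window itself.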
There are several other areas in the code where iax2_queue_frame() is
called which are also potential crash spots - however, for some
reason, all our crashes happened in only one place, as described above.
We have a patch that attempts to fix this hole as well as several
others. I am not sure that it is the correct way of fixing the
problem since it addresses the effects and not the cause. I will post
it after we have tested it a little more.
On May 3, 2007, at 3:59 PM, Stephen Davies wrote:
> Hi,
>
> I recently moved our IAX service servers on to SVN trunk.
>
> Seems to me that a lot of people are still on 1.2, and so I thought I
> should do my bit and put the trunk code into production and see what
> happens and fix whatever comes my way. Cos we need 1.4 to be stable.
>
> So what has happened is segfaults and deadlocks in chan_iax2.
> Probably on average once a day. Of course this is to do with the new
> multi-threaded stuff in there.
>
> Is my experience the norm for those using iax2 on 1.4/trunk?
>
> So I've been working on my coredumps and fixing the issues - I'll
> upload onto Mantis once I've seen whether my fixes are proving
> effective.
>
> Are others running 1.4 in iax intensive environments? Are there
> others prepared to take some pain to try to chase down these issues?
>
> Thanks,
> Steve