[asterisk-dev] 1.4/trunk chan_iax2.c stability/deadlocks

Mihai Balea mihai at hates.ms
Fri May 4 09:09:35 MST 2007


Hi

I have just reported a race condition on Mantis. Here's the link:

http://bugs.digium.com/view.php?id=9666

I'm attaching the description of the problem below, copied from the
bug report.

Mihai


We have experienced a series of random crashes on our production  
systems, especially when operating under load or under poor network  
conditions. By investigating the core dumps, we found that all  
crashes were caused by a segfault in chan_iax2.c, in function  
__attempt_transmit(), line 1845 (official asterisk-1.4.4 tarball).  
The relevant code looks like this:

     /* Hangup the fd */
     fr.frametype = AST_FRAME_CONTROL;
     fr.subclass = AST_CONTROL_HANGUP;
     iax2_queue_frame(callno, &fr);
     /* Remember, owner could disappear */
     if (iaxs[callno]->owner)
         iaxs[callno]->owner->hangupcause = AST_CAUSE_DESTINATION_OUT_OF_ORDER;

This code is supposed to be executed with the call mutex
(iaxsl[callno]) locked. However, you will notice that two lines
before the if, there's a call to iax2_queue_frame(). This function
releases the lock for a short period of time in an attempt to prevent
a deadlock. If another thread grabs the lock in that window, it can
call iax2_destroy(), thus NULLing the entry in the iaxs array. When
the first thread gets the lock back, iaxs[callno] is NULL and the
iaxs[callno]->owner check in the if dereferences a NULL pointer.
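
To illustrate the pattern, here is a minimal, self-contained sketch.
It is not the chan_iax2 code and it is not our patch. All names
(calls, call_locks, queue_frame_sim, gap_hook and so on) are
hypothetical stand-ins, and the "other thread" is simulated by a hook
that runs while the lock is released, so the outcome is
deterministic. The point is only that the slot has to be re-checked
after any call that may drop the lock:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_CALLS 4

/* Hypothetical stand-ins for the iaxs[] / iaxsl[] arrays. */
struct owner_sim { int hangupcause; };
struct pvt_sim   { struct owner_sim *owner; };

static struct pvt_sim *calls[MAX_CALLS];
static pthread_mutex_t call_locks[MAX_CALLS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};

/* Runs while the lock is released; plays the role of the other thread. */
static void (*gap_hook)(int callno);

/* Install a call in its slot. Returns 0 on success, -1 on failure. */
int create_call(int callno, struct owner_sim *owner)
{
    struct pvt_sim *pvt = calloc(1, sizeof(*pvt));
    if (!pvt)
        return -1;
    pvt->owner = owner;
    pthread_mutex_lock(&call_locks[callno]);
    free(calls[callno]);
    calls[callno] = pvt;
    pthread_mutex_unlock(&call_locks[callno]);
    return 0;
}

/* Takes the lock, frees the call and NULLs its slot. */
void destroy_in_gap(int callno)
{
    pthread_mutex_lock(&call_locks[callno]);
    free(calls[callno]);
    calls[callno] = NULL;
    pthread_mutex_unlock(&call_locks[callno]);
}

/* Called with the lock held; briefly releases it, then takes it again. */
void queue_frame_sim(int callno)
{
    pthread_mutex_unlock(&call_locks[callno]);
    if (gap_hook)
        gap_hook(callno);
    pthread_mutex_lock(&call_locks[callno]);
}

/* Returns 1 if the cause was recorded, 0 if the call disappeared. */
int hangup_with_recheck(int callno, int cause)
{
    int recorded = 0;

    pthread_mutex_lock(&call_locks[callno]);
    if (calls[callno]) {
        queue_frame_sim(callno);
        /* The lock was dropped above, so the slot must be re-validated
         * before it is dereferenced. */
        if (calls[callno] && calls[callno]->owner) {
            calls[callno]->owner->hangupcause = cause;
            recorded = 1;
        }
    }
    pthread_mutex_unlock(&call_locks[callno]);
    return recorded;
}
```

With gap_hook left NULL, hangup_with_recheck() records the cause.
With gap_hook set to destroy_in_gap, it returns 0 instead of
crashing. Dropping the calls[callno] test from the inner if
reproduces the NULL dereference described above.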

There are several other areas in the code where iax2_queue_frame() is  
called which are also potential crash spots - however, for some  
reason, all our crashes happened in only one place, as described above.

We have a patch that attempts to fix this hole as well as several
others. I am not sure that it is the correct way of fixing the
problem, since it addresses the symptoms and not the cause. I will
post it once we have tested it a little more.


On May 3, 2007, at 3:59 PM, Stephen Davies wrote:

> Hi,
>
> I recently moved our IAX service servers on to SVN trunk.
>
> Seems to me that a lot of people are still on 1.2, and so I thought I
> should do my bit and put the trunk code into production and see what
> happens and fix whatever comes my way.  Cos we need 1.4 to be stable.
>
> So what has happened is segfaults and deadlocks in chan_iax2.
> Probably on average once a day.  Of course this is to do with the new
> multi-threaded stuff in there.
>
> Is my experience the norm for those using iax2 on 1.4/trunk?
>
> So I've been working on my coredumps and fixing the issues - I'll
> upload onto Mantis once I've seen whether my fixes are proving
> effective.
>
> Are others running 1.4 in iax intensive environments?  Are there
> others prepared to take some pain to try to chase down these issues?
>
> Thanks,
> Steve


