[asterisk-dev] [Code Review] Fixes chan_iax2 race condition

David Vossel dvossel at digium.com
Wed Jun 16 11:20:31 CDT 2010

Review request for Asterisk Developers.


There is code in chan_iax2.c that attempts to guarantee that only a single active thread will handle a call number at a time.  This code works once the thread is added to an active_list of threads, but we are not currently guaranteed that a newly activated thread will enter the active_list immediately because it is left up to the thread to add itself after frames have been queued to it.  This means that if two frames come in for the same call number at the same time, it is possible for them to grab two separate threads because the first thread did not add itself to the active_list fast enough.  This causes some pretty complex problems.

This patch resolves the race condition by adding an activated thread to the active_list immediately, within the network thread, and depending on the thread only to remove itself once it has finished processing the frames queued to it.  By doing this we are guaranteed that if another frame for the same call number comes in at the same time, this thread will immediately be found in the active_list of threads.
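
The idea can be sketched roughly as follows. This is a simplified, hypothetical model and not the code from the patch: the names worker, dispatch_frame(), worker_done() and active_lock are made up for illustration and do not match the identifiers in chan_iax2.c, and the "active list" is reduced to a flag on a fixed array. The point it tries to show is that the dispatcher (the network thread) marks the worker active while still holding the lock, so a second frame for the same call number cannot miss it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified model; names do not match chan_iax2.c. */
#define MAX_WORKERS 4

struct worker {
    int active;             /* 1 while the worker is on the "active list" */
    unsigned short callno;  /* call number this worker is handling */
    int queued_frames;      /* frames queued to this worker */
};

/* Static storage, so every worker starts out idle (all fields zero). */
static struct worker workers[MAX_WORKERS];
static pthread_mutex_t active_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Called from the network thread for each incoming frame.
 * Returns the worker the frame was queued to, or NULL if none is free.
 */
struct worker *dispatch_frame(unsigned short callno)
{
    struct worker *chosen = NULL;
    size_t i;

    pthread_mutex_lock(&active_lock);

    /* Prefer a worker that is already handling this call number. */
    for (i = 0; i < MAX_WORKERS; i++) {
        if (workers[i].active && workers[i].callno == callno) {
            chosen = &workers[i];
            break;
        }
    }

    /*
     * Otherwise claim an idle worker and mark it active right here,
     * before the lock is released, instead of waiting for the worker
     * to mark itself active later.
     */
    if (chosen == NULL) {
        for (i = 0; i < MAX_WORKERS; i++) {
            if (!workers[i].active) {
                chosen = &workers[i];
                chosen->active = 1;
                chosen->callno = callno;
                break;
            }
        }
    }

    if (chosen != NULL) {
        chosen->queued_frames++;
    }

    pthread_mutex_unlock(&active_lock);
    return chosen;
}

/* Called by the worker itself once it has processed its queued frames. */
void worker_done(struct worker *w)
{
    pthread_mutex_lock(&active_lock);
    assert(w->active);      /* only an active worker may remove itself */
    w->queued_frames = 0;
    w->active = 0;
    pthread_mutex_unlock(&active_lock);
}
```

In this model, two back-to-back calls to dispatch_frame() with the same call number return the same worker, because the first call has already marked it active by the time the second call takes the lock.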


  /branches/1.4/channels/chan_iax2.c 270834 

Diff: https://reviewboard.asterisk.org/r/720/diff


The particular problem I was encountering involved registrations when the far end has the 'delayreject' option enabled.  That option ACKs a REGREQ and sends a REGAUTH response at the same time.  The ACK and the REGAUTH response were each grabbing a separate thread, which caused a serious problem with finding the call number.  There is a point in find_callno() where an iax_pvt is added to a container of iax_pvts once the destination call number is provided.  Because of the way this code works, there is a brief section of code during which it is impossible for find_callno() to return the call number for a second thread if two threads are processing the callno at the same time.  This causes registrations to fail in a way that is not recoverable for a minute or two.

I tested with about 5000 registrations and was not able to reproduce the race condition with this patch applied.  Without the patch, the issue usually appeared within the first 100-500 registrations.


