[asterisk-bugs] [Asterisk 0010289]: Old LAGRQ frames showing up in new IAX2 calls

noreply at bugs.digium.com noreply at bugs.digium.com
Wed Aug 1 13:34:58 CDT 2007


A NOTE has been added to this issue. 
====================================================================== 
http://bugs.digium.com/view.php?id=10289 
====================================================================== 
Reported By:                mihai
Assigned To:                russell
====================================================================== 
Project:                    Asterisk
Issue ID:                   10289
Category:                   Channels/chan_iax2
Reproducibility:            always
Severity:                   minor
Priority:                   normal
Status:                     assigned
Asterisk Version:           1.4.8  
SVN Branch (only for SVN checkouts, not tarball releases): N/A  
SVN Revision (number only!):  
Disclaimer on File?:        N/A 
Request Review:              
====================================================================== 
Date Submitted:             07-24-2007 10:52 CDT
Last Modified:              08-01-2007 13:34 CDT
====================================================================== 
Summary:                    Old LAGRQ frames showing up in new IAX2 calls
Description: 
If there are two successive IAX2 calls having the same source and
destination call id, sometimes old LAGRQ frames belonging to the first call
are transmitted as part of a VNAK retransmission during the second call.
Since the old LAGRQ frames have wildly out-of-order sequence numbers, the
other endpoint will request retransmission, which can cause a VNAK storm.
I am able to reproduce this by stress testing chan_iax2 with an automated
script that generates about 3 calls per second, up to about 200
simultaneous calls.  This will pretty much guarantee that source ids will
be recycled on the server side, which can trigger this issue. 
====================================================================== 

---------------------------------------------------------------------- 
 mihai - 08-01-07 13:34  
---------------------------------------------------------------------- 
We (SteveK and I) believe we have found the root cause of this problem:

It all starts from a race condition between the moment a dynamic thread is
created and the moment it is used.  When schedule_action requests a thread
and there are no available pooled threads, a new dynamic thread is created.
Normally, this thread will grab its mutex and then wait on a condition.
schedule_action then sets the thread's callback and data and signals the
thread to start running.

However, if the signal arrives before the thread has acquired its mutex,
it will be ignored. The thread will go to sleep and time out after 30
seconds without executing its task. If the callback is attempt_transmit,
the frame is not retransmitted and, more importantly, is not freed. The
scheduler has no other chance to reschedule this event, so the frame stays
there forever.
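
To illustrate the lost-wakeup pattern described above, here is a minimal
sketch of one common way to make such a hand-off safe: guard the wait with
a predicate that is set under the same mutex.  This is not the pending
patch and not the chan_iax2 code; struct worker, hand_off, worker_main,
mark_done and run_demo are hypothetical names, and the 30-second timed
wait is replaced by a plain pthread_cond_wait to keep the example short.
run_demo deliberately hands off the work before the thread exists, which
forces the worst-case ordering.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dynamic worker thread; not the real struct. */
struct worker {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool has_work;            /* predicate, only touched while holding lock */
    void (*func)(void *);     /* callback, e.g. a retransmit handler */
    void *data;
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    void (*func)(void *);
    void *data;

    pthread_mutex_lock(&w->lock);
    /* Check the predicate before (and after) every wait.  If the signal
     * was sent before we got here, has_work is already true and we never
     * block, so the wakeup cannot be lost. */
    while (!w->has_work)
        pthread_cond_wait(&w->cond, &w->lock);
    w->has_work = false;
    func = w->func;
    data = w->data;
    pthread_mutex_unlock(&w->lock);

    func(data);
    return NULL;
}

/* Scheduler side: publish the task and signal while holding the mutex. */
static void hand_off(struct worker *w, void (*func)(void *), void *data)
{
    pthread_mutex_lock(&w->lock);
    w->func = func;
    w->data = data;
    w->has_work = true;
    pthread_cond_signal(&w->cond);
    pthread_mutex_unlock(&w->lock);
}

static void mark_done(void *data)
{
    *(int *)data = 1;
}

/* Returns 1 if the callback ran even though the signal came "too early". */
int run_demo(void)
{
    struct worker w;
    pthread_t tid;
    int done = 0;
    int rc;

    pthread_mutex_init(&w.lock, NULL);
    pthread_cond_init(&w.cond, NULL);
    w.has_work = false;
    w.func = NULL;
    w.data = NULL;

    /* Worst case: the signal is sent before the thread is even created. */
    hand_off(&w, mark_done, &done);

    rc = pthread_create(&tid, NULL, worker_main, &w);
    assert(rc == 0);
    (void)rc;                 /* silence unused warning under NDEBUG */
    pthread_join(tid, NULL);

    pthread_cond_destroy(&w.cond);
    pthread_mutex_destroy(&w.lock);
    return done;
}
```

With a bare, unconditional wait in worker_main, the same ordering would
leave the thread asleep and the callback would never run; with the
predicate, run_demo() returns 1.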

Patch to follow soon... 

Issue History 
Date Modified   Username       Field                    Change               
====================================================================== 
08-01-07 13:34  mihai          Note Added: 0068227                          
======================================================================



