[asterisk-dev] [Code Review] Prevent race condition that can cause ast_read to "miss" important frames

Joshua Colp reviewboard at asterisk.org
Thu Mar 1 07:38:10 CST 2012


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1779/#review5690
-----------------------------------------------------------

Ship it!


After examining the code myself I completely agree with this fix. Very nice find!

- Joshua


On Feb. 29, 2012, 1:48 p.m., Mark Michelson wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/1779/
> -----------------------------------------------------------
> 
> (Updated Feb. 29, 2012, 1:48 p.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Summary
> -------
> 
> There appears to be a potential race condition that can occur when a timingfd is in use on a channel. I'll use the referenced issue to illustrate what is occurring.
> 
> 1. A SIP channel with progressinband=yes places a call through Asterisk to another phone.
> 2. The destination phone starts ringing, resulting in Asterisk needing to generate a ringing tone on the inbound channel.
> 3. The caller hangs up his phone.
> 4. The SIP channel driver receives a CANCEL from the calling channel, and so it queues a hangup frame.
> 5. The timer for the tone generator happens to fire shortly after.
> 6. ast_waitfor_n() is called by app_dial.
> 7. ast_waitfor_n() says the timingfd for the caller channel has data to be read.
> 8. Whoever called ast_waitfor_n() then calls ast_read() on the winning channel.
> 9. ast_read() notices the channel is being hung up and deactivates the tone generator.
> 10. ast_read() sees that the channel's fdno is the AST_TIMING_FD.
> 11. Since the generator has been deactivated, there is no timing function to call, so the channel is unlocked and an ast_null_frame is returned.
> 
> This is where the problem happens. Since the calling party has hung up, it will never be the winner of any further ast_waitfor_n() calls. Furthermore, that hangup that was queued has been "lost" since there happened to be two sources of data ready at the same time. As far as I can tell, this problem will only occur if a timing fd is in use because otherwise the alertpipe will properly be read even if multiple sources trigger a read at the same time. Note that while SIP was used here, and the situation was canceling a ringing call, I suspect that any time that a timer is being actively used a hangup could be missed. This means that hanging up during file playback or hanging up an analog channel while ringing could have the same effect.
> 
> My solution to this problem is at step 9. In ast_settimeout(), if the rate is 0, the timingfunc is NULL, and the current chan->fdno is AST_TIMING_FD, then we set chan->fdno to -1. This allows us to bypass steps 10 and 11 above and move further into ast_read(), which will properly return NULL since the channel has been hung up.
> 
> I'm putting this on review board since I need to know for sure that this is safe to do. Hanging up is not the only thing that can deactivate a generator or cause ast_settimeout() to get called with a 0 rate and NULL timingfunc. As far as I know though, by setting the timeout to 0 like this, you are essentially attempting to say "I'm done with whatever timing-related task I was doing, so don't allow the AST_TIMING_FD to trigger reads any more." By clearing the channel's fdno like this, you are accomplishing this goal.
> 
> Please let me know if there are any potential pitfalls to this approach.
> 
> 
> This addresses bug ASTERISK-19223.
>     https://issues.asterisk.org/jira/browse/ASTERISK-19223
> 
> 
> Diffs
> -----
> 
>   /tags/1.8.8.1/channels/chan_sip.c 351448 
>   /tags/1.8.8.1/main/channel.c 351448 
> 
> Diff: https://reviewboard.asterisk.org/r/1779/diff
> 
> 
> Testing
> -------
> 
> I inserted some debug into the path that leads to the error occurring. I determined that without my patch, seeing that debug message print meant a missed hangup. With my patch, I saw the debug message print, but the hangup was not missed.
> 
> 
> Thanks,
> 
> Mark
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20120301/dc6ab774/attachment.htm>


More information about the asterisk-dev mailing list