[asterisk-dev] [Code Review]: Fix crashes in ast_rtcp_write()

Russell Bryant reviewboard at asterisk.org
Mon Sep 19 07:38:15 CDT 2011



> On Sept. 19, 2011, 12:01 a.m., David Vossel wrote:
> > /branches/1.8/res/res_rtp_asterisk.c, lines 2621-2626
> > <https://reviewboard.asterisk.org/r/1444/diff/1/?file=20703#file20703line2621>
> >
> >     There are macros for handling the ref counting of stuff associated with a scheduled event.

Yeah ... I avoided it because AST_SCHED_DEL_UNREF() is even more evil than AST_SCHED_DEL().  It does the "loop 10 times" thing and also makes some assumptions about how the sched ID variable is handled and when it gets set to -1.  I would rather either leave it this way or start writing a new set of helper macros that are less evil.


> On Sept. 19, 2011, 12:01 a.m., David Vossel wrote:
> > /branches/1.8/res/res_rtp_asterisk.c, lines 1052-1057
> > <https://reviewboard.asterisk.org/r/1444/diff/1/?file=20703#file20703line1052>
> >
> >     Why are we checking schedid == -1 here.   This implies that the thread removing the scheduled item is not the same as the thread executing the scheduled event.  If this is the case, how are we guaranteed schedid won't get set to -1 after this check occurs while still in the callback?   Would this cause a ref leak or anything?

Excellent observations.  You are correct that it is not guaranteed that the thread canceling the scheduled item is not the same one executing the scheduler.  Specifically, the SIP monitor thread is executing the scheduler and it is usually a channel thread canceling the scheduled item.  (sip_hangup -> stop_media_flows -> ast_rtp_instance_stop).

There will not be a ref leak.  The reference is only released when either the scheduled item is successfully canceled or if the scheduler callback is returning 0 indicating that it is not going to run again.

The other question, how are we guaranteed schedid won't get set to -1 after the check occurs, is more complicated.  The direct answer is we're not guaranteed.  The next question would be, is it harmful that there is no synchronization here?  I went back and forth with myself a bit on whether or not locking should be added here.  I obviously didn't want to add it unless it was absolutely necessary and ended up going without it.

If schedid gets set to -1 (because the RTP instance got stopped) while this function is running, it's just going to do its thing like usual and transmit one last RTCP packet.  That doesn't seem harmful.  It's just going to catch the -1 the next time it runs and then bail out and unref the RTP instance.  So, it all works out in the end.  It's a little less pretty, but at least no more locks get added.

As I wrote this out, the one down side I thought of is that if this edge case gets hit, it would delay the destruction of the RTP instance until the next RTCP packet is scheduled to be sent.  By default, this is every 5 seconds, but it's configurable.  I'm ok with that personally ...

Now, for the lulz, here is some code in the handling of the rtcpinterval configuraiton option in res_rtp_asterisk.c:

                        if (rtcpinterval == 0)
                                rtcpinterval = 0; /* Just so we're clear... it's zero */


- Russell


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1444/#review4352
-----------------------------------------------------------


On Sept. 18, 2011, 9:20 p.m., Russell Bryant wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/1444/
> -----------------------------------------------------------
> 
> (Updated Sept. 18, 2011, 9:20 p.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Summary
> -------
> 
> I opened ASTERISK-18570 for this issue.  The rest of the issues are ones that I have found so far that appear to be the same problem.
> 
> This patch addresses crashes related to RTCP handling.  The backtraces just show a crash in ast_rtcp_write() where it appears that the RTP instance is no longer valid.  There is a race condition with scheduled RTCP transmissions and the destruction of the RTP instance.  This patch utilizes the fact that ast_rtp_instance is a reference counted object and ensures that it will not get destroyed while a reference is still around due to scheduled RTCP transmissions.
> 
> RTCP transmissions are scheduled and executed from the chan_sip scheduler context.  This scheduler context is processed in the SIP monitor thread.  The destruction of an RTP instance occurs when the associated sip_pvt gets destroyed (which happens when the sip_pvt reference count reaches 0).  However, the SIP monitor thread is not the only thread that can cause a sip_pvt to get destroyed.  The sip_hangup function, executed from a channel thread, also decrements the reference count on a sip_pvt and could cause it to get destroyed.
> 
> While this is being changed anyway, the patch also removes calling ast_sched_del() from within the RTCP scheduler callback.  It's not helpful.  Simply returning 0 prevents the callback from being rescheduled.
> 
> 
> This addresses bugs ASTERISK-13334, ASTERISK-15257, ASTERISK-15406, ASTERISK-17560, ASTERISK-18570, ASTERISK-9716, and ASTERISK-9977.
>     https://issues.asterisk.org/jira/browse/ASTERISK-13334
>     https://issues.asterisk.org/jira/browse/ASTERISK-15257
>     https://issues.asterisk.org/jira/browse/ASTERISK-15406
>     https://issues.asterisk.org/jira/browse/ASTERISK-17560
>     https://issues.asterisk.org/jira/browse/ASTERISK-18570
>     https://issues.asterisk.org/jira/browse/ASTERISK-9716
>     https://issues.asterisk.org/jira/browse/ASTERISK-9977
> 
> 
> Diffs
> -----
> 
>   /branches/1.8/res/res_rtp_asterisk.c 335789 
> 
> Diff: https://reviewboard.asterisk.org/r/1444/diff
> 
> 
> Testing
> -------
> 
> This patch has been applied to a test environment with a couple of servers running Asterisk 1.8.7.0-rc1.  The servers have processed over 1 million calls without hitting this crash.
> 
> 
> Thanks,
> 
> Russell
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20110919/93b8d482/attachment.htm>


More information about the asterisk-dev mailing list