[asterisk-dev] [Code Review]: Fix crashes in ast_rtcp_write()

Tilghman Lesher reviewboard at asterisk.org
Mon Sep 19 11:21:49 CDT 2011



> On Sept. 19, 2011, 12:01 a.m., David Vossel wrote:
> > /branches/1.8/res/res_rtp_asterisk.c, lines 2621-2626
> > <https://reviewboard.asterisk.org/r/1444/diff/1/?file=20703#file20703line2621>
> >
> >     There are macros for handling the ref counting of stuff associated with a scheduled event.
> 
> Russell Bryant wrote:
>     Yeah ... I avoided it because AST_SCHED_DEL_UNREF() is even more evil than AST_SCHED_DEL().  It does the "loop 10 times" thing and also makes some assumptions about how the sched ID variable is handled and when it gets set to -1.  I would rather either leave it this way or start writing a new set of helper macros that are less evil.

Best way to make it less evil is to change the assumption in the scheduler code that when a callback is running, it's removed from the list.  It's a basic assumption that has led to the number of scheduler macros that work on a principle of "scheduler jobs shouldn't take more than 1 second to run, so sometime in this sequence, it should eventually cancel the task".  It's probably past time to break that assumption in the scheduler
code and get rid of the race condition between a running task that always reschedules itself and another thread trying to cancel that task.


- Tilghman


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/1444/#review4352
-----------------------------------------------------------


On Sept. 18, 2011, 9:20 p.m., Russell Bryant wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/1444/
> -----------------------------------------------------------
> 
> (Updated Sept. 18, 2011, 9:20 p.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Summary
> -------
> 
> I opened ASTERISK-18570 for this issue.  The rest of the issues are ones that I have found so far that appear to be the same problem.
> 
> This patch addresses crashes related to RTCP handling.  The backtraces just show a crash in ast_rtcp_write() where it appears that the RTP instance is no longer valid.  There is a race condition with scheduled RTCP transmissions and the destruction of the RTP instance.  This patch utilizes the fact that ast_rtp_instance is a reference counted object and ensures that it will not get destroyed while a reference is still around due to scheduled RTCP transmissions.
> 
> RTCP transmissions are scheduled and executed from the chan_sip scheduler context.  This scheduler context is processed in the SIP monitor thread.  The destruction of an RTP instance occurs when the associated sip_pvt gets destroyed (which happens when the sip_pvt reference count reaches 0).  However, the SIP monitor thread is not the only thread that can cause a sip_pvt to get destroyed.  The sip_hangup function, executed from a channel thread, also decrements the reference count on a sip_pvt and could cause it to get destroyed.
> 
> While this is being changed anyway, the patch also removes calling ast_sched_del() from within the RTCP scheduler callback.  It's not helpful.  Simply returning 0 prevents the callback from being rescheduled.
> 
> 
> This addresses bugs ASTERISK-13334, ASTERISK-15257, ASTERISK-15406, ASTERISK-17560, ASTERISK-18570, ASTERISK-9716, and ASTERISK-9977.
>     https://issues.asterisk.org/jira/browse/ASTERISK-13334
>     https://issues.asterisk.org/jira/browse/ASTERISK-15257
>     https://issues.asterisk.org/jira/browse/ASTERISK-15406
>     https://issues.asterisk.org/jira/browse/ASTERISK-17560
>     https://issues.asterisk.org/jira/browse/ASTERISK-18570
>     https://issues.asterisk.org/jira/browse/ASTERISK-9716
>     https://issues.asterisk.org/jira/browse/ASTERISK-9977
> 
> 
> Diffs
> -----
> 
>   /branches/1.8/res/res_rtp_asterisk.c 335789 
> 
> Diff: https://reviewboard.asterisk.org/r/1444/diff
> 
> 
> Testing
> -------
> 
> This patch has been applied to a test environment with a couple of servers running Asterisk 1.8.7.0-rc1.  The servers have processed over 1 million calls without hitting this crash.
> 
> 
> Thanks,
> 
> Russell
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20110919/d0308e3f/attachment.htm>


More information about the asterisk-dev mailing list