[asterisk-dev] [Code Review] 3927: Resolve race condition in scheduler when attempting to delete a running task
    rmudgett 
    reviewboard at asterisk.org
       
    Fri Aug 22 20:06:15 CDT 2014
    
    
  
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/3927/#review13156
-----------------------------------------------------------
/branches/12/main/sched.c
<https://reviewboard.asterisk.org/r/3927/#comment23619>
    Initializing cond must happen only once, so it should be done in sched_alloc(), which either creates a new sched object or gets one from a cache.
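
    A minimal sketch of the point above, assuming the initialization belongs in the branch that creates a fresh object, so that an object handed back from the cache keeps the condition variable it already has. The names task, task_cache, task_alloc(), task_release(), and the cond_inits counter are hypothetical and are not taken from sched.c; plain pthreads calls stand in for the Asterisk lock wrappers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a scheduler entry. */
struct task {
    pthread_cond_t cond;   /* initialized once, when the object is first created */
    struct task *next;     /* free-list link while the object sits in the cache */
};

/* Hypothetical cache of released entries. */
struct task_cache {
    struct task *free_list;
    int cond_inits;        /* counts pthread_cond_init() calls, for illustration only */
};

/* Hand out a cached entry if one exists; otherwise create and initialize one. */
static struct task *task_alloc(struct task_cache *cache)
{
    struct task *t = cache->free_list;

    if (t) {
        /* Reused object: its cond was initialized when it was first created. */
        cache->free_list = t->next;
        t->next = NULL;
        return t;
    }

    t = calloc(1, sizeof(*t));
    if (!t) {
        return NULL;
    }
    if (pthread_cond_init(&t->cond, NULL) != 0) {
        free(t);
        return NULL;
    }
    cache->cond_inits++;
    return t;
}

/* Return an entry to the cache without destroying its cond. */
static void task_release(struct task_cache *cache, struct task *t)
{
    assert(t != NULL);
    t->next = cache->free_list;
    cache->free_list = t;
}

/* Destroy each cond exactly once, when the cached objects are finally freed. */
static void task_cache_destroy(struct task_cache *cache)
{
    struct task *t;

    while ((t = cache->free_list)) {
        cache->free_list = t->next;
        pthread_cond_destroy(&t->cond);
        free(t);
    }
}
```

    With this layout, allocating, releasing, and allocating again returns the same object and calls pthread_cond_init() only once.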
- rmudgett
On Aug. 22, 2014, 2:39 p.m., Mark Michelson wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/3927/
> -----------------------------------------------------------
> 
> (Updated Aug. 22, 2014, 2:39 p.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Bugs: ASTERISK-24212
>     https://issues.asterisk.org/jira/browse/ASTERISK-24212
> 
> 
> Repository: Asterisk
> 
> 
> Description
> -------
> 
> Several tests in the testsuite failed sporadically because of crashes originating in the scheduler. The sequence leading to the crash goes something like this:
> 
> 1) Scheduler thread realizes it's time to send an RTCP packet.
> 2) Scheduler thread removes RTCP task from the heap so that it can be run.
> 3) A separate thread ends a call in progress, and attempts to delete the RTCP scheduler task using ast_sched_del().
> 4) ast_sched_del() cannot find the scheduled task since it is not in the heap (or hashtab in Asterisk 12). This results in a failed assertion.
> 5) Since the test agents are compiled with DO_CRASH, failing an assertion results in a crash.
> 6) A crash results in a failed test.
> 
> The solution I have crafted here is to maintain a pointer in the scheduler context to the task that is currently executing. If we attempt to delete the running task, we wait for it to complete before continuing, and then report that the scheduled task was successfully deleted.
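
The approach described in the paragraph above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch under review: sched_ctx, task, run_task(), delete_task(), and sample_callback() are hypothetical names, plain pthreads calls stand in for the Asterisk lock wrappers, and the search of the pending-task queue is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical scheduled task. */
struct task {
    int id;
    int (*callback)(void *data);
    void *data;
    pthread_cond_t done;        /* signalled when the callback has returned */
};

/* Hypothetical scheduler context. */
struct sched_ctx {
    pthread_mutex_t lock;
    struct task *running;       /* task whose callback is executing, or NULL */
};

/* Scheduler thread: the task has already been taken off the queue. */
static void run_task(struct sched_ctx *ctx, struct task *t)
{
    pthread_mutex_lock(&ctx->lock);
    assert(ctx->running == NULL);   /* assumes a single scheduler thread */
    ctx->running = t;
    pthread_mutex_unlock(&ctx->lock);

    t->callback(t->data);           /* run the callback without holding the lock */

    pthread_mutex_lock(&ctx->lock);
    ctx->running = NULL;
    pthread_cond_broadcast(&t->done);   /* wake any thread waiting in delete_task() */
    pthread_mutex_unlock(&ctx->lock);
}

/*
 * Any other thread: returns 0 if the task was running and has now finished,
 * -1 if it is not the running task. A fuller version would search the
 * pending queue before reaching this check.
 */
static int delete_task(struct sched_ctx *ctx, int id)
{
    int res = -1;

    pthread_mutex_lock(&ctx->lock);
    if (ctx->running && ctx->running->id == id) {
        struct task *t = ctx->running;

        /* Loop to tolerate spurious wakeups. */
        while (ctx->running == t) {
            pthread_cond_wait(&t->done, &ctx->lock);
        }
        res = 0;    /* report success: the task will not run again */
    }
    pthread_mutex_unlock(&ctx->lock);
    return res;
}

/* Example callback: records that it ran. */
static int sample_callback(void *data)
{
    *(int *)data = 1;
    return 0;
}
```

One caveat of this sketch: calling delete_task() on a task from inside that task's own callback would wait forever, so a real implementation would need to handle that case separately.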
> 
> 
> Diffs
> -----
> 
>   /branches/12/main/sched.c 421883 
> 
> Diff: https://reviewboard.asterisk.org/r/3927/diff/
> 
> 
> Testing
> -------
> 
> When I ran the test channels/pjsip/basic_calls/two_parties/nominal/alice_initiated/alice_hangs_up in a loop, it would typically fail within about half an hour of starting the loop. With this patch applied, I no longer see the crash described in the description.
> 
> HOWEVER, the test still fails occasionally, but that is due to a separate race condition involving translation paths not being set up when attempting to perform talk detection. So while the patch attached here may not be enough to close the referenced issue, it does fix one of the causes of test failure.
> 
> 
> Thanks,
> 
> Mark Michelson
> 
>