[asterisk-dev] [Code Review] 3954: pjsip_options.c: Fix race condition stopping periodic out of dialog OPTIONS request.

rmudgett reviewboard at asterisk.org
Thu Sep 4 17:43:08 CDT 2014



> On Sept. 3, 2014, 6:04 p.m., Mark Michelson wrote:
> > /branches/13/res/res_pjsip/pjsip_options.c, lines 401-404
> > <https://reviewboard.asterisk.org/r/3954/diff/2/?file=67194#file67194line401>
> >
> >     I don't understand the race condition you're referring to here since the scheduler has been altered to deal with an external deletion of the currently running task.
> >     
> >     Also, switching from a variable scheduler callback means that if the qualify frequency of a contact is changed, we will not switch to the new qualify frequency.
> 
> rmudgett wrote:
>     The frequency changes because the old scheduled event is always deleted before being restarted with the new time.
>     
>     As for the race condition, I was unable to see in the code how self-deletion and external deletion are both safe.

There is a potential race when cleaning up the data object associated with the scheduled entry callback.  When a scheduled entry self-deletes, it needs to set data->id = -1 so that a later external deletion does not also clean up the data object and hit an assertion failure for an invalid id.  Unfortunately, that still leaves a window: ast_sched_del() could be waiting on the sched_context lock with the old id while ast_sched_runq() holds the lock to remove the entry because of the self-deletion.
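
As a rough illustration of that window, here is a toy, single-threaded replay of the interleaving.  It is not the Asterisk scheduler: every toy_* name and both replay_* helpers are made up for this sketch, the "lock" is only implied by the order of the steps, and the real ast_sched_del() asserts on an unknown id where the toy version just returns -1.

```c
#include <assert.h>

/* Toy single-entry "scheduler" (hypothetical, not the Asterisk sched API). */
struct toy_sched {
    int live_id;    /* id of the scheduled entry, or -1 if nothing is scheduled */
};

/* Toy stand-in for the data object that remembers its sched entry id. */
struct toy_data {
    int id;         /* -1 means "nothing left to delete" */
    int refs;       /* reference held on behalf of the scheduled entry */
};

/* External deletion by id: 0 on success, -1 if that id is no longer scheduled. */
static int toy_sched_del(struct toy_sched *s, int id)
{
    if (id < 0 || s->live_id != id) {
        return -1;
    }
    s->live_id = -1;
    return 0;
}

/* Self deletion: the callback declines to be rescheduled and the run queue
 * removes the entry. */
static void toy_self_delete(struct toy_sched *s, struct toy_data *d)
{
    assert(d->refs > 0);
    d->id = -1;         /* block later external deletions */
    s->live_id = -1;    /* run queue drops the entry */
    d->refs--;          /* release the entry's reference */
}

/* Replays the problematic order of events.  Returns what the late external
 * deletion gets back. */
int replay_stale_delete(void)
{
    struct toy_sched s = { .live_id = 7 };
    struct toy_data d = { .id = 7, .refs = 1 };
    int stale_id;

    stale_id = d.id;                    /* 1: deleter reads the id, then waits for the lock */
    toy_self_delete(&s, &d);            /* 2: run queue holds the lock and removes the entry */
    return toy_sched_del(&s, stale_id); /* 3: deleter finally runs with the old id */
}

/* The approach taken in the patch: the entry never stops itself, so only the
 * external deleter removes it and releases the reference.  Returns 0 if the
 * entry was removed and the reference count ended at zero. */
int replay_external_only(void)
{
    struct toy_sched s = { .live_id = 7 };
    struct toy_data d = { .id = 7, .refs = 1 };
    int res = toy_sched_del(&s, d.id);

    if (res == 0) {
        d.id = -1;
        d.refs--;
    }
    return (res == 0 && d.refs == 0) ? 0 : -1;
}
```

In the first replay the deleter is left holding an id the scheduler no longer knows about, even though data->id was already set to -1.  In the second there is only one party that ever deletes, so the id it uses cannot go stale underneath it.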


- rmudgett


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/3954/#review13226
-----------------------------------------------------------


On Sept. 3, 2014, 9:40 a.m., rmudgett wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/3954/
> -----------------------------------------------------------
> 
> (Updated Sept. 3, 2014, 9:40 a.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Bugs: AFS-155 and ASTERISK-24295
>     https://issues.asterisk.org/jira/browse/AFS-155
>     https://issues.asterisk.org/jira/browse/ASTERISK-24295
> 
> 
> Repository: Asterisk
> 
> 
> Description
> -------
> 
> The crash reported in the issues is the result of an invalid transport configuration change when Asterisk is restarted.  The attempt to send the qualify request fails and we clean up.  However, the callback is also called, which results in a double unref of the objects involved.
> 
> * Fixed send_out_of_dialog_request() to not return an error or clean up resources if pjsip_endpt_send_request() is not successful.
> 
> * Fix periodic endpoint qualify OPTIONS sched deletion race by avoiding it.  The sched entry will no longer self stop and must be externally stopped.
> 
> * Added REF_DEBUG description tags to struct sched_data in pjsip_options.c.
> 
> * Fix some off-nominal ref leaks in schedule_qualify() and qualify_and_schedule().
> 
> * Reordered pjsip_options.c module start/stop code to clean up better on error.
> 
> 
> Diffs
> -----
> 
>   /branches/13/res/res_pjsip/pjsip_options.c 422562 
>   /branches/13/res/res_pjsip.c 422562 
> 
> Diff: https://reviewboard.asterisk.org/r/3954/diff/
> 
> 
> Testing
> -------
> 
> * With the qualify_frequency option enabled, added and removed a "local_net=" line in the transport section and restarted Asterisk via "core restart now".  Before the latest patch version, Asterisk would crash.  With the new patch, it keeps on going.
> 
> * Set the qualify_frequency option to different values and reloaded res_pjsip each time.  The OPTIONS poll frequency changed, started, and stopped according to the new qualify_frequency value.
> 
> 
> Thanks,
> 
> rmudgett
> 
>
