[asterisk-dev] [Code Review] 3954: pjsip_options.c: Fix race condition stopping periodic out of dialog OPTIONS request.

Mark Michelson reviewboard at asterisk.org
Thu Sep 18 17:09:34 CDT 2014



> On Sept. 11, 2014, 3:16 p.m., Joshua Colp wrote:
> > /branches/13/res/res_pjsip.c, lines 2488-2501
> > <https://reviewboard.asterisk.org/r/3954/diff/4/?file=67305#file67305line2488>
> >
> >     Looking at the pjproject code why wouldn't the following return values cover these cases for when the callback is not called:
> >     
> >     PJ_EINVAL
> >     PJ_EINVALIDOP
> >     PJ_ENOMEM
> >     
> >     Handling those as special would reduce the memory leak possibility.
> 
> rmudgett wrote:
>     Almost any value could be returned because user supplied callbacks could fail with any error code they think is reasonable.  Most of those transport callback routines can return PJ_EINVAL when they sanity check the parameters.  If it was going out over a websocket transport the PJ_ENOMEM value could be returned.  The safeset thing that can be done is to not free anything.

I'm going to suggest something that I think should ensure that memory gets freed. The issue is that the callback may or may not be called when an error occurs. Similarly, the callback may be called synchronously or asynchronously.

By process of elimination, if the callback is called asynchronously, then that means that the initial call to pjsip_endpt_send_request() MUST have returned success. There's no way for pjsip_endpt_send_request() to return an error if the error happens at a separate time. Any in the async case, the callback SHOULD be called every single time.

The problem can be narrowed down to the case of when the callback is called synchronously when an error occurs. Since the callback is called synchronously, it can be made possible to detect if the callback has been called when pjsip_endpt_send_request() returns. There's only one piece of data that is common to both send_out_of_dialog_request() and send_request_cb(), and that's the token. In send_out_of_dialog_request(), you can wrap token inside another structure that has a flag that indicates if the callback has been called or not.

With all this information at hand, you do the following:
* In send_request_cb, always free the data pased in, and be sure to mark the token's wrapper so that it indicates that the callback was called.
* In send_out_of_dialog_request(), if pjsip_endpt_send_request() returns an error, then check the token wrapper to see if the callback was called. If so, do nothing. If not, then free the data. In both cases, you can return an error, and the caller of send_out_of_dialog_request() can safely free their token.


- Mark


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/3954/#review13283
-----------------------------------------------------------


On Sept. 10, 2014, 10:18 p.m., rmudgett wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/3954/
> -----------------------------------------------------------
> 
> (Updated Sept. 10, 2014, 10:18 p.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Bugs: AFS-155 and ASTERISK-24295
>     https://issues.asterisk.org/jira/browse/AFS-155
>     https://issues.asterisk.org/jira/browse/ASTERISK-24295
> 
> 
> Repository: Asterisk
> 
> 
> Description
> -------
> 
> The crash on the issues is a result of an invalid transport configuration change when asterisk is restarted.  The attempt to send the qualify request fails and we cleaned up.  However, the callback is also called which results in a double unref of the objects involved.
> 
> * Fixed send_request_cb() and qualify_contact_cb() to not cleanup the token when the PJSIP event is PJSIP_EVENT_TRANSPORT_ERROR since the initial function call will do the clean up.
> 
> * Made send_request_cb() able to handle repeated challenges (Up to 10).
> 
> * Fix periodic endpoint qualify OPTIONS sched deletion race by avoiding it.  The sched entry will no longer self stop and must be externally stopped.
> 
> * Added REF_DEBUG description tags to struct sched_data in pjsip_options.c.
> 
> * Fix some off-nominal ref leaks in schedule_qualify(), qualify_and_schedule().
> 
> * Reordered pjsip_options.c module start/stop code to cleanup better on error.
> 
> 
> Diffs
> -----
> 
>   /branches/13/res/res_pjsip/pjsip_options.c 422963 
>   /branches/13/res/res_pjsip.c 422963 
> 
> Diff: https://reviewboard.asterisk.org/r/3954/diff/
> 
> 
> Testing
> -------
> 
> * With the qualify_frequency option enabled, added and removed a "local_net=" line in the transport section and restarted asterisk via "core restart now".  Before the latest patch version, asterisk would crash.  With the new patch, it keeps on going.
> 
> * Set the qualify_frequency option to different values and reloaded res_pjsip each time.  The OPTIONS poll frequency changed, started, and stopped according to the new qualify_frequency value.
> 
> 
> Thanks,
> 
> rmudgett
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20140918/e111255a/attachment-0001.html>


More information about the asterisk-dev mailing list