[asterisk-dev] [Code Review] 4381: res_pjsip_outbound_registration: Fix reload race condition.

rmudgett reviewboard at asterisk.org
Tue Jan 27 21:43:27 CST 2015



> On Jan. 27, 2015, 7:53 p.m., Matt Jordan wrote:
> > Given this problem description:
> > 
> > {quote}
> > Performing a CLI "module reload" command when there are new pjsip.conf registration objects defined frequently failed to load them correctly.  What happens is a race condtion between res_pjsip pushing its reload into an asynchronous task processor task and the thread that does the rest of the reloads when it gets to reloading the res_pjsip_outbound_registration module.
> > {quote}
> > 
> > I'm not sure why any change other than this one is necessary:
> > 
> > {quote}
> > * Made res_pjsip.c reload_module() use ast_sip_push_task_synchronous() instead of ast_sip_push_task() to eliminate two threads processing config reloads at the same time.
> > {quote}
> > 
> > That doesn't mean that these changes aren't correct or necessary, just that the issue description doesn't really explain everything that was modified in this patch. I'll infer what I can from the change below, but some clarification here on what all is being fixed would be appreciated.
> > 
> > {quote}
> > * Made get_registrations() not replace the global current_states container.  You could never add/remove objects in the container without the possibility of the container being replaced out from under you by get_registrations().
> > {quote}
> > 
> > I'm assuming this was changed due to the CLI/AMI commands altering the state of the outbound registrations while they are in the process of displaying them. I definitely agree with that change, but it isn't clear from the issue description that this change is occurring for that reason. (In fact, really, it's a completely separate issue from what is being solved here.)
> > 
> > {quote}
> > * Added a registration loaded sorcery instance observer to purge any dead registration objects.  The sorcery instance observer (struct ast_sorcery_instance_observer) must be used because the callback happens immediately during the load process.  The external observers callbacks (struct ast_sorcery_observer) are pushed to a different thread.
> > {quote}
> > 
> > This I don't understand, although I freely admit that's because my sorcery-foo is weak. You'll need to explain why this change was needed.
> > 
> > {quote}
> > * Added some global current_states NULL pointer checks in case the container dissapears because of unload_module().
> > {quote}
> > 
> > Yup, that makes sense and is a nice cleanup in addition to what was done in this patch.
> > 
> > {quote}
> > * Made sorcery's instance loaded observer callback (struct ast_sorcery_instance_observer) guaranteed to be called before any external observers (struct ast_sorcery_observer) will be called.
> > {quote}
> > 
> > This worries me a lot. Making changes to a core API that a lot of modules depend on feels like it is way out of scope for this issue. Why is this necessary?
> > 
> > {quote}
> > * Moved the check for non-reloadable objects to before the sorcery instance loading callbacks happen to short circuit the attempt to reload non-reloadable types earlier and so the non-reloadable type message can only happens once for each non-reloadable type.  Previously the sorcery instance loading/loaded callbacks would always happen, the individual wizard loading would be prevented, and the non-reloadable type message would happen for each associated wizard.
> > {quote}
> > 
> > This feels confusing enough that it worries me in the same fashion as moving the callbacks. Again, this feels out of scope for this issue.

* The get_registrations() change is due to the CLI/AMI commands altering the state of the registrations state container while they are in the process of displaying them.  That interferes with loading the registration type objects.

* The added sorcery instance observer does the dead registrations purge after the config load that the get_registrations() function can no longer perform.

The sorcery.c changes could be kicked out of this patch.  They only do two things: 1) Short circuit unnecessary work for the non-reloadable types.  2) Ensure that the struct ast_sorcery_instance_observer.object_type_loaded callbacks happen in a predictable order.  The instance observer I added only works with an internal container to the outgoing registrations so it isn't critical for that to be complete before the type observers run.

I'll rework the descriptions.


- rmudgett


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/4381/#review14327
-----------------------------------------------------------


On Jan. 27, 2015, 6:46 p.m., rmudgett wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/4381/
> -----------------------------------------------------------
> 
> (Updated Jan. 27, 2015, 6:46 p.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Bugs: ASTERISK-24729
>     https://issues.asterisk.org/jira/browse/ASTERISK-24729
> 
> 
> Repository: Asterisk
> 
> 
> Description
> -------
> 
> Performing a CLI "module reload" command when there are new pjsip.conf registration objects defined frequently failed to load them correctly.  What happens is a race condtion between res_pjsip pushing its reload into an asynchronous task processor task and the thread that does the rest of the reloads when it gets to reloading the res_pjsip_outbound_registration module.
> 
> * Made res_pjsip.c reload_module() use ast_sip_push_task_synchronous() instead of ast_sip_push_task() to eliminate two threads processing config reloads at the same time.
> 
> * Made get_registrations() not replace the global current_states container.  You could never add/remove objects in the container without the possibility of the container being replaced out from under you by get_registrations().
> 
> * Added a registration loaded sorcery instance observer to purge any dead registration objects.  The sorcery instance observer (struct ast_sorcery_instance_observer) must be used because the callback happens immediately during the load process.  The external observers callbacks (struct ast_sorcery_observer) are pushed to a different thread.
> 
> * Added some global current_states NULL pointer checks in case the container dissapears because of unload_module().
> 
> * Made sorcery's instance loaded observer callback (struct ast_sorcery_instance_observer) guaranteed to be called before any external observers (struct ast_sorcery_observer) will be called.
> 
> * Moved the check for non-reloadable objects to before the sorcery instance loading callbacks happen to short circuit the attempt to reload non-reloadable types earlier and so the non-reloadable type message can only happens once for each non-reloadable type.  Previously the sorcery instance loading/loaded callbacks would always happen, the individual wizard loading would be prevented, and the non-reloadable type message would happen for each associated wizard.
> 
> 
> Diffs
> -----
> 
>   /branches/13/res/res_pjsip_outbound_registration.c 431239 
>   /branches/13/res/res_pjsip.c 431239 
>   /branches/13/main/sorcery.c 431239 
> 
> Diff: https://reviewboard.asterisk.org/r/4381/diff/
> 
> 
> Testing
> -------
> 
> Manually reloaded pjsip.conf with and without the registration type object define.  Without the patch, CLI "pjsip show registrations" frequently showed an empty list when there should have been an object displayed.  With the patch it no longer fails to load.
> 
> The test_config, test_sorcery, and test_sorcery_astdb unit tests still pass.
> 
> 
> Thanks,
> 
> rmudgett
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20150128/e8332b23/attachment.html>


More information about the asterisk-dev mailing list