[asterisk-dev] [Code Review] 4381: res_pjsip_outbound_registration: Fix reload race condition.

rmudgett reviewboard at asterisk.org
Tue Jan 27 22:10:08 CST 2015


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/4381/
-----------------------------------------------------------

(Updated Jan. 27, 2015, 10:10 p.m.)


Status
------

This change has been marked as submitted.


Review request for Asterisk Developers.


Changes
-------

Committed in revision 431243


Bugs: ASTERISK-24729
    https://issues.asterisk.org/jira/browse/ASTERISK-24729


Repository: Asterisk


Description
-------

Performing a CLI "module reload" command when there are new pjsip.conf
registration objects defined frequently failed to load them correctly.

What happens is a race condition between res_pjsip pushing its reload into
an asynchronous task processor task and the thread that does the rest of
the reloads when it gets to reloading the res_pjsip_outbound_registration
module.  A similar race condition happens between a reload and the CLI/AMI
show registrations commands.  The reload updates the current_states
container and the CLI/AMI commands call get_registrations() which builds a
new current_states container.

* Made res_pjsip.c reload_module() use ast_sip_push_task_synchronous()
instead of ast_sip_push_task() to eliminate two threads processing config
reloads at the same time.

* Made get_registrations() not replace the global current_states container
so the CLI/AMI show registrations command cannot interfere with reloading.
You could never add/remove objects in the container without the
possibility of the container being replaced out from under you by
get_registrations().

* Added a registration loaded sorcery instance observer to purge any dead
registration objects since get_registrations() cannot do this job anymore.
The struct ast_sorcery_instance_observer callbacks must be used because
the callback happens inline with the load process.  The struct
ast_sorcery_observer callbacks are pushed to a different thread.

* Added some global current_states NULL pointer checks in case the
container disappears because of unload_module().

* Made sorcery's struct ast_sorcery_instance_observer.object_type_loaded
callbacks guaranteed to be called before any struct
ast_sorcery_observer.loaded callbacks will be called.

* Moved the check for non-reloadable objects to before the sorcery
instance loading callbacks happen to short circuit unnecessary work.
Previously with non-reloadable objects, the sorcery instance
loading/loaded callbacks would always happen, the individual wizard
loading/loaded would be prevented, and the non-reloadable type logging
message would be logged for each associated wizard.


Diffs
-----

  /branches/13/res/res_pjsip_outbound_registration.c 431239 
  /branches/13/res/res_pjsip.c 431239 
  /branches/13/main/sorcery.c 431239 

Diff: https://reviewboard.asterisk.org/r/4381/diff/


Testing
-------

Manually reloaded pjsip.conf with and without the registration type object define.  Without the patch, CLI "pjsip show registrations" frequently showed an empty list when there should have been an object displayed.  With the patch it no longer fails to load.

The test_config, test_sorcery, and test_sorcery_astdb unit tests still pass.


Thanks,

rmudgett

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20150128/718b8f27/attachment.html>


More information about the asterisk-dev mailing list