[asterisk-dev] [Code Review] 2961: Add channel locking for calls that result in channel snapshot creation

rmudgett reviewboard at asterisk.org
Mon Nov 18 21:02:13 CST 2013


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/2961/#review10213
-----------------------------------------------------------



/branches/12/addons/chan_ooh323.c
<https://reviewboard.asterisk.org/r/2961/#comment19539>

    blob
    
    This may not be the line you actually moved, but reviewboard thinks it is.



/branches/12/apps/app_dial.c
<https://reviewboard.asterisk.org/r/2961/#comment19540>

    Calling ast_channel_get_device_name() with a channel locked has a potential for deadlock.  Just lock tc around ast_channel_stage_snapshot() and move the lock both back to the original position.



/branches/12/channels/chan_pjsip.c
<https://reviewboard.asterisk.org/r/2961/#comment19541>

    Is it safe to do these finds with the channel locked?



/branches/12/channels/chan_pjsip.c
<https://reviewboard.asterisk.org/r/2961/#comment19542>

    I think ast_endpoint_add_channel() should to be moved to after the channel unlock to avoid locking order problems.  It does an unconditional channel snapshot itself.



/branches/12/channels/sig_pri.c
<https://reviewboard.asterisk.org/r/2961/#comment19543>

    Revert this.
    
    The only caller of this function is sig_pri_mcid_event().  Chan can legitimately be NULL.  If it is not NULL then it is already locked.



/branches/12/main/bridge.c
<https://reviewboard.asterisk.org/r/2961/#comment19545>

    Do trylock on other->chan.
    
    Return 1 here.
    
    Returning -1 indicates that the optimization was completed.



/branches/12/main/bridge.c
<https://reviewboard.asterisk.org/r/2961/#comment19544>

    blob



/branches/12/main/core_local.c
<https://reviewboard.asterisk.org/r/2961/#comment19547>

    Revert.  The base.owner and base.chan are already locked.



/branches/12/main/core_local.c
<https://reviewboard.asterisk.org/r/2961/#comment19546>

    Revert.  Source is already locked.



/branches/12/main/core_local.c
<https://reviewboard.asterisk.org/r/2961/#comment19548>

    You did not fix this.  You have a lock inversion here.
    
    You need to use ast_unreal_lock_all() and get rid of that SCOPED_LOCK.



/branches/12/res/res_agi.c
<https://reviewboard.asterisk.org/r/2961/#comment19549>

    Need to lock chan around this call.
    
    Check if these other places need locking added:
    aoc.c:1874
    chan_sip.c:28824
    channel.c:3629
    channel.c:3646
    pbx.c:5793
    stasis_channels.c:667


- rmudgett


On Nov. 13, 2013, 6:59 p.m., Mark Michelson wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/2961/
> -----------------------------------------------------------
> 
> (Updated Nov. 13, 2013, 6:59 p.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Bugs: ASTERISK-22709
>     https://issues.asterisk.org/jira/browse/ASTERISK-22709
> 
> 
> Repository: Asterisk
> 
> 
> Description
> -------
> 
> John Bigelow discovered that if the newly-created atxfer_threeway_nominal test is run in a loop, eventually a test run will cause the UUT instance of Asterisk to crash. Investigation of the crash showed that the crash was being caused while trying to get the string representation of a channel's translation paths while creating a channel snapshot. The attended transfer manager thread was creating the channel snapshot as part of the attended transfer stasis message while in another thread (in this case, the bridge channel thread) the translation paths were being altered to make the channel compatible with others in the bridge.
> 
> It's clear that the issue is that the channel's contents are not lock protected while creating a channel snapshot. Luckily, the channel's lock is held already when updating translation paths. By adding locking, I was able to run the test over 450 times straight with no negative consequences.
> 
> When adding locking, there were two strategies to follow:
> 
> 1) Take advantage of the fact that channel locks are recursive, and just add a single SCOPED_CHANNELLOCK() call to ast_channel_snapshot_create() and call it a day.
> 2) Make an audit of channel-related calls that end up creating a channel snapshot and require locks be held prior to making those calls.
> 
> I like approach 2 more, even though it's harder to pull off. In investigating what paths result in channel snapshots being created, I found that channel locks should be required before calling the following functions:
> 
> * ast_channel_snapshot_create()
> * ast_channel_blob_create()
> * ast_channel_publish_snapshot()
> * ast_publish_channel_state()
> * ast_channel_publish_blob()
> * ast_channel_publish_varset()
> * ast_channel_amaflags_set()
> * ast_channel_callid_set()
> * ast_channel_whentohangup_set()
> * ast_channel_callgroup_set()
> * ast_channel_pickupgroup_set()
> * ast_channel_internal_bridge_set()
> * ast_channel_language_set()
> * ast_channel_accountcode_set()
> * ast_channel_peeraccount_set()
> * ast_channel_linkedid_set()
> * ast_channel_stage_snapshot()
> * ast_setstate()
> * ast_aoc_manager_event()
> * ast_channel_setwhentohangup_tv()
> * ast_channel_setwhentohangup()
> * ast_set_variables()
> 
> All of these functions' documentation have been updated to indicate a precondition of having the channel locked.
> 
> In addition, there are some more complex functions that create channel snapshots but could not easily be made to have the precondition of having the channel locked. These are:
> 
> * ast_bridge_blob_create()
> * ast_publish_attended_transfer_fail()
> * ast_publish_attended_transfer_bridge_merge()
> * ast_publish_attended_transfer_threeway()
> * ast_publish_attended_transfer_app()
> * ast_publish_attended_transfer_link()
> * ast_bridge_publish_enter()
> * ast_bridge_publish_leave()
> * ast_bridge_publish_blind_transfer()
> 
> These functions' documentation now have a precondition of having no channels or bridges locked.
> 
> This review seeks only to make sure that channels are properly locked prior to creating channel snapshots. A similar effort is likely needed to ensure that bridges are locked during bridge snapshot creation.
> 
> 
> Diffs
> -----
> 
>   /branches/12/tests/test_stasis_channels.c 402686 
>   /branches/12/tests/test_cel.c 402686 
>   /branches/12/tests/test_cdr.c 402686 
>   /branches/12/res/res_stasis.c 402686 
>   /branches/12/res/res_pjsip_refer.c 402686 
>   /branches/12/res/res_agi.c 402686 
>   /branches/12/res/parking/parking_manager.c 402686 
>   /branches/12/res/parking/parking_bridge_features.c 402686 
>   /branches/12/res/ari/resource_bridges.c 402686 
>   /branches/12/pbx/pbx_realtime.c 402686 
>   /branches/12/main/stasis_channels.c 402686 
>   /branches/12/main/stasis_bridges.c 402686 
>   /branches/12/main/pickup.c 402686 
>   /branches/12/main/pbx.c 402686 
>   /branches/12/main/endpoints.c 402686 
>   /branches/12/main/dial.c 402686 
>   /branches/12/main/core_unreal.c 402686 
>   /branches/12/main/core_local.c 402686 
>   /branches/12/main/channel.c 402769 
>   /branches/12/main/cel.c 402686 
>   /branches/12/main/bridge_channel.c 402686 
>   /branches/12/main/bridge.c 402686 
>   /branches/12/include/asterisk/stasis_channels.h 402686 
>   /branches/12/include/asterisk/stasis_bridges.h 402686 
>   /branches/12/include/asterisk/channelstate.h 402686 
>   /branches/12/include/asterisk/channel.h 402686 
>   /branches/12/include/asterisk/aoc.h 402686 
>   /branches/12/funcs/func_timeout.c 402686 
>   /branches/12/channels/sig_pri.c 402686 
>   /branches/12/channels/sig_analog.c 402686 
>   /branches/12/channels/chan_vpb.cc 402686 
>   /branches/12/channels/chan_unistim.c 402686 
>   /branches/12/channels/chan_skinny.c 402686 
>   /branches/12/channels/chan_sip.c 402686 
>   /branches/12/channels/chan_pjsip.c 402686 
>   /branches/12/channels/chan_phone.c 402686 
>   /branches/12/channels/chan_oss.c 402686 
>   /branches/12/channels/chan_nbs.c 402686 
>   /branches/12/channels/chan_motif.c 402686 
>   /branches/12/channels/chan_misdn.c 402686 
>   /branches/12/channels/chan_mgcp.c 402686 
>   /branches/12/channels/chan_jingle.c 402686 
>   /branches/12/channels/chan_iax2.c 402686 
>   /branches/12/channels/chan_h323.c 402686 
>   /branches/12/channels/chan_gtalk.c 402686 
>   /branches/12/channels/chan_dahdi.c 402686 
>   /branches/12/channels/chan_console.c 402686 
>   /branches/12/channels/chan_alsa.c 402686 
>   /branches/12/apps/app_voicemail.c 402686 
>   /branches/12/apps/app_userevent.c 402686 
>   /branches/12/apps/app_queue.c 402686 
>   /branches/12/apps/app_meetme.c 402686 
>   /branches/12/apps/app_disa.c 402686 
>   /branches/12/apps/app_dial.c 402686 
>   /branches/12/apps/app_confbridge.c 402687 
>   /branches/12/apps/app_agent_pool.c 402686 
>   /branches/12/addons/chan_ooh323.c 402686 
>   /branches/12/addons/chan_mobile.c 402686 
> 
> Diff: https://reviewboard.asterisk.org/r/2961/diff/
> 
> 
> Testing
> -------
> 
> As stated in the description, I ran the atxfer_threeway_nominal test in a loop. After approximately 450 test runs, there were no negative consequences. Prior to this changeset, running the test 20-50 times would result in a crash.
> 
> 
> Thanks,
> 
> Mark Michelson
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20131119/14c8d0ff/attachment-0001.html>


More information about the asterisk-dev mailing list