[asterisk-dev] [Code Review] 2961: Add channel locking for calls that result in channel snapshot creation
Mark Michelson
reviewboard at asterisk.org
Thu Oct 31 10:56:26 CDT 2013
> On Oct. 25, 2013, 8:24 p.m., Kevin Harwell wrote:
> > More of a general question as I am curious. Why did you choose option 2 instead of 1? It seems to me that putting the locks closer to the point where they are actually needed is more desirable as it lessens locking time and also removes the onus (of lock/unlock) off of the caller.
> >
> > If a partial answer is that some of the time the function (create, set, etc...) does not need to be locked then could we create locking versions of those functions, e.g. create_snapshot && create_snapshot_locked?
There are some cases where the channel does not have to be locked during snapshot creation:
1. During channel creation, before the channel has been linked into the channels container.
2. When operating on a dummy channel.
There are a couple of reasons I prefer not to embed the locking in the API call
1. It can help to decrease the number of times that a channel has to be locked for a given set of operations. Consider
ast_channel_set_foo(chan, "bar");
ast_channel_set_eggs(chan, "bacon");
ast_channel_publish_snapshot(chan);
Locking the channel before these operations and unlocking after makes it so that you only have to lock the channel once, rather than, say, locking the channel on each set operation and in the ast_channel_publish_snapshot() operation. This also helps to make sets of operations atomic.
Of course, since Asterisk's mutexes are recursive, you could actually both lock around the three statements here and have the lock acquired during snapshot publication. I'd like to think that if at all possible we try to avoid (ab)using recursive mutexes though.
2. "Hidden" locking deep within function calls has the ability to lead to inadvertent lock inversion. Obviously, the functions I've pointed out in this review would have documentation updates to mention that they lock the channel at some point. However, such documentation could easily be missed when adding new similar functions.
To address your point about locking as close to where the need exists, this is actually what I'm attempting to do in this review. At its base, the channel snapshot creation requires a lock. All of the functions I listed essentially create a channel snapshot plus make some other changes to the channel structure. So really, the whole thing needs a lock. There are some ancillary operations in some of methods mentioned. For instance ast_channel_publish_snapshot() publishes a message to stasis in addition to creating a channel snapshot. The problem is that this particular function tends to be called within methods that set other channel fields, like ast_channel_language_set().
- Mark
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/2961/#review10003
-----------------------------------------------------------
On Oct. 25, 2013, 5:07 p.m., Mark Michelson wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/2961/
> -----------------------------------------------------------
>
> (Updated Oct. 25, 2013, 5:07 p.m.)
>
>
> Review request for Asterisk Developers.
>
>
> Bugs: ASTERISK-22709
> https://issues.asterisk.org/jira/browse/ASTERISK-22709
>
>
> Repository: Asterisk
>
>
> Description
> -------
>
> John Bigelow discovered that if the newly-created atxfer_threeway_nominal test is run in a loop, eventually a test run will cause the UUT instance of Asterisk to crash. Investigation of the crash showed that the crash was being caused while trying to get the string representation of a channel's translation paths while creating a channel snapshot. The attended transfer manager thread was creating the channel snapshot as part of the attended transfer stasis message while in another thread (in this case, the bridge channel thread) the translation paths were being altered to make the channel compatible with others in the bridge.
>
> It's clear that the issue is that the channel's contents are not lock protected while creating a channel snapshot. Luckily, the channel's lock is held already when updating translation paths. By adding locking, I was able to run the test over 450 times straight with no negative consequences.
>
> When adding locking, there were two strategies to follow:
>
> 1) Take advantage of the fact that channel locks are recursive, and just add a single SCOPED_CHANNELLOCK() call to ast_channel_snapshot_create() and call it a day.
> 2) Make an audit of channel-related calls that end up creating a channel snapshot and require locks be held prior to making those calls.
>
> I like approach 2 more, even though it's harder to pull off. In investigating what paths result in channel snapshots being created, I found that channel locks should be required before calling the following functions:
>
> * ast_channel_snapshot_create()
> * ast_channel_blob_create()
> * ast_channel_publish_snapshot()
> * ast_publish_channel_state()
> * ast_channel_publish_blob()
> * ast_channel_publish_varset()
> * ast_channel_amaflags_set()
> * ast_channel_callid_set()
> * ast_channel_whentohangup_set()
> * ast_channel_callgroup_set()
> * ast_channel_pickupgroup_set()
> * ast_channel_internal_bridge_set()
> * ast_channel_language_set()
> * ast_channel_accountcode_set()
> * ast_channel_peeraccount_set()
> * ast_channel_linkedid_set()
> * ast_channel_stage_snapshot()
> * ast_setstate()
> * ast_aoc_manager_event()
> * ast_channel_setwhentohangup_tv()
> * ast_channel_setwhentohangup()
> * ast_set_variables()
>
> All of these functions' documentation have been updated to indicate a precondition of having the channel locked.
>
> In addition, there are some more complex functions that create channel snapshots but could not easily be made to have the precondition of having the channel locked. These are:
>
> * ast_bridge_blob_create()
> * ast_publish_attended_transfer_fail()
> * ast_publish_attended_transfer_bridge_merge()
> * ast_publish_attended_transfer_threeway()
> * ast_publish_attended_transfer_app()
> * ast_publish_attended_transfer_link()
> * ast_bridge_publish_enter()
> * ast_bridge_publish_leave()
> * ast_bridge_publish_blind_transfer()
>
> These functions' documentation now have a precondition of having no channels or bridges locked.
>
> This review seeks only to make sure that channels are properly locked prior to creating channel snapshots. A similar effort is likely needed to ensure that bridges are locked during bridge snapshot creation.
>
>
> Diffs
> -----
>
> /branches/12/tests/test_stasis_channels.c 401703
> /branches/12/tests/test_cel.c 401703
> /branches/12/tests/test_cdr.c 401703
> /branches/12/res/res_stasis.c 401703
> /branches/12/res/res_pjsip_refer.c 401703
> /branches/12/res/res_agi.c 401703
> /branches/12/res/parking/parking_manager.c 401703
> /branches/12/res/parking/parking_bridge_features.c 401703
> /branches/12/res/ari/resource_bridges.c 401703
> /branches/12/pbx/pbx_realtime.c 401703
> /branches/12/main/stasis_channels.c 401703
> /branches/12/main/stasis_bridges.c 401703
> /branches/12/main/pickup.c 401703
> /branches/12/main/pbx.c 401703
> /branches/12/main/endpoints.c 401703
> /branches/12/main/dial.c 401703
> /branches/12/main/core_unreal.c 401703
> /branches/12/main/core_local.c 401703
> /branches/12/main/channel.c 401703
> /branches/12/main/cel.c 401703
> /branches/12/main/bridge_channel.c 401703
> /branches/12/main/bridge.c 401703
> /branches/12/include/asterisk/stasis_channels.h 401703
> /branches/12/include/asterisk/stasis_bridges.h 401703
> /branches/12/include/asterisk/channelstate.h 401703
> /branches/12/include/asterisk/channel.h 401703
> /branches/12/include/asterisk/aoc.h 401703
> /branches/12/funcs/func_timeout.c 401703
> /branches/12/channels/sig_pri.c 401703
> /branches/12/channels/sig_analog.c 401703
> /branches/12/channels/chan_vpb.cc 401703
> /branches/12/channels/chan_unistim.c 401703
> /branches/12/channels/chan_skinny.c 401703
> /branches/12/channels/chan_sip.c 401703
> /branches/12/channels/chan_pjsip.c 401703
> /branches/12/channels/chan_phone.c 401703
> /branches/12/channels/chan_oss.c 401703
> /branches/12/channels/chan_nbs.c 401703
> /branches/12/channels/chan_motif.c 401703
> /branches/12/channels/chan_misdn.c 401703
> /branches/12/channels/chan_mgcp.c 401703
> /branches/12/channels/chan_jingle.c 401703
> /branches/12/channels/chan_iax2.c 401703
> /branches/12/channels/chan_h323.c 401703
> /branches/12/channels/chan_gtalk.c 401703
> /branches/12/channels/chan_dahdi.c 401703
> /branches/12/channels/chan_console.c 401703
> /branches/12/channels/chan_alsa.c 401703
> /branches/12/apps/app_voicemail.c 401703
> /branches/12/apps/app_userevent.c 401703
> /branches/12/apps/app_queue.c 401703
> /branches/12/apps/app_meetme.c 401703
> /branches/12/apps/app_disa.c 401703
> /branches/12/apps/app_dial.c 401703
> /branches/12/apps/app_confbridge.c 401703
> /branches/12/apps/app_agent_pool.c 401703
> /branches/12/addons/chan_ooh323.c 401703
> /branches/12/addons/chan_mobile.c 401703
>
> Diff: https://reviewboard.asterisk.org/r/2961/diff/
>
>
> Testing
> -------
>
> As stated in the description, I ran the atxfer_threeway_nominal test in a loop. After approximately 450 test runs, there were no negative consequences. Prior to this changeset, running the test 20-50 times would result in a crash.
>
>
> Thanks,
>
> Mark Michelson
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20131031/d9978e17/attachment-0001.html>
More information about the asterisk-dev
mailing list