[Asterisk-code-review] Work around thread safety issues in the PBX core. (asterisk[11])

Matt Jordan asteriskteam at digium.com
Wed May 20 16:36:14 CDT 2015


Matt Jordan has posted comments on this change.

Change subject: Work around thread safety issues in the PBX core.
......................................................................


Patch Set 1: Code-Review-1

> > Since this is the PBX core and this changes some fundamental
 > usage
 > > of it across the tree I'm not sure I'm comfortable with this
 > going
 > > into 11. It may not be correct there, but I can't think of any
 > > issues filed as a result of it. We also don't have tests that
 > cover
 > > the areas impacted. I'll let others join in on their thoughts.
 > 
 > Are you concerned about:
 > 1) Creation of the nolock variants or changes made to existing
 > functions to allow this?
 > or
 > 2) The mass code update to stop using the unsafe variants of the
 > functions?
 > 
 > If it's #1 then I'd like to wait for more opinions.  If it's #2 I
 > can reduce amount of changes, but ultimately I believe they are all
 > correct fixes to race conditions that do exist in code.  I don't
 > think we can confidently say that this hasn't been experienced and
 > reported, just that it hasn't been successfully triaged.  Race
 > conditions can be very difficult to trace.

The first point is really the more relevant of the two points, and the second is moot if we don't agree on the API changes. So.

I'm wary about having public 'nolock' variants of these functions. An ast_context struct is opaque, and exposing the locking semantics of an opaque struct feels like we're solving this problem in the wrong way. Users of the PBX core should not have to worry about locking or unlocking an opaque structure; they should be able to request a context structure and manipulate as needed. If they need to alter a property on the structure, the API presented by the PBX core should be responsible for managing the thread safety.

Really, an ast_context should be an ao2 object. When retrieving it, its reference should be bumped. Destroying it should drop the pbx core's reference that it holds for the context, and remove the context from the dialplan. If the ast_context struct should be mutated, that safety should be provided by the PBX core - but that isn't really the issue here: the issue is lifetime, and that should be handled by reference counting the object.

The race condition you've noticed is really a fundamental problem with how things are structured in the PBX core, and is at the root of several other problems as well - most notably the pbx_realtime/autoservice/recursive locking deadlock shenaniganry.

While this is a good attempt at fixing the problem, it kind of just perpetuates the poor architecture we have in the PBX core. If we're going to go down the road of fixing this for a theoretical race condition problem, we might as well fix it correctly in master, and then decide if it is worth backporting.

-- 
To view, visit https://gerrit.asterisk.org/481
To unsubscribe, visit https://gerrit.asterisk.org/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1866b6787730c9c4f3f836b6133ffe9c820734fa
Gerrit-PatchSet: 1
Gerrit-Project: asterisk
Gerrit-Branch: 11
Gerrit-Owner: Corey Farrell <git at cfware.com>
Gerrit-Reviewer: Corey Farrell <git at cfware.com>
Gerrit-Reviewer: Joshua Colp <jcolp at digium.com>
Gerrit-Reviewer: Matt Jordan <mjordan at digium.com>
Gerrit-HasComments: No



More information about the asterisk-code-review mailing list