[asterisk-dev] Locking, coding guidelines addition

Sun Jun 29 09:34:29 CDT 2008

On Jun 29, 2008, at 6:46 AM, Luigi Rizzo wrote:

> Probably my original reply on russel's email on locking coding  
> guidelines
>
> http://lists.digium.com/pipermail/asterisk-dev/2008-June/033662.html
>
> did not make it so i am resending.
>
> Documenting locking is certainly useful and the description you
> give is well written.  It suffers from a problem though:
>
>    while it is a faithful description of the way locking is  
> implemented
>    now, it does not clarify that even the trylock-based approach that
>    is presented does not guarantee deadlock avoidance -- e.g. if both
>    contenders have called lock() twice or more, they are still going
>    to loop forever.
>
> Basically there isn't a good solution with recursive locks(*).

Very good point.  I'm quite surprised we haven't seen more bug reports  
caused by this fact.

> All we could do, in terms of documentation, is suggest developers
> to try and avoid exploiting recursive locks, and instead try as much
> as possible to code as if locks were not recursive.
> Eventually, this should let us move, in the long term, to non- 
> recursive
> locks that allows a better handling of deadlock, based on lock
> ordering, or (if you really have to use the trylock/unlock trick)
> makes it slightly easier to check that releasing and reacquiring a
> lock does not lead to inconsistencies.
>
> If you agree to this approach i might have some more suggestions on
> how to reorganize the text on locking.

I completely agree with making this recommendation.

> (*) One would think that the trylock loop can be made deadlock-free
> using an unlock_all() call that invokes unlock() until the lock
> is finally released, and also returns the count so later we can call
> lock() N times. However there is a big danger here -- when you release
> and reacquire the lock, the data structures they protect might change,
> so the code should check that nothing bad happened.
> But one uses recursive locks when he doesn't/cannot track whether
> a function is called with the lock held or not, so you can imagine
> how hard it is for the same code to check that releasing the lock is  
> safe.

It is definitely dangerous, but this is what the code is already  
trying to do in a large number of places.  I think implementing the  
counter would be good, for the sake of avoiding deadlocks.

I'm also thinking about the currently used loop, and wondering if we  
should change what the suggested deadlock avoidance construct is.

Currently:

     while (trylock(ast_channel) == FAILURE) {
         unlock(pvt);
         usleep(1);
         lock(pvt);
     }

Proposed:

     if (trylock(ast_channel) == FAILURE) {
         unlock(pvt);
         lock(ast_channel);
         lock(pvt);
     }

My thinking with the proposed construct is that it would not unlock  
the pvt if it doesn't have to (current code already does this).  It  
would also not waste CPU cycles looping around waiting for the  
ast_channel lock to become available.  Instead, it just uses proper  
locking order to ensure the code is safe.

However, the old trylock loop would still be necessary in situations  
where there is not a defined and respected locking order (as is the  
case when dealing with more than one ast_channel at a time).

--
Russell Bryant
Senior Software Engineer
Open Source Team Lead
Digium, Inc.