[asterisk-dev] Locking, coding guidelines addition

Russell Bryant russell at digium.com
Thu Jul 3 21:57:44 CDT 2008

----- "Stephen Davies" <stephen.l.davies at gmail.com> wrote:

> Hi,
> Call me a troublemaker, or maybe Captain Obvious' cousin, but
> locking/deadlock issues have been behind 90% or more of the issues
> I've got involved in, bug reported, or fixed, over the last few years.
>   So Russell: I think _most_ Asterisk issues are to do with this
> problem, and we _do_ have an even bigger problem.  (Maybe that's
> because I do lots of IAX2, and iax2 has really suffered from these
> issues since the worker thread stuff was put in).

Words can not describe the pain and frustrations that those changes to IAX2 have caused for myself and others, including you, I'm sure.  The end result is nice, though.  :)

> Unfortunately its an architectural problem.  
> I think the discussion reveals that the magic trylock while loop looks
> neato but doesn't really solve the problem.  Tzafrir and Luigi pointed
> out some flaws.

It doesn't just look neato.  It does actually work, except for when the recursive property of the locks are used.

> Mutexes and all this stuff are safe _provided_ that a locking order is
> respected.  If locking order isn't respected, you're always on the
> bleeding edge where some innocuous change tips something else over the
> edge and, voila, a deadlock.
> This is all harder in Asterisk because locks are help sometimes over
> large areas of code - making it hard to see all the possible code
> paths.  
> So it seems to me that the nettle must be grasped, a strict locking
> order defined, and the code needs to be gradually adjusted to match
> that definition.  I don't see any other way.
> In your example of an incoming call for a channel - well, the channel
> must lock the ast_channel before the pvt structure.  (or, remove the
> pvt lock as Steve suggested).
> It would also help to clean up the coding style so that the "scope" of
> a held lock is more clearly visible in the code structure.  This helps
> comprehension.
> By the way - a locking order is a tricky thing.  It needs to be
> defined *within* a class of object too.  Like: if a thread needs to
> lock more than one channel - there must be a defined order between the
> channels, too.

Locking order between two objects of the same type is the ideal thing to do, but there are other thread-safe options.  Lock 1, trylock 2, etc ...

> I note also that introducing astobj2 into chan_iax2 also had an
> initial bad effect on stability, showing that there are dragons down
> that road, too.  I guess they are in the requirement to reference
> count.

I think the minor period of instability caused by that was just due to knocking out the bugs in such a major core change to the code.

Anyway, all of these things are the point of this exercise to document the current state of locking, as well as to provide a set of recommendations for new code, as well as cleaning up current code.  If you have any recommendations for the text that is now in the coding guidelines, then please share them so that we may keep improving the situation.

Russell Bryant
Senior Software Engineer
Open Source Team Lead
Digium, Inc.

More information about the asterisk-dev mailing list