[asterisk-dev] Proper use of locking.

Thu Jul 19 13:36:25 CDT 2007

Michael Cargile wrote:

> I stand corrected. As far as naming is concerned (reentrant vs
> recursive) it appears to have more to do with whom you learned locking
> from and the terms that they used. As far as how I got that idea, the
> last time we did deadlock debugging in the Asterisk source code
> (somewhere in the head of 1.4) there were issues with some of the locks
> being non-recursive. Since then, I have not looked too much into the
> Asterisk code till recently when I got sick of my call quality going out
> the window due to performance problems. On top of that my boss within
> the last few weeks was having a discussion via IRC with other developers
> on how to fix some non-recursive locks (most likely the rwlocks). You
> are right though I should have taken the time to research the recursive
> locks more thoroughly in Asterisk before I stated that.

We use the term 'recursive' explicitly because we are using pthread
locks and that is what the pthreads standards call them. As far as
rwlocks go, there is no usage of rwlocks in any released Asterisk
branch, only in SVN trunk.

I can't imagine there ever being non-recursive mutex locks in the
Asterisk code base during the timeframe you are talking about, since
they have been recursive mutexes since before I began working with
Asterisk three years ago...

> This does not change the fact that locking is being poorly used in
> Asterisk and is having a negative impact on performance as well as
> making it extremely difficult to debug problems. Any good primer on
> locking will tell you to acquire a lock and release it as fast as
> possible. Hell even the developer of the Marco-Exclusive application
> within Asterisk says to do so. Waiting on a slow, shared resource such
> as a hard drive is just plain dumb. Plus there seems to be plenty of
> places where rather than using several locks that only block threads
> that could perform conflicting actions there is one monolithic lock that
> blocks all threads. 

No doubt; this is a very complex area to work in, and we have to be
extremely careful when we make these changes to fully understand their
impact and potential negatives. In SVN trunk some of these places have
already been changed to rwlocks, which will make an enormous difference
since the execution profile for many of these code paths is 90% reads
and 10% writes (or even more skewed).

> The only way I see to facilitate this is for those people who are
> committing patches to reject ones that do silly things like those that I
> wrote about in my previous email, have a janitorial project to scour the
> code looking for these issues and fix them, and to write a good locking
> practices section into the coding guidelines. I am more than willing to
> help with all of these (except the checking patches for commit as I do
> not have commit writes). We will probably be hiring someone else to do
> so as well here in the not too distant future. We are planning to use
> Asterisk for very a large call volume application shortly and having
> Asterisk stable and performing optimally will save us far more than the
> cost of a full time developer.

As anyone who has posted a complex patch to the issue tracker will tell
you, we already very stringently check contributions (and our own code)
for things like this, and in many cases this causes us to reject patches
and require the contributor to go back and re-work their code. While I
won't in any way say that we are perfect at it, I can't believe that
there have been patches accepted in the recent past that have such
egregious locking problems as you are describing; if there are, I would
appreciate pointers to them.

-- 
Kevin P. Fleming
Director of Software Technologies
Digium, Inc. - "The Genuine Asterisk Experience" (TM)