[asterisk-dev] Locking, coding guidelines addition
Russell Bryant
russell at digium.com
Fri Jun 27 13:17:53 CDT 2008
After doing some code review and responding to changes to this list,
and then the following discussion on the #asterisk-dev channel, it has
occurred to me that the fundamental locking rules for the boundary
between channel drivers and the Asterisk core have not sufficiently
been documented. Thus, I propose the following addition to the coding
guidelines document.
I welcome any feedback. If anything is not clear, I will do my best
to clarify it. If anything seems wrong, then we have an even bigger
problem, because this describes how the code is written today. This
is essentially an attempt to document some of what has come out of the
evolution of the Asterisk code base, not what I would consider ideal.
---------------------------------------------------------
--- Locking
---------------------------------------------------------
A) Locking Fundamentals
Asterisk is a heavily multithreaded application. It makes extensive
use of locking to ensure safe access to shared resources between
different threads.
When more that one lock is involved in a given code path, there is the
potential for deadlocks. A deadlock occurs when a thread is stuck
waiting for a resource that it will never acquire. Here is a classic
example of a deadlock:
Thread 1 Thread 2
------------ ------------
Holds Lock A Holds Lock B
Waiting for Lock B Waiting for Lock A
In this case, there is a deadlock between threads 1 and 2. However,
this deadlock would have been avoided if both threads 1 and 2 had
agreed that one must acquire Lock A before Lock B, or the other way
around.
We have now arrived at the fundamental rule for dealing with multiple
locks.
Thou must not violate the rules of locking order.
The agreed upon locking order depends on the code, but when using
multiple locks, an order _must_ be established, and once established,
must be respected.
B) Minding the boundary between channel drivers and the Asterisk core
The #1 cause of deadlocks in Asterisk is by not properly following the
locking rules that exist at the boundary between channel drivers and
the Asterisk core. The Asterisk core allocates an ast_channel, and
channel drivers allocate technology specific private data that is
associated with an ast_channel. Typically, both the ast_channel and
technology pvt (private) data have their own lock. There are _many_
code paths that require both objects to be locked.
The locking order in this situation is the following:
1) ast_channel
2) technology pvt data
Channel drivers implement the ast_channel_tech interface to provide a
channel implementation for Asterisk. Most of the channel_tech
interface callbacks are called with the associated ast_channel
locked. When accessing technology pvt data, the pvt can be locked
directly.
The complication arises in the opposite code path. Consider the
following situation:
1) A message comes in over the "network"
2) The CD (channel driver) monitor thread receives the message
3) The CD associates the message with a pvt struct and locks the
pvt
4) While processing the message, the CD must do something that
requires
locking the ast_channel
This is the point that must be handled carefully. Given the specified
locking order, it seems as if the following psuedo-code would be
sufficient:
unlock(pvt);
lock(ast_channel);
lock(pvt);
Unfortunately, that's not good enough in Asterisk. All locks (in the
case of mutexes, not rwlocks) are configured as recursive locks. That
means that the same thread can lock the same lock multiple times,
which essentially increments a counter. So, an unlock() may not
actually make the thread release a lock.
So, instead, we rely on a different construct to ensure that both
locks are acquired safely.
while (trylock(ast_channel) == FAILURE) {
unlock(pvt);
usleep(1); /* yield to other threads */
lock(pvt);
}
Technically, we are not following the specified locking order. We
will acquire the pvt lock before the ast_channel lock. However, by
looping in this fashion, we are safe from deadlocks. We never block
waiting on the ast_channel lock. If we fail to acquire the
ast_channel, we give other threads a chance to grab the pvt lock.
This is to allow alternate code paths in other threads that have the
ast_channel lock and call into the channel driver to safely acquire
the pvt.
C) Locking multiple channels.
The next situation to consider is what to do when you need a lock on
multiple ast_channels. If we did not use recursive mutexes, the
following construct would be sufficient:
lock(MIN(chan1, chan2));
lock(MAX(chan1, chan2));
That type of code would follow an established locking order of always
locking the channel that has a lower address first. Again, since we
use recursive mutexes, this isn't safe. We use the same construct as
described at the end of section B, instead.
lock(chan1);
while (trylock(chan2) == FAILURE) {
unlock(chan1);
usleep(1);
lock(chan1);
}
D) Documenting locking rules.
Finally, it is required that locking order rules are documented for
every lock introduced into Asterisk. This is done almost nowhere in
the existing code. However, it will be expected to be there for newly
introduced code. Over time, this information should be added for all
of the existing lock usage.
---------------------------------------------------------
---------------------------------------------------------
--
Russell Bryant
Senior Software Engineer
Open Source Team Lead
Digium, Inc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.digium.com/pipermail/asterisk-dev/attachments/20080627/bb5ce5ee/attachment.htm
More information about the asterisk-dev
mailing list