[Asterisk-Dev] Fundamental locking (race/deadlock) problem in
ast_hangup()
Hans Petter Selasky
hselasky at c2i.net
Fri Aug 26 06:45:54 MST 2005
Hi,
I have an ISDN driver that uses mutexes, and I think it might be a good idea
if asterisk can adopt some of the techniques I use.
In the following text, "chan" is a pointer of type "struct ast_channel *".
1) One cannot send data/or signals to another thread, hence we need to hold a
lock while transferring data! IMPORTANT! I suggest that Asterisk use
callbacks instead!
2) The lock in "struct ast_channel" must go away. Instead there must be:
struct ast_channel {
struct ast_mutex *priv_lock; // the lock that protects the channel driver
struct ast_mutex *ast_lock; // the lock that is used by Asterisk
u_int8_t refcount; // structure reference count
u_int8_t present; // set if hangup has not been called
...
};
3) The locking order is defined like this.
chan->priv_lock first
chan->ast_lock second
4) In the channel driver:
lock(priv_lock);
chan = lookup ast_channel in list;
lock(chan->ast_lock);
call a Asterisk callback ;
// if this is hangup
// {
// chan->present = 0;
// chan->refcount--;
// }
//
refcount = chan->refcount;
unlock(chan->ast_lock);
if(refcount == 0)
{
free(chan);
}
unlock(priv_lock);
Comment to 4):
In this scheme it is important that the channel driver has its lock locked
from it looks up "struct ast_channel", and until after that Asterisk has been
called, if that is needed.
5) In asterisk:
lock(ast_lock);
chan = lookup ast_channel in list;
chan->refcount++;
unlock(ast_lock);
// NOTE: if we sleep here, and 4) calls hangup
// chan->present will be set to 0 !
//
lock(chan->priv_mtx);
if(chan->present)
{
// if this is hangup
// {
// lock(ast_lock);
// unsetup "chan" here
// chan->present = 0
// chan->refcount--
// unlock(ast_lock);
// }
//
call channel driver callback;
}
unlock(chan->priv_mtx);
lock(ast_lock);
refcount = --chan->refount;
unlock(ast_lock);
if(refcount == 0)
{
free(chan);
}
6) At channel allocation
set "chan->ast_lock"
set "chan->priv_lock"
set "chan->present" = 1
set "chan->refcount" = 1
7) At hangup in :
chan->present = 0;
chan->refcount --
General comments:
The idea is that the last thread that has anything to do with "struct
ast_channel" frees it. That way we avoid trouble. Detecting this is the
purpose of "chan->refcount".
The channel driver has precedence over Asterisk, and if it calls hangup,
callbacks from Asterisk to the channel driver must be stopped. That is the
purpose of the "chan->present".
TODO:
Make some nice helper routines that do locking, and check the reference count
for zero!
Any comments ?
On Thursday 25 August 2005 13:25, Kristian Nielsen wrote:
> After thinking some more about this during my own channel driver
> development, I believe that Hans Petter Selasky points out a real
> problem. I also think I have a simple solution.
>
> One solution to this is for the tech->hangup() function to unlock
> chan->lock (which doesn't cause any further problems since holding the
> lock is worthless anyway, as described above). But that is just ugly I
> think, and the correct way would be to simply remove the
> locking/unlocking of chan->lock from the ast_hangup() function.
"chan->mtx_priv" must be unlocked before locking the
channel drivers lock, but after that we can lock "chan->ast_lock" again !
See my examples above.
>
> Comments? If not, I will open a bug in Mantis with a suitable patch to
> this effect.
Just make sure it gets fixed.
>
> If anyone needs further convincing, just look at this mess from
> chan_zap.c; I think it was done to prevent exactly this issue:
>
> static void zap_queue_frame(struct zt_pvt *p, struct ast_frame *f,
> struct zt_pri *pri) {
> /* We must unlock the PRI to avoid the possibility of a deadlock */
> ast_mutex_unlock(&pri->lock);
> for (;;) {
> if (p->owner) {
> if (ast_mutex_trylock(&p->owner->lock)) {
> ast_mutex_unlock(&p->lock);
> usleep(1);
> ast_mutex_lock(&p->lock);
> } else {
> ast_queue_frame(p->owner, f);
> ast_mutex_unlock(&p->owner->lock);
> break;
> }
> } else
> break;
This wait loop is the wrong way to do it.
> }
> ast_mutex_lock(&pri->lock);
> }
>
> This is essentially a busy wait with an attempt to yield the CPU with
> usleep(1). Yuck.
--HPS
More information about the asterisk-dev
mailing list