[Asterisk-Dev] Fundamental locking (race/deadlock) problem in ast_hangup()

Hans Petter Selasky hselasky at c2i.net
Fri Aug 26 06:45:54 MST 2005


Hi,

I have an ISDN driver that uses mutexes, and I think it might be a good idea 
if asterisk can adopt some of the techniques I use.

In the following text, "chan" is a pointer of type "struct ast_channel *".

1) One cannot send data/or signals to another thread, hence we need to hold a 
lock while transferring data! IMPORTANT! I suggest that Asterisk use 
callbacks instead!


2) The lock in "struct ast_channel" must go away. Instead there must be:

struct ast_channel {
 struct ast_mutex *priv_lock; // the lock that protects the channel driver
 struct ast_mutex *ast_lock; // the lock that is used by Asterisk
 u_int8_t refcount; // structure reference count
 u_int8_t present; // set if hangup has not been called
 ...
};

3) The locking order is defined like this.

  chan->priv_lock first
  chan->ast_lock  second


4) In the channel driver:

lock(priv_lock);

chan = lookup ast_channel in list;

  lock(chan->ast_lock);

    call a Asterisk callback ;

    // if this is hangup
    // {
    //   chan->present = 0;
    //   chan->refcount--;
    // }
    //

  refcount = chan->refcount;

  unlock(chan->ast_lock);

  if(refcount == 0) 
  {
     free(chan);
  }

unlock(priv_lock);

Comment to 4):

In this scheme it is important that the channel driver has its lock locked 
from it looks up "struct ast_channel", and until after that Asterisk has been 
called, if that is needed.


5) In asterisk:

lock(ast_lock);

chan = lookup ast_channel in list;

chan->refcount++;

unlock(ast_lock);

// NOTE: if we sleep here, and 4) calls hangup
// chan->present will be set to 0 !
//

lock(chan->priv_mtx);

if(chan->present)
{
   // if this is hangup
   // {
   //    lock(ast_lock);
   //    unsetup "chan" here
   //    chan->present = 0
   //    chan->refcount--
   //    unlock(ast_lock);
   // }
   //   

   call channel driver callback;
}

unlock(chan->priv_mtx);

lock(ast_lock);
refcount = --chan->refount;
unlock(ast_lock);

if(refcount == 0)
{
      free(chan);
}


6) At channel allocation

set "chan->ast_lock"
set "chan->priv_lock"
set "chan->present" = 1
set "chan->refcount" = 1

7) At hangup in :

chan->present = 0;
chan->refcount --


General comments:

The idea is that the last thread that has anything to do with "struct 
ast_channel" frees it. That way we avoid trouble. Detecting this is the 
purpose of "chan->refcount".

The channel driver has precedence over Asterisk, and if it calls hangup, 
callbacks from Asterisk to the channel driver must be stopped. That is the 
purpose of the "chan->present".


TODO:

Make some nice helper routines that do locking, and check the reference count 
for zero!


Any comments ?



On Thursday 25 August 2005 13:25, Kristian Nielsen wrote:
> After thinking some more about this during my own channel driver
> development, I believe that Hans Petter Selasky points out a real
> problem. I also think I have a simple solution.
>
> One solution to this is for the tech->hangup() function to unlock
> chan->lock (which doesn't cause any further problems since holding the
> lock is worthless anyway, as described above). But that is just ugly I
> think, and the correct way would be to simply remove the
> locking/unlocking of chan->lock from the ast_hangup() function.

"chan->mtx_priv" must be unlocked before locking the 
channel drivers lock, but after that we can lock "chan->ast_lock" again !
See my examples above.

>
> Comments? If not, I will open a bug in Mantis with a suitable patch to
> this effect.

Just make sure it gets fixed.

>
> If anyone needs further convincing, just look at this mess from
> chan_zap.c; I think it was done to prevent exactly this issue:
>
>     static void zap_queue_frame(struct zt_pvt *p, struct ast_frame *f,
> struct zt_pri *pri) {
>         /* We must unlock the PRI to avoid the possibility of a deadlock */
>         ast_mutex_unlock(&pri->lock);
>         for (;;) {
>             if (p->owner) {
>                 if (ast_mutex_trylock(&p->owner->lock)) {
>                     ast_mutex_unlock(&p->lock);
>                     usleep(1);
>                     ast_mutex_lock(&p->lock);
>                 } else {
>                     ast_queue_frame(p->owner, f);
>                     ast_mutex_unlock(&p->owner->lock);
>                     break;
>                 }
>             } else
>                 break;

This wait loop is the wrong way to do it.

>         }
>         ast_mutex_lock(&pri->lock);
>     }
>
> This is essentially a busy wait with an attempt to yield the CPU with
> usleep(1). Yuck.

--HPS



More information about the asterisk-dev mailing list