[Asterisk-Dev] Fundamental locking (race/deadlock) problem in ast_hangup()

Thu Aug 25 04:25:19 MST 2005

After thinking some more about this during my own channel driver
development, I believe that Hans Petter Selasky points out a real
problem. I also think I have a simple solution.

The root of the problem is in ast_hangup(), which essentially does this:

    int ast_hangup(struct ast_channel *chan)
    {
        ast_mutex_lock(&chan->lock);
        chan->tech->hangup(chan);
        ast_mutex_unlock(&chan->lock);
        ast_channel_free(chan);
    }

The code apparently tries to avoid races by locking chan->lock, but that
is totally bogus, IMO. If another thread should be waiting for the lock,
it will immediately die when ast_hangup() unlocks chan->lock and calls
ast_channel_free().

But worse, the locking actually works to prevent the channel drivers
from properly implementing tech->hangup(). The correct way for a monitor
or other thread in a driver to access struct ast_channel is to keep a
mapping from circuit number to struct ast_channel * for the active
curcuits. This mapping must be protected by some lock (gmutex, say), so
that the monitor thread may atomically look up a curcuit, and
ast_hangup() may atomically remove a curcuit from the mapping.

But this will lead to a deadlock, since now ast_hangup(chan) will first
lock chan->lock, and then gmutex. The monitor thread will first lock
gmutex (to find the struct ast_channel *), and then lock chan->lock. The
result is a deadlock.

One solution to this is for the tech->hangup() function to unlock
chan->lock (which doesn't cause any further problems since holding the
lock is worthless anyway, as described above). But that is just ugly I
think, and the correct way would be to simply remove the
locking/unlocking of chan->lock from the ast_hangup() function.

Comments? If not, I will open a bug in Mantis with a suitable patch to
this effect.

If anyone needs further convincing, just look at this mess from
chan_zap.c; I think it was done to prevent exactly this issue:

    static void zap_queue_frame(struct zt_pvt *p, struct ast_frame *f, struct zt_pri *pri)
    {
        /* We must unlock the PRI to avoid the possibility of a deadlock */
        ast_mutex_unlock(&pri->lock);
        for (;;) {
            if (p->owner) {
                if (ast_mutex_trylock(&p->owner->lock)) {
                    ast_mutex_unlock(&p->lock);
                    usleep(1);
                    ast_mutex_lock(&p->lock);
                } else {
                    ast_queue_frame(p->owner, f);
                    ast_mutex_unlock(&p->owner->lock);
                    break;
                }
            } else
                break;
        }
        ast_mutex_lock(&pri->lock);
    }

This is essentially a busy wait with an attempt to yield the CPU with
usleep(1). Yuck.

 - Kristian.

-- 
Kristian Nielsen   kn at sifira.dk
Development Manager, Sifira A/S