[asterisk-dev] 1.8 SIP deadlock on transfer using REFER

Russell Bryant russell at digium.com
Fri Dec 17 15:49:06 UTC 2010


On Thu, 2010-12-16 at 14:34 -0800, Jonathan Thurman wrote:
> I have been trying to debug a SIP deadlock issue
> ( https://issues.asterisk.org/view.php?id=18403 ) and can't seem to
> find it.  I can reproduce this deadlock EVERY time using three
> physical phones, and initiating a blind transfer that uses REFER.  The
> result is Asterisk stops processing all SIP requests until killed and
> restarted.  I have uploaded "core show channels" and a thread
> backtrace per the debugging guidelines, and I am seeking any help in
> resolving this issue.  Thanks in advance for any suggestions!

I just took a look.  The "core show locks" output is what is most
helpful to me in seeing where the problem is.

This problem is one of the more common deadlock situations that come up
in Asterisk.  There is a lock associated with the ast_channel as well as
a lock for the private data in the channel driver (sip_pvt in this
case).  Based on Asterisk locking rules, it is only safe to lock the
ast_channel first, and the sip_pvt second.  Otherwise, a deadlock
avoidance technique of some kind must be used.
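
To illustrate the ordering rule, here is a minimal sketch using plain
pthread mutexes rather than the real Asterisk locking macros.  The struct
and function names (channel, pvt, update_both_in_order) are made up for
illustration and are not the actual ast_channel or sip_pvt definitions.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for ast_channel and sip_pvt; not the real structs. */
struct channel { pthread_mutex_t lock; int state; };
struct pvt     { pthread_mutex_t lock; struct channel *owner; int refer_pending; };

/*
 * Safe order: channel first, pvt second.  If every thread that needs both
 * locks takes them in this order, blocking on either one cannot form a cycle.
 */
static int update_both_in_order(struct channel *chan, struct pvt *p)
{
    pthread_mutex_lock(&chan->lock);
    pthread_mutex_lock(&p->lock);

    /* Sanity check for this sketch: the pvt belongs to this channel. */
    assert(p->owner == chan);

    chan->state++;
    p->refer_pending = 1;

    /* Release in reverse order of acquisition. */
    pthread_mutex_unlock(&p->lock);
    pthread_mutex_unlock(&chan->lock);
    return chan->state;
}
```

A thread that takes the pvt lock first and then blocks on the channel lock
breaks this rule, and that is the pattern to look for in the lock dump.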

The bug is in the do_monitor thread in the "core show locks" output.  In
that thread, you can see these locks:

    1) netlock (irrelevant to this bug)
    2) sip_pvt (variable i in sip_new)
    3) channel <--- Blocking while waiting for this lock

The bug here is that the code is trying to lock the ast_channel while
holding the sip_pvt lock.  There are a few different solutions, and
deciding which one is most appropriate requires a more in-depth
investigation of the code path in question.

1) Unlock the sip_pvt before doing anything that is going to try to grab
the channel lock.

2) If you must hold both the sip_pvt and the channel lock at the same
time, you can either:

2.a) lock the channel first, then the pvt (not usually an option,
actually).

2.b) Acquire both locks using a deadlock avoidance loop like this:

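    /* We start in the "wrong" order: pvt first. */
    lock(sip_pvt);
    /* Never block on the channel; trylock returns nonzero on failure. */
    while (trylock(ast_channel)) {
        /* Back off so whoever holds the channel can get the pvt. */
        unlock(sip_pvt);
        sched_yield();
        lock(sip_pvt);
    }

    ... do whatever ...

    unlock(ast_channel);
    unlock(sip_pvt);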
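
For reference, options 1 and 2.b could look roughly like the following
with plain pthread mutexes.  This is only a sketch under assumptions: the
struct layouts and the function names (note_transfer_unlocked,
lock_pvt_and_owner) are hypothetical and are not taken from chan_sip.c,
and real code would use the Asterisk locking wrappers instead.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-ins for ast_channel and sip_pvt; not the real structs. */
struct channel { pthread_mutex_t lock; int transfers; };
struct pvt     { pthread_mutex_t lock; struct channel *owner; };

/*
 * Option 1: read what is needed from the pvt, drop its lock, and only then
 * take the channel lock.  The two locks are never held at the same time.
 * Real code would also have to keep the channel alive across the gap
 * (for example by holding a reference); this sketch ignores that.
 */
static void note_transfer_unlocked(struct pvt *p)
{
    struct channel *chan;

    pthread_mutex_lock(&p->lock);
    chan = p->owner;
    pthread_mutex_unlock(&p->lock);

    if (chan) {
        pthread_mutex_lock(&chan->lock);
        chan->transfers++;
        pthread_mutex_unlock(&chan->lock);
    }
}

/*
 * Option 2.b: deadlock avoidance loop.  Returns with the pvt locked and,
 * if the return value is non-NULL, with that channel locked as well.
 * The owner is re-read on every pass because it may change while the
 * pvt lock is released.
 */
static struct channel *lock_pvt_and_owner(struct pvt *p)
{
    pthread_mutex_lock(&p->lock);
    while (p->owner && pthread_mutex_trylock(&p->owner->lock) != 0) {
        pthread_mutex_unlock(&p->lock);
        sched_yield();              /* let the other thread make progress */
        pthread_mutex_lock(&p->lock);
    }
    return p->owner;
}
```

The caller of lock_pvt_and_owner() unlocks the channel (if one was
returned) and then the pvt when it is done, just as in the pseudocode
above.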


-- 
Russell Bryant
Digium, Inc.  |  Engineering Manager, Open Source Software
445 Jan Davis Drive NW   -    Huntsville, AL 35806  -  USA
jabber: rbryant at digium.com    -=-    skype: russell-bryant
www.digium.com -=- www.asterisk.org -=- blogs.asterisk.org




