[asterisk-dev] 1.8 SIP deadlock on transfer using REFER

Jonathan Thurman jonathan at thurmantech.com
Sun Dec 19 22:54:37 UTC 2010


On Fri, Dec 17, 2010 at 7:49 AM, Russell Bryant <russell at digium.com> wrote:
> I just took a look.  The "core show locks" output is what is most
> helpful to me in seeing where the problem is.
>
> This problem is one of the more common deadlock situations that come up
> in Asterisk.  There is a lock associated with the ast_channel as well as
> a lock for the private data in the channel driver (sip_pvt in this
> case).  Based on Asterisk locking rules, it is only safe to lock the
> ast_channel first, and the sip_pvt second.  Otherwise, a deadlock
> avoidance technique of some kind must be used.
>
> The bug is in the do_monitor thread in the "core show locks" output.  In
> that thread, you can see these locks:
>
>    1) netlock (irrelevant to this bug)
>    2) sip_pvt (variable i in sip_new)
>    3) channel <--- Blocking while waiting for this lock
>
> The bug here is that the code is trying to lock the ast_channel while
> holding the sip_pvt lock.  There are a few different solutions, and
> which one is most appropriate requires more in-depth investigation of
> the code path in question.
>
> 1) Unlock the sip_pvt before doing anything that is going to try to grab
> the channel lock.
>
> 2) If you must hold both the sip_pvt and the channel lock at the same
> time, you can either:
>
> 2.a) lock the channel first, then the pvt (not usually an option,
> actually).
>
> 2.b) Acquire both locks using a deadlock avoidance loop like this:
>
>    lock(sip_pvt);
>    while (trylock(ast_channel)) {
>        unlock(sip_pvt);
>        sched_yield();
>        lock(sip_pvt);
>    }
>
>    ... do whatever ...
>
>    unlock(ast_channel);
>    unlock(sip_pvt);
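
For reference, the loop in 2.b can be sketched in plain C with pthreads.
This is only an illustration: demo_channel, demo_pvt and both function
names are hypothetical stand-ins, not Asterisk's actual types or locking
API.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-ins for ast_channel and sip_pvt (assumptions). */
struct demo_channel {
    pthread_mutex_t lock;
};

struct demo_pvt {
    pthread_mutex_t lock;
    struct demo_channel *owner;
};

/*
 * Take the pvt lock first, then the channel lock, without ever blocking
 * on the channel while the pvt is held.  Returns the number of times
 * the pvt lock had to be dropped and re-taken.
 */
static int demo_lock_pvt_then_channel(struct demo_pvt *pvt)
{
    int retries = 0;

    pthread_mutex_lock(&pvt->lock);
    while (pthread_mutex_trylock(&pvt->owner->lock) != 0) {
        /* Back off so the thread holding the channel can take the pvt. */
        pthread_mutex_unlock(&pvt->lock);
        sched_yield();
        pthread_mutex_lock(&pvt->lock);
        retries++;
    }
    return retries;
}

/* Release in the reverse order of acquisition. */
static void demo_unlock_channel_and_pvt(struct demo_pvt *pvt)
{
    pthread_mutex_unlock(&pvt->owner->lock);
    pthread_mutex_unlock(&pvt->lock);
}
```

Note that the pvt is briefly unlocked inside the loop, so anything read
from it before the loop would have to be re-checked afterwards.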

Thanks for the feedback, Russell.

I started looking at what changed between 1.6.2 and 1.8 in that
section of the code, and it looks like this is related to parking.
Specifically, it involves the function used to check whether the
blind-transfer extension is a valid parking lot number.  Within that
function (main/features.c: ast_parking_ext_valid) there is a call to
pbx_find_extension, which then calls ast_autoservice_stop on the
channel, and that is where it deadlocks.
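
To illustrate Russell's option 1 against a call chain like this one,
here is a rough pthreads sketch.  Everything in it (xfer_chan, xfer_pvt,
xfer_ext_valid, xfer_check_transfer) is hypothetical and only mimics the
shape of the problem; it is not the actual chan_sip or features.c code,
and not necessarily the right fix for it.

```c
#include <pthread.h>

/* Hypothetical stand-ins for ast_channel and sip_pvt (assumptions). */
struct xfer_chan {
    pthread_mutex_t lock;
};

struct xfer_pvt {
    pthread_mutex_t lock;
    struct xfer_chan *owner;
    int pvt_was_free;   /* demo only: set if the helper saw the pvt unlocked */
};

/* Stand-in for a helper that takes the channel lock internally. */
static int xfer_ext_valid(struct xfer_pvt *pvt)
{
    /* Demo-only probe: record whether the pvt lock is free right now. */
    if (pthread_mutex_trylock(&pvt->lock) == 0) {
        pvt->pvt_was_free = 1;
        pthread_mutex_unlock(&pvt->lock);
    }

    pthread_mutex_lock(&pvt->owner->lock);
    /* ... extension lookup would happen here ... */
    pthread_mutex_unlock(&pvt->owner->lock);
    return 1;
}

/* Called with pvt->lock held; returns with pvt->lock held. */
static int xfer_check_transfer(struct xfer_pvt *pvt)
{
    int valid;

    /* Drop the pvt lock before anything that may grab the channel lock. */
    pthread_mutex_unlock(&pvt->lock);
    valid = xfer_ext_valid(pvt);
    pthread_mutex_lock(&pvt->lock);

    return valid;
}
```

The cost of this approach is that the pvt can change while it is
unlocked, so any state read from it earlier needs to be re-validated
after the lock is taken again.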


