[asterisk-dev] How to catch the source of a deadlock?

Matthew Jordan mjordan at digium.com
Tue Jan 27 11:45:01 CST 2015


On Tue, Jan 27, 2015 at 11:04 AM, Yousf Ateya <y.ateya at starkbits.com> wrote:
>
> Yes, and I supplied the debug logs in the issue ASTERISK-24478. I was trying to find a solution to this bug.
>
> What I am doing now is running Asterisk in a debugger (gdb), but it prints tons of debug messages.
> That is why I am looking for any way to catch the cause of deadlocks.
>

I don't think you actually have a deadlock here.

First, the DETECT_DEADLOCKS option is going to spam you. I'd expect
that whenever it finds something holding onto a lock for a long period
of time, which is what your debug log shows. That being said, once the
DETECT_DEADLOCKS option starts telling you something, its output isn't
all that useful. Generally, I'd just run with DEBUG_THREADS when
debugging these kinds of things.
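
To illustrate why a detector of this kind gets noisy, here is a rough
sketch of the general idea: poll the mutex and warn each time it has
stayed busy for another interval. This is not Asterisk's code;
lock_with_warning and its parameters are hypothetical names made up
for the example. Note that it cannot tell a deadlock from a lock that
is merely held for a long time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper (not part of Asterisk): try to take a mutex,
 * printing a warning every warn_every_ms of waiting and giving up
 * after give_up_ms. Returns 0 on success, ETIMEDOUT on giving up,
 * or another errno value from pthread_mutex_trylock().
 */
int lock_with_warning(pthread_mutex_t *m, int warn_every_ms,
                      int give_up_ms, int *warnings)
{
    const struct timespec tick = { 0, 1000000L }; /* sleep ~1 ms per attempt */
    int waited_ms = 0;

    assert(warn_every_ms > 0);
    *warnings = 0;

    for (;;) {
        int res = pthread_mutex_trylock(m);

        if (res != EBUSY) {
            return res; /* acquired (0) or a genuine error */
        }
        if (waited_ms >= give_up_ms) {
            return ETIMEDOUT;
        }

        nanosleep(&tick, NULL);
        waited_ms++; /* approximate: counts sleeps, not wall-clock time */

        if (waited_ms % warn_every_ms == 0) {
            /* All we know here is "busy for a while", not why. */
            fprintf(stderr, "still waiting for mutex %p after ~%d ms\n",
                    (void *)m, waited_ms);
            (*warnings)++;
        }
    }
}
```

A holder that is slow but still making progress triggers the same
warning as a true deadlock, which is why the message alone does not
say much.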

Looking at your 'core show locks' output, you don't actually have
circular waiting. You have a thread that is holding onto an IAX call
number lock (iaxsl[fr->callno]), and a thread that wants it. However,
the thread holding onto the call number lock isn't waiting for another
lock: it's just holding it in transmit_frame.
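
As a minimal sketch of what "circular waiting" means, the check below
follows "waits for the holder of" edges between threads and reports
whether any chain leads back to where it started. The struct and
function names are hypothetical, and it assumes each thread holds at
most one lock, which is a simplification of real 'core show locks'
output.

```c
#include <assert.h>
#include <stddef.h>

#define NO_LOCK (-1)

/* Hypothetical snapshot of one thread's lock state. */
struct thread_state {
    int holds;     /* id of the lock this thread holds, or NO_LOCK */
    int waits_for; /* id of the lock this thread is blocked on, or NO_LOCK */
};

/* Index of the thread holding 'lock', or -1 if nobody holds it. */
static int holder_of(const struct thread_state *t, size_t n, int lock)
{
    for (size_t i = 0; i < n; i++) {
        if (t[i].holds == lock) {
            return (int)i;
        }
    }
    return -1;
}

/* Return 1 if some chain of waiters leads back to its starting thread. */
int has_circular_wait(const struct thread_state *t, size_t n)
{
    assert(t != NULL || n == 0);

    for (size_t start = 0; start < n; start++) {
        size_t cur = start;

        /* A chain longer than n hops cannot come back to 'start'. */
        for (size_t steps = 0; steps < n; steps++) {
            int next;

            if (t[cur].waits_for == NO_LOCK) {
                break; /* this thread is running, not blocked */
            }
            next = holder_of(t, n, t[cur].waits_for);
            if (next < 0) {
                break; /* the wanted lock is free */
            }
            if ((size_t)next == start) {
                return 1; /* the chain closed on itself: deadlock */
            }
            cur = (size_t)next;
        }
    }
    return 0;
}
```

With one thread holding lock 7 and waiting on nothing, and a second
thread waiting on lock 7, has_circular_wait returns 0: the second
thread is blocked, but only until the first one lets go.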

Looking at the transmit_frame function:

static int transmit_frame(void *data)
{
    struct iax_frame *fr = data;

    ast_mutex_lock(&iaxsl[fr->callno]);

    fr->sentyet = 1;

    if (iaxs[fr->callno]) {
        send_packet(fr);
    }

    if (fr->retries < 0) {
        ast_mutex_unlock(&iaxsl[fr->callno]);
        /* No retransmit requested */
        iax_frame_free(fr);
    } else {
        /* We need reliable delivery.  Schedule a retransmission */
        AST_LIST_INSERT_TAIL(&frame_queue[fr->callno], fr, list);
        fr->retries++;
        fr->retrans = iax2_sched_add(sched, fr->retrytime,
            attempt_transmit, fr);
        ast_mutex_unlock(&iaxsl[fr->callno]);
    }

    return 0;
}

We can see that all paths should be unlocking iaxsl[fr->callno],
assuming we move through the function. My guess is that we're stuck on
iax2_sched_add, but a gdb backtrace would show for sure where that
thread is.

However, I'll say this - when Corey found a similar problem in
ASTERISK-24451, I wasn't able to reproduce the leak in the IAX usage
of the scheduler. So your problem may not be easily solved unless you
can figure out why the scheduler is misbehaving.

As a side note: do you have a timing module loaded?

Matt

-- 
Matthew Jordan
Digium, Inc. | Engineering Manager
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org