[asterisk-bugs] [JIRA] (ASTERISK-21234) Deadlock when using two Local channels & fax gateway (local_queryoption)

Faidon Liambotis (JIRA) noreply at issues.asterisk.org
Wed Mar 13 05:45:01 CDT 2013


    [ https://issues.asterisk.org/jira/browse/ASTERISK-21234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=204156#comment-204156 ] 

Faidon Liambotis commented on ASTERISK-21234:
---------------------------------------------

The production system was much more complicated than that, with multiple AGIs in the path doing all of this; what I posted is a much-simplified version created to reproduce the problem in the lab and report the bug. But yes, there was no good reason for it, and this isn't the case in production anymore. It is a deadlock though, and I thought it warranted a bug report, albeit with a warning about it being a corner case, as I said in my first sentence :)

I can compile with DEBUG_THREADS, although I've already worked out the locks and the exact sequence of events from the backtrace above, so I'm not sure how much more it will help you. More specifically:

Thread 2 locks {{0x2eac188}} (let's call that lock A) in {{ast_indicate_data}}, which then proceeds through the framehook, tries to get the T.38 state and ends up in {{local_queryoption}} which calls
{code}
query_cleanup:
        if (bridged) {
                res = ast_channel_queryoption(bridged, option, data, datalen, 0);
                bridged = ast_channel_unref(bridged);
        }
{code}
with {{bridged}} being {{0x309e238}} and {{ast_channel_queryoption}} immediately trying to get a lock for that channel (lock B).

Thread 3 does the same, starting with {{ast_indicate_data}} for channel {{0x309e238}} which locks it (lock B again), goes through the same framehook, ends up in {{local_queryoption}} and specifically:
{code}
        ao2_lock(p);
        if (!(tmp = IS_OUTBOUND(ast, p) ? p->owner : p->chan)) {
                ao2_unlock(p);
                return -1;
        }
        ast_channel_ref(tmp);
        ao2_unlock(p);
        ast_channel_unlock(ast); /* Held when called, unlock before locking another channel */

        ast_channel_lock(tmp);
{code}
It gets {{tmp}}, which is {{0x2eac188}}, and then tries to lock it with {{ast_channel_lock(tmp);}}, i.e. it tries to acquire lock A.

So, thread 2 holds A and waits for B, while thread 3 holds B and waits for A (the classic AB/BA ordering). The threads deadlock, and both channels stay locked for good, blocking other operations and effectively killing the system.
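
To make the lock-order inversion concrete, here is a minimal sketch in plain pthreads. It is not Asterisk code and not a proposed patch: {{struct chan}}, {{lock_pair}}, {{unlock_pair}}, {{worker}} and {{run_demo}} are hypothetical names invented for illustration. Two threads each "own" one channel and need the other one as well, just like threads 2 and 3 above. Instead of each thread taking its own lock first and then the peer's, both locks are always taken in one global order (lowest address first), which is one common way to rule out an AB/BA deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a channel; NOT the Asterisk struct. */
struct chan {
    pthread_mutex_t lock;
    int queries;            /* counts option queries made on this channel */
};

/* Take both locks in a fixed global order (lowest address first), so two
 * threads can never end up holding one lock each while waiting for the other. */
static void lock_pair(struct chan *x, struct chan *y)
{
    if ((uintptr_t)(void *)x > (uintptr_t)(void *)y) {
        struct chan *t = x;
        x = y;
        y = t;
    }
    pthread_mutex_lock(&x->lock);
    if (y != x)
        pthread_mutex_lock(&y->lock);
}

static void unlock_pair(struct chan *x, struct chan *y)
{
    pthread_mutex_unlock(&x->lock);
    if (y != x)
        pthread_mutex_unlock(&y->lock);
}

struct job {
    struct chan *own;       /* channel this thread starts from */
    struct chan *peer;      /* channel on the other side */
    int rounds;
};

/* Each round needs both channels locked, then "queries" the peer. */
static void *worker(void *arg)
{
    struct job *j = arg;
    for (int i = 0; i < j->rounds; i++) {
        lock_pair(j->own, j->peer);
        j->peer->queries++;
        unlock_pair(j->own, j->peer);
    }
    return NULL;
}

/* Runs two threads that want the same two locks from opposite ends.
 * Returns the total number of queries performed on both channels. */
static int run_demo(int rounds)
{
    struct chan a, b;
    struct job j2 = { &a, &b, rounds };   /* like thread 2: starts from A */
    struct job j3 = { &b, &a, rounds };   /* like thread 3: starts from B */
    pthread_t t2, t3;
    int rc;

    a.queries = b.queries = 0;
    pthread_mutex_init(&a.lock, NULL);
    pthread_mutex_init(&b.lock, NULL);

    rc = pthread_create(&t2, NULL, worker, &j2);
    assert(rc == 0);
    rc = pthread_create(&t3, NULL, worker, &j3);
    assert(rc == 0);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);

    pthread_mutex_destroy(&a.lock);
    pthread_mutex_destroy(&b.lock);
    return a.queries + b.queries;
}
```

Calling {{run_demo(n)}} from a test program (built with {{-pthread}}) should always return. If {{worker}} were changed to lock {{own}} first and {{peer}} second, the two threads could stop making progress in exactly the way described above. Whether address ordering, a trylock-and-back-off loop, or dropping the first lock entirely is the right fix for {{local_queryoption}} is a separate question this sketch does not try to answer.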
                
> Deadlock when using two Local channels & fax gateway (local_queryoption)
> ------------------------------------------------------------------------
>
>                 Key: ASTERISK-21234
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-21234
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_local
>    Affects Versions: 11.2.1
>            Reporter: Faidon Liambotis
>            Assignee: Faidon Liambotis
>         Attachments: 2, 3
>
>
> There's a corner case when using two Local channels in series with the T.38 fax gateway enabled. It seems there's a race and an eventual deadlock on two channel locks (AB/BA). This quickly brings the rest of the system down (the SIP monitoring thread gets stuck, as does every other operation that enumerates channels and tries to lock them).
> The issue is fully reproducible with a load generator within 5-10 minutes using this purposefully trivialized dialplan:
> {code}
> [incoming]
> exten => _X.,1,Set(FAXOPT(gateway)=yes)
> exten => _X.,2,Dial(Local/${EXTEN}@local2)
> [local2]
> exten => _X.,1,Set(FAXOPT(gateway)=yes)
> exten => _X.,2,Dial(Local/${EXTEN}@local1)
> [local1]
> exten => _X.,1,Set(FAXOPT(gateway)=yes)
> exten => _X.,2,Dial(SIP/sip2/${EXTEN})
> {code}
> Attached is the backtrace for the two deadlocked threads when running with the above dialplan.
> Both threads lock their respective channels in {{ast_indicate_data}}, then race in {{local_queryoption}} and deadlock each other. The whole process of locking/unlocking in {{local_queryoption}} looks fishy and is most likely the culprit of this deadlock.
