[asterisk-bugs] [JIRA] (ASTERISK-25000) Deadlock in ast_do_masquerade (specifically in ast_hangup on the zombie clone if it's hungup during the masquerade)

William luke (JIRA) noreply at issues.asterisk.org
Thu Apr 23 05:54:32 CDT 2015


     [ https://issues.asterisk.org/jira/browse/ASTERISK-25000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William luke updated ASTERISK-25000:
------------------------------------

    Description: 
We're seeing a deadlock where the AMI thread completely locks up. (Thread ID 19109 in backtrace attachment)

A backtrace shows that it's while doing a dual redirect.
When redirecting the second channel (from manager.c:3895), inside ast_do_masquerade, we decide the clone was a zombie, and then in channel.c line 7331 call ast_hangup on it.
This ast_hangup tries to grab a channel lock (channel.c:2885) and hangs here indefinitely.
What's peculiar is that a few lines higher up it's successfully managed to grab and then release this same channel lock.
It would seem that just as the masquerade began, this channel (clonechan) was hung up. (See line 212288 in the verboselog attachment. The channel in question is "SIP/gl-agw-01-000f7f0e".)
So something has happened to the state of this channel, and something has not released its channel lock.
I'm unable to see which other thread is holding the lock.
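
To illustrate the pattern described above (a lock that is taken and released successfully, then blocks on the next attempt because another thread grabbed it in between), here is a minimal sketch in plain POSIX threads. It is not Asterisk code: tracked_lock, TRACKED_LOCK, holder and demo_contended_relock are hypothetical names invented for this example, and it assumes Linux/glibc, where pthread_mutex_timedlock is available. The wrapper records which thread took the lock and from where, so a waiter that times out can report the holder rather than hang. In a real deployment, an Asterisk build with DEBUG_THREADS enabled and the "core show locks" CLI command would be the more likely way to get this information.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical wrapper (not Asterisk code): a mutex that records its holder. */
struct tracked_lock {
    pthread_mutex_t mutex;  /* the lock being diagnosed */
    pthread_mutex_t meta;   /* guards the bookkeeping fields below */
    pthread_t owner;        /* meaningful only while held != 0 */
    int held;
    const char *file;       /* call site that took the lock */
    int line;
};

static void tracked_init(struct tracked_lock *l)
{
    pthread_mutex_init(&l->mutex, NULL);
    pthread_mutex_init(&l->meta, NULL);
    l->held = 0;
    l->file = NULL;
    l->line = 0;
}

/* Wait at most timeout_sec; on timeout, report the holder instead of hanging. */
static int tracked_lock_timed(struct tracked_lock *l, int timeout_sec,
                              const char *file, int line)
{
    struct timespec deadline;
    int rc;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;
    rc = pthread_mutex_timedlock(&l->mutex, &deadline);

    pthread_mutex_lock(&l->meta);
    if (rc == 0) {
        l->owner = pthread_self();
        l->held = 1;
        l->file = file;
        l->line = line;
    } else if (rc == ETIMEDOUT && l->held) {
        /* Casting pthread_t to unsigned long assumes Linux/glibc. */
        fprintf(stderr, "%s:%d: timed out; held by thread %lu since %s:%d\n",
                file, line, (unsigned long)l->owner, l->file, l->line);
    }
    pthread_mutex_unlock(&l->meta);
    return rc;
}

static void tracked_unlock(struct tracked_lock *l)
{
    pthread_mutex_lock(&l->meta);
    l->held = 0;
    pthread_mutex_unlock(&l->meta);
    pthread_mutex_unlock(&l->mutex);
}

#define TRACKED_LOCK(l, secs) tracked_lock_timed((l), (secs), __FILE__, __LINE__)

struct holder_args {
    struct tracked_lock *lock;
    pthread_mutex_t gate;
    pthread_cond_t cond;
    int locked;   /* set once the holder owns the lock */
    int release;  /* set when the holder may let go */
};

/* Stands in for the unknown thread that takes the lock and keeps it. */
static void *holder(void *arg)
{
    struct holder_args *a = arg;

    TRACKED_LOCK(a->lock, 1);
    pthread_mutex_lock(&a->gate);
    a->locked = 1;
    pthread_cond_broadcast(&a->cond);
    while (!a->release)
        pthread_cond_wait(&a->cond, &a->gate);
    pthread_mutex_unlock(&a->gate);
    tracked_unlock(a->lock);
    return NULL;
}

/* Returns the result of the second lock attempt. */
static int demo_contended_relock(void)
{
    struct tracked_lock chan;
    struct holder_args a = { &chan, PTHREAD_MUTEX_INITIALIZER,
                             PTHREAD_COND_INITIALIZER, 0, 0 };
    pthread_t tid;
    int first, second;

    tracked_init(&chan);
    first = TRACKED_LOCK(&chan, 1);   /* the earlier lock succeeds... */
    assert(first == 0);
    tracked_unlock(&chan);            /* ...and is released */

    if (pthread_create(&tid, NULL, holder, &a) != 0)
        return -1;
    pthread_mutex_lock(&a.gate);      /* wait until the other thread owns it */
    while (!a.locked)
        pthread_cond_wait(&a.cond, &a.gate);
    pthread_mutex_unlock(&a.gate);

    second = TRACKED_LOCK(&chan, 1);  /* a plain lock would block here */

    pthread_mutex_lock(&a.gate);      /* let the holder finish */
    a.release = 1;
    pthread_cond_broadcast(&a.cond);
    pthread_mutex_unlock(&a.gate);
    pthread_join(tid, NULL);
    return second;
}
```

Calling demo_contended_relock() from a main() (built with -pthread) prints the holder's thread ID and call site to stderr after the one-second timeout. With an untimed pthread_mutex_lock the second attempt would simply block for as long as the holder keeps the lock, which matches the symptom reported here.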

The issue occurred at 15:32:20 in the verboselog file. The first part of the dual redirect can be seen at line 212279.

I executed a "core restart" via the CLI, but this hung the CLI, and I had to kill the Asterisk process.


> Deadlock in ast_do_masquerade (specifically in ast_hangup on the zombie clone if it's hungup during the masquerade)
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: ASTERISK-25000
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-25000
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Core/Channels
>    Affects Versions: 11.16.0
>         Environment: CentOS 6. Dual Xeon Dell server. Under relatively heavy load (1 million calls/day), with lots of AMI actions.
>            Reporter: William luke
>            Severity: Critical
>         Attachments: backtrace-threads-20150422.txt
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)


