[asterisk-bugs] [JIRA] (ASTERISK-21207) [patch] - Deadlock on fax extension calling ast_async_goto() with locked channel

Masahide Yamamoto (JIRA) noreply at issues.asterisk.org
Sun Jul 6 19:29:57 CDT 2014


    [ https://issues.asterisk.org/jira/browse/ASTERISK-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=220288#comment-220288 ] 

Masahide Yamamoto edited comment on ASTERISK-21207 at 7/6/14 7:29 PM:
----------------------------------------------------------------------

Thank you for your comment.

I'm sorry; this is the first time I have posted a comment in Asterisk's issue tracking system. I'll be more careful.
I did not intend to ask you to review the patch, nor to appeal to the maintainers to adopt it.
Yes, I agree with you that "the correct solution would be to make the contexts objects in the core reference counted".
I just wanted you to know that this issue is still present even in the latest Asterisk 1.8, and to give users a reasonably easy way to avoid such deadlock bugs.

How about the idea I mentioned of counting the number of recursive mutex locks, so that people can check before the whole of Asterisk becomes locked? I have not checked whether it works in environments other than Linux/glibc. I wonder why such a function has not been added to Asterisk so far.
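
To illustrate the lock-counting idea in a portable form, here is a minimal sketch. It is not the patch attached to this issue and it is not Asterisk code: the `counted_mutex_*` names are hypothetical, and it keeps its own depth counter next to a POSIX recursive mutex instead of reading glibc internals.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* for PTHREAD_MUTEX_RECURSIVE under strict modes */
#endif
#include <pthread.h>

/* Hypothetical wrapper: a recursive mutex plus an explicit depth counter.
 * The counter is only read or written while the mutex is held. */
struct counted_mutex {
    pthread_mutex_t mutex;
    int depth;              /* how many times the owner has locked it */
};

static int counted_mutex_init(struct counted_mutex *cm)
{
    pthread_mutexattr_t attr;
    int res = pthread_mutexattr_init(&attr);

    if (res) {
        return res;
    }
    cm->depth = 0;
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    res = pthread_mutex_init(&cm->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return res;
}

static int counted_mutex_lock(struct counted_mutex *cm)
{
    int res = pthread_mutex_lock(&cm->mutex);

    if (!res) {
        cm->depth++;
    }
    return res;
}

/* The caller must currently own the lock. */
static int counted_mutex_unlock(struct counted_mutex *cm)
{
    cm->depth--;
    return pthread_mutex_unlock(&cm->mutex);
}

/* Returns how many times the calling thread holds the lock, or 0 if the
 * lock is free or owned by another thread. On a recursive mutex, trylock
 * succeeds for the owner, so the counter can be read safely. */
static int counted_mutex_depth(struct counted_mutex *cm)
{
    int depth = 0;

    if (pthread_mutex_trylock(&cm->mutex) == 0) {
        depth = cm->depth;
        pthread_mutex_unlock(&cm->mutex);
    }
    return depth;
}
```

With something like this, a function that must be entered without a given lock held could check that `counted_mutex_depth()` returns 0 and log a warning before it blocks.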

In any case, I think we need to double-check all the call paths into functions such as ast_channel_alloc and ast_do_masquerade, in which threads must not hold any locks, to prevent unwanted deadlocks from happening. Or at least we need some mechanism that can avoid whole-system deadlocks; i.e., we need to limit the extent of a deadlock's impact.
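
One way to limit the impact, shown here only as a sketch under my own assumptions, is to acquire a contended lock with a bounded number of attempts and report failure rather than block forever. The helper name `lock_with_retries` and its parameters are hypothetical; it uses only standard POSIX calls.

```c
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: try to take a mutex at most max_tries times,
 * sleeping sleep_ns nanoseconds (must be below one second) between
 * attempts. Returns 0 on success, EBUSY if the lock stayed busy, or
 * another error code from pthread_mutex_trylock(). */
static int lock_with_retries(pthread_mutex_t *m, int max_tries, long sleep_ns)
{
    struct timespec pause = { 0, sleep_ns };
    int i;

    for (i = 0; i < max_tries; i++) {
        int res = pthread_mutex_trylock(m);

        if (res != EBUSY) {
            return res;     /* acquired (0) or a real error */
        }
        nanosleep(&pause, NULL);
    }
    return EBUSY;           /* give up; the caller decides how to recover */
}
```

A caller that gets EBUSY back can release the locks it already holds and retry later, so one stuck thread does not freeze every other thread that needs the same lock.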

I thought the patch I provided would be helpful for such a task.

Do I need to submit the aforementioned patch as a file?



> [patch] - Deadlock on fax extension calling ast_async_goto() with locked channel
> --------------------------------------------------------------------------------
>
>                 Key: ASTERISK-21207
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-21207
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_dahdi, Channels/chan_sip/General
>    Affects Versions: 10.7.1, 11.2.1
>         Environment: CentOS 6.3 x86_64
>            Reporter: Ashley Winters
>            Severity: Critical
>         Attachments: backtrace-threads.txt, core-show-locks.txt, fax-deadlock.patch, fax-deadlock-v2.patch, fax-deadlock-v2.patch-11.3.0, gdb-fax-deadlock.txt, issue_log
>
>
> On an Asterisk system with heavy use of AGI and inbound CNG-detected faxing, all channel activity occasionally freezes. Running 'core show channels' returns nothing, but the logs keep recording everything except channel activity. Running with 'sip set debug on' shows that chan_sip.c doesn't even claim to be reading packets anymore.
> This deadlock was triggered several times daily across our array of Asterisk servers, which process hundreds of faxes and tens of thousands of calls daily.



