[asterisk-bugs] [JIRA] (ASTERISK-21207) [patch] - Deadlock on fax extension calling ast_async_goto() with locked channel

Masahide Yamamoto (JIRA) noreply at issues.asterisk.org
Mon Jul 7 21:37:57 CDT 2014


    [ https://issues.asterisk.org/jira/browse/ASTERISK-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=220282#comment-220282 ] 

Masahide Yamamoto edited comment on ASTERISK-21207 at 7/7/14 9:37 PM:
----------------------------------------------------------------------

We have also been hitting this deadlock on the 1.8 branch.
Unfortunately, it does not appear to be fixed in the latest 1.8 branch either.

bq. The ast_channel_unlock in process_sdp was just cut&pasted into place, and is pointless. It only unlocks the recursive lock from a couple lines up, not the big lock held by the caller of process_sdp – ultimately handle_request_do.

Based on Ashley's comment above, it seems the locking in process_sdp needs to be disabled so that the subsequent calls to ast_exists_extension and ast_async_goto can run without deadlocking, like this:

--------------------------------------------------------------------
Modified on 06/Jul/14:

\[EDIT\] - removed inline code - mjordan


-FYI: As in the code snippet above, ast_channel_trylock / ast_mutex_trylock can be used for a non-blocking lock attempt and check.-

--------------------------------------------------------------------
Added on 06/Jul/14:

I read the following code concerning PTHREAD_MUTEX_RECURSIVE_NP, the mutex type attribute that Asterisk uses.

In Linux and glibc,

\[EDIT\] - removed inline code

I realized that I had badly misunderstood the semantics of Asterisk's trylock function, which uses PTHREAD_MUTEX_RECURSIVE_NP, so I corrected the code I posted earlier accordingly.
* I had thought ast_channel_trylock always returns EBUSY when the given channel is locked by any thread, including the calling thread itself. With a recursive mutex, a trylock by the thread that already owns the lock succeeds and only increments the lock count.
* Note that my approach is not portable because it relies on an architecture-specific mechanism (I think it works at least on Linux/glibc).

I also added the following code to __ao2_lock, __ao2_unlock and __ao2_trylock to check that everything works as expected:

\[EDIT\] - removed inline code

I'm still testing these changes; so far so good.



> [patch] - Deadlock on fax extension calling ast_async_goto() with locked channel
> --------------------------------------------------------------------------------
>
>                 Key: ASTERISK-21207
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-21207
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_dahdi, Channels/chan_sip/General
>    Affects Versions: 10.7.1, 11.2.1
>         Environment: CentOS 6.3 x86_64
>            Reporter: Ashley Winters
>            Severity: Critical
>         Attachments: backtrace-threads.txt, core-show-locks.txt, fax-deadlock.patch, fax-deadlock-v2.patch, fax-deadlock-v2.patch-11.3.0, gdb-fax-deadlock.txt, issue_log
>
>
> On an asterisk system with heavy use of AGI and inbound CNG-detected faxing, occasionally all channel activity will freeze. Running 'core show channels' returns nothing, but the logs continue running with anything except channel activity. Running with 'sip set debug on' shows that chan_sip.c doesn't even claim to be reading packets anymore.
> This deadlock was triggered several times daily across our array of asterisk servers, which process hundreds of faxes and tens of thousands of calls daily.


