[asterisk-bugs] [JIRA] (ASTERISK-21207) [patch] - Deadlock on fax extension calling ast_async_goto() with locked channel
Matt Jordan (JIRA)
noreply at issues.asterisk.org
Sun Jul 6 10:19:56 CDT 2014
[ https://issues.asterisk.org/jira/browse/ASTERISK-21207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=220284#comment-220284 ]
Matt Jordan edited comment on ASTERISK-21207 at 7/6/14 10:18 AM:
-----------------------------------------------------------------
A few comments here.
# Please do not post patches in comments. Patches must be attached to the issue and marked as code, and only after you have signed a contributor license agreement. Unfortunately, I have had to go back and remove the patches that were posted in comments here.
# Generally, the solution to locking issues is not to make the locking more complex. Typically, you can avoid having to recursively lock, or repeatedly attempt to gain a lock, by fixing the pattern of usage. A good example of this is the locking that is done in {{chan_local}} (now {{core_unreal}}), where the reference count of an object is bumped so that the object is not destroyed while we attempt to lock it in the correct order. That pattern does not involve recursively locking the object, nor does it involve trylocks.
In the case of {{pbx.c}}, the correct solution would be to make the context objects in the core reference counted. This would remove the need both to recursively lock the objects and to try-lock them; the lock could be obtained a single time and the reference to the context object bumped. By increasing the reference count on the context object, we would keep it from being destroyed if the dialplan is reloaded (which is really the point of the locks on the context objects in the first place).
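As a rough illustration of the pattern described above, here is a minimal sketch, not Asterisk code. Every name in it ({{struct context}}, {{ctx_ref}}, {{ctx_unref}}, {{find_context}}, {{reload_dialplan}}, {{use_context_across_reload}}) is hypothetical, and the "container" is reduced to a single slot. It assumes POSIX threads and C11 atomics. The container lock is taken once, only long enough to bump the reference count; after that the context is locked normally, with no recursion and no trylock, and a reload in between cannot destroy it.

```c
/* Sketch only: all names here are hypothetical, not Asterisk APIs. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct context {
    atomic_int refcount;    /* lifetime is governed by this, not by a held lock */
    pthread_mutex_t lock;   /* protects the fields below */
    char name[32];
    int generation;         /* which dialplan load created this context */
};

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static struct context *registry;    /* single-slot "container"; owns one reference */
static atomic_int destroyed_count;  /* instrumentation for the sketch */

static struct context *ctx_alloc(const char *name, int generation)
{
    struct context *c = calloc(1, sizeof(*c));
    if (!c) {
        return NULL;
    }
    atomic_init(&c->refcount, 1);   /* the caller owns the initial reference */
    pthread_mutex_init(&c->lock, NULL);
    strncpy(c->name, name, sizeof(c->name) - 1);
    c->generation = generation;
    return c;
}

static struct context *ctx_ref(struct context *c)
{
    atomic_fetch_add(&c->refcount, 1);
    return c;
}

static void ctx_unref(struct context *c)
{
    /* atomic_fetch_sub returns the previous value: 1 means we held the last ref */
    if (atomic_fetch_sub(&c->refcount, 1) == 1) {
        pthread_mutex_destroy(&c->lock);
        free(c);
        atomic_fetch_add(&destroyed_count, 1);
    }
}

/* Take the container lock once, bump the reference, and release the lock. */
static struct context *find_context(void)
{
    struct context *c;
    pthread_mutex_lock(&registry_lock);
    c = registry ? ctx_ref(registry) : NULL;
    pthread_mutex_unlock(&registry_lock);
    return c;
}

/* Swap in a new context; the old one survives until its last user unrefs it. */
static void reload_dialplan(struct context *replacement)
{
    struct context *old;
    pthread_mutex_lock(&registry_lock);
    old = registry;
    registry = replacement;         /* registry takes over the caller's reference */
    pthread_mutex_unlock(&registry_lock);
    if (old) {
        ctx_unref(old);
    }
}

/* Returns the generation seen by a user holding a reference across a reload. */
static int use_context_across_reload(void)
{
    struct context *c;
    int seen;

    reload_dialplan(ctx_alloc("default", 1));
    c = find_context();
    if (!c) {
        return -1;
    }
    reload_dialplan(ctx_alloc("default", 2));  /* simulated concurrent reload */

    pthread_mutex_lock(&c->lock);   /* plain lock: no recursion, no trylock */
    seen = c->generation;
    pthread_mutex_unlock(&c->lock);

    ctx_unref(c);                   /* last reference to generation 1 dropped here */
    return seen;
}
```

In this sketch the old context is freed only when {{use_context_across_reload}} drops its reference, so the reload never has to wait on, or race with, a thread that is still using the old dialplan data.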
was (Author: mjordan):
A few comments here.
# Please do not post patches in comments. Patches must be attached to an issue marking them as code after signing a license contributor agreement. I've unfortunately had to go back and remove the patches in comments here.
# Generally, the solution to locking issues is not to make them more complex. Typically, you can avoid having to recursively lock, or repeatedly attempt to gain a lock, by fixing the pattern of usage. A good example of this is the locking that is done in {{chan_local}} (now {{core_lock}}) where the lifetime of an object is bumped such that the object is not destroyed while we attempt to lock it in the correct order.
In the case of {{pbx.c}}, the correct solution would be to make the contexts objects in the core reference counted. This would prevent having to recursively lock them; the lock could be obtained a single time and the reference to the context object bumped. By increasing the reference count on the context object, we would keep it from being destroyed if the dialplan is reloaded.
> [patch] - Deadlock on fax extension calling ast_async_goto() with locked channel
> --------------------------------------------------------------------------------
>
> Key: ASTERISK-21207
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-21207
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Channels/chan_dahdi, Channels/chan_sip/General
> Affects Versions: 10.7.1, 11.2.1
> Environment: CentOS 6.3 x86_64
> Reporter: Ashley Winters
> Severity: Critical
> Attachments: backtrace-threads.txt, core-show-locks.txt, fax-deadlock.patch, fax-deadlock-v2.patch, fax-deadlock-v2.patch-11.3.0, gdb-fax-deadlock.txt, issue_log
>
>
> On an Asterisk system with heavy use of AGI and inbound CNG-detected faxing, occasionally all channel activity will freeze. Running 'core show channels' returns nothing, but the logs continue to record everything except channel activity. Running with 'sip set debug on' shows that chan_sip.c doesn't even claim to be reading packets anymore.
> This deadlock was triggered several times a day across our array of Asterisk servers, which process hundreds of faxes and tens of thousands of calls daily.
--
This message was sent by Atlassian JIRA
(v6.2#6252)