[asterisk-bugs] [JIRA] (ASTERISK-24456) SIP deadlock in transfer scenario between Asterisk Servers

Matt Jordan (JIRA) noreply at issues.asterisk.org
Wed Dec 24 09:05:34 CST 2014


    [ https://issues.asterisk.org/jira/browse/ASTERISK-24456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=224216#comment-224216 ] 

Matt Jordan commented on ASTERISK-24456:
----------------------------------------

A few comments, and then a suggestion:

# Your first backtrace ({{backtrace.txt}}) did have {{DEBUG_THREADS}} enabled. Looking at the core dump:
{noformat}
#0  0xb715bcb7 in pthread_mutex_lock () from /lib/i386-linux-gnu/libpthread.so.0
#0  0xb715bcb7 in pthread_mutex_lock () from /lib/i386-linux-gnu/libpthread.so.0
No symbol table info available.
#1  0xb766d504 in pthread_mutex_lock () from /lib/i386-linux-gnu/libc.so.6
No symbol table info available.
#2  0x0813c6c7 in ast_reentrancy_lock (lt=0x6b736972) at /usr/local/src/asterisk-11.14/asterisk-11.14.0-rc1/include/asterisk/lock.h:420
        res = 135518668
{noformat}
{{ast_reentrancy_lock}} is only called when {{DEBUG_THREADS}} is enabled:
{code}
int __ast_pthread_mutex_lock(const char *filename, int lineno, const char *func,
				const char* mutex_name, ast_mutex_t *t)
{
	int res;

#ifdef DEBUG_THREADS
	struct ast_lock_track *lt = NULL;
	int canlog = t->tracking && strcmp(filename, "logger.c");
#ifdef HAVE_BKTR
	struct ast_bt *bt = NULL;
#endif

	if (t->tracking) {
		lt = ast_get_reentrancy(&t->track);
	}

	if (lt) {
#ifdef HAVE_BKTR
		struct ast_bt tmp;

		/* The implementation of backtrace() may have its own locks.
		 * Capture the backtrace outside of the reentrancy lock to
		 * avoid deadlocks. See ASTERISK-22455. */
		ast_bt_get_addresses(&tmp);

		ast_reentrancy_lock(lt);
		if (lt->reentrancy < AST_MAX_REENTRANCY) {
			lt->backtrace[lt->reentrancy] = tmp;
			bt = &lt->backtrace[lt->reentrancy];
		}
		ast_reentrancy_unlock(lt);

		ast_store_lock_info(AST_MUTEX, filename, lineno, func, mutex_name, t, bt);
#else
		ast_store_lock_info(AST_MUTEX, filename, lineno, func, mutex_name, t);
#endif
	}
#endif /* DEBUG_THREADS */
{code}
There are severe problems with {{DEBUG_THREADS}} in that version of Asterisk 11 - so as far as I can tell, this issue only arises when {{DEBUG_THREADS}} is enabled. That's not surprising, since {{DEBUG_THREADS}} causes what will appear to be a deadlock when it is enabled.
# Your second backtrace - {{backtrace-threads.txt}} - has symbols stripped out of it, which makes it hard to tell what is going on. Because of that, it is hard to tell why the locks are still being held. You will need to make sure that all of the symbols are in all of the modules that Asterisk is using.
# The log file simply shows that a call was being processed. The lack of additional {{chan_sip}} messages after some period of time is concerning, but without further explanation as to what you feel should have been happening and when, I'd be guessing at what the issue is. Again, however, if you are running with {{DEBUG_THREADS}}, you're sitting on a ticking time bomb. It wouldn't shock me if {{chan_sip}} suddenly got locked up with that enabled.
# If this is _not_ actually being caused by {{DEBUG_THREADS}}, _then_, if we cannot reproduce your issue, or you can't provide all of the configuration information to reproduce the issue, the issue is going to be closed. That means narrowing this down to a simple dialplan that reproduces the issue, and/or providing all of the information so that a bug marshal can reproduce the issue. It's pointless to have an issue that a developer can't fix.
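
To illustrate the first point above (why a frame like {{ast_reentrancy_lock}} in a backtrace implies a {{DEBUG_THREADS}} build), here is a minimal sketch of the same compile-time pattern. It is not Asterisk code: every {{demo_*}} name is hypothetical, and the bookkeeping is reduced to a counter. The idea it shows is only that the extra tracking lock is compiled in, and therefore can only appear in a backtrace or block a thread, when {{DEBUG_THREADS}} is defined.

```c
#include <pthread.h>

/* Hypothetical stand-in for per-lock tracking data (not ast_lock_track). */
struct demo_lock_track {
	pthread_mutex_t reentr_mutex; /* guards the bookkeeping below */
	int tracked_locks;            /* number of lock calls recorded */
};

struct demo_mutex {
	pthread_mutex_t mutex;
#ifdef DEBUG_THREADS
	struct demo_lock_track track; /* only exists in debug builds */
#endif
};

/* Returns 1 if compiled with -DDEBUG_THREADS, 0 otherwise. */
int demo_tracking_enabled(void)
{
#ifdef DEBUG_THREADS
	return 1;
#else
	return 0;
#endif
}

int demo_mutex_init(struct demo_mutex *t)
{
#ifdef DEBUG_THREADS
	int res = pthread_mutex_init(&t->track.reentr_mutex, NULL);
	if (res) {
		return res;
	}
	t->track.tracked_locks = 0;
#endif
	return pthread_mutex_init(&t->mutex, NULL);
}

int demo_mutex_lock(struct demo_mutex *t)
{
#ifdef DEBUG_THREADS
	/* Extra lock taken purely for bookkeeping. In a non-debug build
	 * this code does not exist, so it can never show up in a backtrace. */
	pthread_mutex_lock(&t->track.reentr_mutex);
	t->track.tracked_locks++;
	pthread_mutex_unlock(&t->track.reentr_mutex);
#endif
	return pthread_mutex_lock(&t->mutex);
}

int demo_mutex_unlock(struct demo_mutex *t)
{
	return pthread_mutex_unlock(&t->mutex);
}

/* Number of tracked lock calls; always 0 without DEBUG_THREADS. */
int demo_tracked_locks(struct demo_mutex *t)
{
#ifdef DEBUG_THREADS
	int n;

	pthread_mutex_lock(&t->track.reentr_mutex);
	n = t->track.tracked_locks;
	pthread_mutex_unlock(&t->track.reentr_mutex);
	return n;
#else
	(void) t;
	return 0;
#endif
}
```

Calling {{demo_mutex_init}}, {{demo_mutex_lock}} and {{demo_mutex_unlock}} in sequence from a small {{main}} behaves identically in both builds as far as the caller is concerned. Only the build compiled with {{-DDEBUG_THREADS}} performs the additional lock and unlock on the tracking mutex for every acquisition, which is also part of why such builds are slower.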

So, some suggestions:
# Disable {{DEBUG_THREADS}}:
## It causes problems in your version of Asterisk. Don't run with it.
## Even if it didn't cause spurious problems that mimic a deadlock, it is a performance killer. Running a production server with {{DEBUG_THREADS}} enabled will severely cripple Asterisk's ability to handle calls.
# If you want to run with {{DEBUG_THREADS}}, then, since we have now fixed the issue that {{DEBUG_THREADS}} is causing, please test out the latest from the 11 branch:
{noformat}
svn co http://svn.asterisk.org/svn/asterisk/branches/11
{noformat}
If you continue to see deadlock-like symptoms with the latest from the 11 branch, then please do the following:
## Make sure you have {{DONT_OPTIMIZE}} and {{BETTER_BACKTRACES}} enabled in menuselect
## Get the output of {{core show locks}} and attach it to this issue
## Get a {{gdb}} backtrace similar to your {{backtrace-threads}} and attach it to this issue. Make sure all the symbols are there.

Thanks!

> SIP deadlock in transfer scenario between Asterisk Servers
> ----------------------------------------------------------
>
>                 Key: ASTERISK-24456
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-24456
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_local, Channels/chan_sip/General
>    Affects Versions: 11.13.0
>         Environment: Ubuntu precise
>            Reporter: Peter Katzmann
>            Assignee: Matt Jordan
>            Severity: Critical
>         Attachments: asterisk-sip-loc, backtrace-threads.txt, backtrace.txt, deadlock.txt, serverchef.txt, serversek.txt, threads.txt
>
>
> We have 3 Asterisk server (a,b and c)
> we have also 3 different user (Caller, Chef, Sek) .
> User Caller on Server a
> User Chef on Server b
> User Sek on Server c
> Now Caller Dials to Chef
> Chef has a call rule via agi to only accept direct calls from Sek, all other calls are transferred to Sek.
> So the call from Caller on server a is transferred to Sek on server c automatically
> Sek accepts the calls
> Now Sek will do a unattended transfer of caller to Chef
> Now the server b with Chef shows no action on the dialplan anymore and blocks
> Sometimes also crashes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


