[asterisk-bugs] [Asterisk 0009788]: Deadlock problem with agents, queues and libpri (stop accepting incoming calls in PRI lines)

noreply at bugs.digium.com noreply at bugs.digium.com
Mon Sep 17 10:16:14 CDT 2007


A NOTE has been added to this issue. 
====================================================================== 
http://bugs.digium.com/view.php?id=9788 
====================================================================== 
Reported By:                Ted Brown
Assigned To:                
====================================================================== 
Project:                    Asterisk
Issue ID:                   9788
Category:                   Addons/General
Reproducibility:            sometimes
Severity:                   crash
Priority:                   normal
Status:                     new
Asterisk Version:           1.4.10.1  
SVN Branch (only for SVN checkouts, not tarball releases): N/A  
SVN Revision (number only!):  
Disclaimer on File?:        No 
Request Review:              
====================================================================== 
Date Submitted:             05-23-2007 18:18 CDT
Last Modified:              09-17-2007 10:16 CDT
====================================================================== 
Summary:                    Deadlock problem with agents, queues and libpri
(stop accepting incoming calls in PRI lines)
Description: 
I have an Asterisk-based call center deployment with around 40 SIP users,
answering incoming calls from two PRI lines (2xE1) using agents and
queues.

The problem is that Asterisk stops accepting new incoming calls on the PRI
lines for no apparent reason. There should be free channels available for
new incoming calls, but Asterisk thinks these channels are in use.
SIP calls can still be placed between internal users without problems.

The PRI lines shouldn't be the origin of the problem, as an old legacy PBX
works perfectly with the same lines, so the problem seems to be related
to agents or queues.

After the crash, running "zap show channels" shows that all channels
are busy, and calls appear to have been queued for a long time in
different queues (they are not really there: users usually don't wait
90 minutes to be answered while listening to the music on hold).

There are no other services running on the server, CDR is being stored to
disk and we are not using any kind of AGI scripts or reporting tools.
Currently the only solution is to reboot the machine, as restarting
Asterisk is not enough. Running any command on the CLI produces no output
at all.

The crash is not easily reproducible, as it doesn't follow a clear
pattern. Asterisk just seems to get blocked when it is handling around
30-40 calls in the queues. During the last week, we had 2-3 crashes each
day.

Based on posts to the users mailing list, it seems that other users have
had a similar problem in the same scenario, at least with 1.2.x. More
precisely, we have observed the same problem in bug ID 0006147, but it was
closed without a clear answer.

Hardware and software specs:

 Platform: Suse Linux Enterprise Server 10
 Machine: IBM xSeries 226, 1 GB RAM, Intel CPU
 PRI card: Digium TE212 with echo cancellation module
 Asterisk version: 1.2.18

The most relevant messages before and after the crash follow:

DEBUG[28519] chan_sip.c: Stopping retransmission on
'NzNmZWM0ZDc0OTYyNWI5YWM2ZTBhZjY3NDM4N2RjNmQ.' of Response 12: Match Found 
(lots of messages like that)

DEBUG[28511] chan_zap.c: Ring requested on channel 0/13 already in use or
previously requested on span 1.  Attempting to renegotiating channel.

DEBUG[28511] chan_zap.c: Found empty available channel 0/9

DEBUG[29939] app_dial.c: Exiting with DIALSTATUS=CONGESTION.

I would very much appreciate any help on this. I can provide a backtrace
if needed.

Best regards,
====================================================================== 

---------------------------------------------------------------------- 
 Ted Brown - 09-17-07 10:16  
---------------------------------------------------------------------- 
Russel,

thanks for pointing this out. Just in case it helps, I've added several
files with more complete backtraces, containing the output of the
following commands in GDB:

  - bt full
  - info stack
  - thread apply all
  - several values of relevant variables (bridged, bridged->tech, etc...)

The steps to force the segfault are as follows:

- New call with caller-id "302" to queue "400"
- Call passed to agent 402
- Agent 402 takes a new line on his softphone and starts a new call to the
number of another queue (queue 100 in our example)
- Agent 103 takes the previous call
- Agent 402 transfers the call from "302" to agent 103

After that, Asterisk will crash on a "show channels" or when processing an
INVITE. The examples provided here are related to a "show channel" command
in the CLI. This time, Asterisk crashes at line 3091 of channel.c, instead
of 3900:

[line 3900]  if (bridged && bridged->tech && bridged->tech->bridged_channel)
[line 3901]     bridged = bridged->tech->bridged_channel(chan, bridged);
[line 3902]  return bridged;
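
The shape of the quoted lines can be sketched in isolation. This is only a
minimal model, not the code from channel.c: the struct names, the "bridge"
field, resolve_bridged() and proxy_bridged_channel() are hypothetical
stand-ins for illustration. It shows that the condition rejects NULL
pointers only, so a non-NULL but stale or overwritten pointer would still
pass the check and be called.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the Asterisk types; the real
 * struct ast_channel and struct ast_channel_tech have many more fields. */
struct channel;

struct channel_tech {
    struct channel *(*bridged_channel)(struct channel *chan,
                                       struct channel *bridged);
};

struct channel {
    const struct channel_tech *tech;
    struct channel *bridge;   /* peer this channel is bridged to, or NULL */
};

/* Mirrors the shape of the quoted lines: every pointer is compared against
 * NULL only, then the callback is invoked through two dereferences. */
struct channel *resolve_bridged(struct channel *chan)
{
    struct channel *bridged;

    assert(chan != NULL);
    bridged = chan->bridge;
    if (bridged && bridged->tech && bridged->tech->bridged_channel)
        bridged = bridged->tech->bridged_channel(chan, bridged);
    return bridged;
}

/* Example callback (an assumption for this sketch): a proxy channel
 * reports the channel it wraps, or itself when it wraps nothing. */
struct channel *proxy_bridged_channel(struct channel *chan,
                                      struct channel *bridged)
{
    (void)chan;
    return bridged->bridge ? bridged->bridge : bridged;
}
```

With valid pointers, resolve_bridged() returns either the direct peer or
whatever the peer's callback reports. Nothing in the condition can tell a
valid callback address from leftover data in freed memory.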

(gdb) print bridged
$1 = (struct ast_channel *) 0x2aaab53c13c4

(gdb) print bridged->tech
$2 = (const struct ast_channel_tech *) 0x532f04

(gdb) print bridged->tech->bridged_channel
$3 = (struct ast_channel *(* const)(struct ast_channel *, struct
ast_channel *)) 0x614c202e79726576

(gdb) print chan
$4 = (struct ast_channel *) 0x2aaab53becc4
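
One quick check on a suspicious pointer value such as $3 is to read its
bytes as text. The sketch below assumes a little-endian x86-64 host, and
the helper names pointer_bytes_as_text() and looks_like_text() are
hypothetical. A value made up entirely of printable ASCII is a hint, not
proof, that the memory holds string data rather than a function address.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the 8 bytes of `value` to `out`, least significant byte first
 * (the in-memory order on little-endian x86-64), replacing non-printable
 * bytes with '?'. `out` must have room for 9 chars. */
void pointer_bytes_as_text(uint64_t value, char out[9])
{
    assert(out != NULL);
    for (size_t i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)((value >> (8 * i)) & 0xff);
        out[i] = isprint(byte) ? (char)byte : '?';
    }
    out[8] = '\0';
}

/* Returns 1 when all 8 bytes of `value` are printable ASCII, else 0. */
int looks_like_text(uint64_t value)
{
    for (size_t i = 0; i < 8; i++) {
        unsigned char byte = (unsigned char)((value >> (8 * i)) & 0xff);
        if (!isprint(byte))
            return 0;
    }
    return 1;
}
```

Applied to 0x614c202e79726576, pointer_bytes_as_text() yields the string
"very. La", while the value of chan (0x2aaab53becc4) contains non-printable
bytes and does not pass looks_like_text().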

If I can provide further information or debugging output that would help,
do not hesitate to contact me.

Issue History 
Date Modified   Username       Field                    Change               
====================================================================== 
09-17-07 10:16  Ted Brown      Note Added: 0070671                          
======================================================================



