[asterisk-bugs] [Asterisk 0010199]: Random replacement of channel name with other text in queue log entries
noreply at bugs.digium.com
Mon Aug 13 09:48:42 CDT 2007
A NOTE has been added to this issue.
======================================================================
http://bugs.digium.com/view.php?id=10199
======================================================================
Reported By: jfitzgibbon
Assigned To: putnopvut
======================================================================
Project: Asterisk
Issue ID: 10199
Category: Applications/app_queue
Reproducibility: random
Severity: major
Priority: normal
Status: feedback
Asterisk Version: 1.4.7.1
SVN Branch (only for SVN checkouts, not tarball releases): N/A
SVN Revision (number only!):
Disclaimer on File?: No
Request Review:
======================================================================
Date Submitted: 07-13-2007 08:44 CDT
Last Modified: 08-13-2007 09:48 CDT
======================================================================
Summary: Random replacement of channel name with other text
in queue log entries
Description:
On 1.4.5, 1.4.6 and 1.4.7.1, I have observed random replacement of the
channel name with other text in my queue log. Typically the other text is
part of a manager event, suggesting that a pointer is getting corrupted
somewhere. It happens on a very small percentage of calls, but there are
consistent elements to the corruption from my observations:
- it only happens on one of my eleven queues. Of the eleven, four others
are configured almost identically to the queue on which the corruption is
observed (the only differences are queue name and wrapuptime length). The
problem queue is listed second in queues.conf, and does not sort lexically
to the top or bottom of a list of queue names.
- the corruption first appears on the CONNECT event
- the channel name remains corrupted for the duration of the call (i.e.
you never see a good ENTERQUEUE, bad CONNECT, then a good COMPLETECALLER)
- the replacement text is not consistent. It can differ between the
CONNECT and TRANSFER events (see example 1)
- if the call is transferred to an extension which enqueues the caller to
another queue, the corruption is cleared, and does not typically re-appear
(see example 1)
I cannot reproduce the bug on demand in my lab environment. On a call
center fielding about 5000 calls per day, I see this on 5-10 of those
calls. The main impact of this is that queue analysis programs never see
the CONNECT or COMPLETEXXXXXX events, so they think that any corrupted call
is waiting forever. Having such a large wait time for several calls on a
queue knocks statistics out of whack.
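
As a stopgap for the statistics problem described above, affected entries
could be flagged before the log is handed to an analysis program. The
following is only a rough sketch, not part of Asterisk. It assumes each
queue log line has the pipe-separated layout
epoch|uniqueid|queue|channel|event|data..., and that a healthy channel
field is either NONE or looks like TECH/resource. The function name, the
regular expression and the sample path in the comment are illustrative
assumptions.

```python
import re

# Assumed field layout: epoch|uniqueid|queue|channel|event|data...
# Assumed heuristic: a sane channel field is "NONE" or "TECH/resource".
CHANNEL_RE = re.compile(r"^(NONE|[A-Za-z0-9_]+/[^\s|]+)$")


def suspicious_entries(lines):
    """Yield (line_number, line) for entries whose channel field looks corrupted."""
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("|")
        # Too few fields, or a channel field that matches neither form.
        if len(fields) < 5 or not CHANNEL_RE.match(fields[3]):
            yield number, line


# Example use (the path is the usual default and may differ on your system):
#
#     with open("/var/log/asterisk/queue_log") as log:
#         for number, line in suspicious_entries(log):
#             print(number, line)
```

A check like this only detects the damage. It cannot recover the original
channel name, so flagged calls would still need to be excluded or fixed
by hand before computing wait times.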
======================================================================
----------------------------------------------------------------------
putnopvut - 08-13-07 09:48
----------------------------------------------------------------------
Sorry for the lack of activity on this issue. It is easy enough to see
where the corrupt memory is being accessed, but hard to see exactly why it
is being corrupted. My previous fix (adding locking for queue member
operations) still seems to be the right way to go, since it would ensure
that a member's information could not change unless the lock were held. The
only problem is that, as you discovered, the patch itself has issues, and
aside from one easily fixed problem, it is not obvious what is wrong with
the patch either. This will be addressed in due time.
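
The locking idea can be illustrated in miniature. This is a sketch of the
general principle only, written in Python for brevity. It is not the
Asterisk patch, and the Member class and its fields are hypothetical
stand-ins for the real queue member structure. Every writer and every
reader takes the same lock, so a reader building a log entry can never
see a member record that is halfway through being changed.

```python
import threading


class Member:
    """Hypothetical stand-in for a queue member record (not Asterisk's struct)."""

    def __init__(self, interface, penalty=0):
        self._lock = threading.Lock()
        self._interface = interface
        self._penalty = penalty

    def update(self, interface, penalty):
        # Writers change both fields only while holding the lock.
        with self._lock:
            self._interface = interface
            self._penalty = penalty

    def snapshot(self):
        # Readers copy the fields under the same lock, so they never
        # observe a half-applied update.
        with self._lock:
            return self._interface, self._penalty
```

The log writer would then format its entry from the snapshot() result
rather than from the live record.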
Issue History
Date Modified Username Field Change
======================================================================
08-13-07 09:48 putnopvut Note Added: 0068769
======================================================================