[asterisk-bugs] [Asterisk 0015314]: Seg fault in chan_local - local_pvt_destroy

Asterisk Bug Tracker noreply at bugs.digium.com
Thu Aug 13 11:30:09 CDT 2009


A NOTE has been added to this issue. 
====================================================================== 
https://issues.asterisk.org/view.php?id=15314 
====================================================================== 
Reported By:                sroberts
Assigned To:                
====================================================================== 
Project:                    Asterisk
Issue ID:                   15314
Category:                   Channels/chan_local
Reproducibility:            unable to reproduce
Severity:                   crash
Priority:                   normal
Status:                     new
Target Version:             1.4.28
Asterisk Version:           1.4.22 
Regression:                 No 
SVN Branch (only for SVN checkouts, not tarball releases): N/A 
SVN Revision (number only!):  
Request Review:              
====================================================================== 
Date Submitted:             2009-06-11 04:50 CDT
Last Modified:              2009-08-13 11:30 CDT
====================================================================== 
Summary:                    Seg fault in chan_local - local_pvt_destroy
Description: 
This is the same issue as 14780. We also use SNOM phones (300s and 320s)
however these extensions are not connected to the server on which Asterisk
crashed. The crash occurred on the queue server.

A backtrace of the crash has been attached.

The crash here occurred when a call file finished executing. We use call
files to pause/unpause the agents. The local_pvt being freed is not null:

(gdb) frame 2
#2  0x002dd837 in local_pvt_destroy (pvt=0xa23e928) at chan_local.c:159
159             free(pvt);
(gdb) p pvt
$1 = (struct local_pvt *) 0xa23e928
(gdb) p *pvt
$2 = {lock = {mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0,
__m_kind = 1, __m_lock = {__status = 0, __spinlock = 0}}, track = 1, file =
{0x2e0f08 "chan_local.c", 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
lineno = {158, 0, 0, 0, 0, 0, 0, 0, 0, 0}, reentrancy = 0, func = {0x2e0fbe
"local_pvt_destroy", 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, thread =
{0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, reentr_mutex = {__m_reserved = 0, __m_count
= 0, __m_owner = 0x0, __m_kind = 1, __m_lock = {__status = 0, __spinlock =
0}}}, flags = 16, context = "vital-out", '\0' <repeats 70 times>, exten =
"*\000vital-out\000n", '\0' <repeats 66 times>, reqformat = 64, owner =
0x0, chan = 0x0, u_owner = 0xa2234c8, u_chan = 0xa1c3500, list = {next =
0x0}}


Because our queue servers are so busy, I cannot simply upgrade this one to
a newer version (it handles around 10000 calls per day) unless I know that
version is stable. I tested 1.4.25 and it proved horrendously unstable
(deadlocks and seg faults).


======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
has duplicate       0014780 Asterisk abort (signal 6) in local_pvt_...
====================================================================== 

---------------------------------------------------------------------- 
 (0109008) davidw (reporter) - 2009-08-13 11:30
 https://issues.asterisk.org/view.php?id=15314#c109008 
---------------------------------------------------------------------- 
We have a variation on this in 1.6.1.0.  The actual stack trace goes
through dial_exec_full, rather than the route this one takes.  It is
reproducible, but not reliably so (we have not been able to reproduce it on
the development systems, but it was reasonably reproducible on the load
test rig).

We have tried the patch from issue 14780
(https://issues.asterisk.org/view.php?id=14780).  This seems to reduce the
window, but not remove it, making the crash even more difficult to
reproduce: it originally only showed on a real machine, whereas most
testing is done on VMware guests.
In this case, the private structure has already been invalidated before the
first lock attempt in hangup_local, so that call produces an invalid
argument error from pthread_mutex_lock (we have lock debugging compiled
in).  I breakpointed the call to log the locking error and confirmed that
owner was already 0 in the private structure.

Note that this means that this can no longer be considered a duplicate, as
the final crash backtrace is indistinguishable whether or not the lock
failure has already occurred.

At the moment our priority is to put in defensive coding to reduce but
not remove the window, but I'll try to come back, provide the traces,
and see if I can audit the local channel's locking.
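
For illustration only, one common way to narrow a use-after-free window of
this kind is to reference-count the private structure so that it is freed
only when the last holder lets go. The sketch below is an assumption about
what such defensive coding could look like, not the patch from issue 14780
and not code from chan_local.c; struct demo_pvt, demo_pvt_alloc,
demo_pvt_ref, demo_pvt_unref and demo_pvt_frees are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a channel driver's private structure. */
struct demo_pvt {
    pthread_mutex_t lock;  /* protects refcount and the fields below */
    int refcount;          /* number of holders still using this pvt */
    int owner_set;         /* placeholder for real state such as an owner pointer */
};

/* Counts frees so a caller can observe when destruction really happened. */
static int demo_pvt_frees = 0;

/* Allocate a pvt holding one reference for the caller; NULL on failure. */
static struct demo_pvt *demo_pvt_alloc(void)
{
    struct demo_pvt *pvt = calloc(1, sizeof(*pvt));

    if (!pvt)
        return NULL;
    if (pthread_mutex_init(&pvt->lock, NULL) != 0) {
        free(pvt);
        return NULL;
    }
    pvt->refcount = 1;
    pvt->owner_set = 1;
    return pvt;
}

/* Take an extra reference before handing the pvt to another code path.
 * The caller must already hold a reference of its own. */
static void demo_pvt_ref(struct demo_pvt *pvt)
{
    pthread_mutex_lock(&pvt->lock);
    pvt->refcount++;
    pthread_mutex_unlock(&pvt->lock);
}

/* Drop one reference and return how many remain. The mutex is destroyed
 * and the memory freed only when the count reaches zero, so a hangup path
 * that still holds a reference never locks an invalidated structure. */
static int demo_pvt_unref(struct demo_pvt *pvt)
{
    int remaining;

    pthread_mutex_lock(&pvt->lock);
    assert(pvt->refcount > 0);
    remaining = --pvt->refcount;
    pthread_mutex_unlock(&pvt->lock);

    if (remaining == 0) {
        pthread_mutex_destroy(&pvt->lock);
        free(pvt);
        demo_pvt_frees++;
    }
    return remaining;
}
```

In this scheme each code path that can outlive the others (for example a
hangup handler and a destroy path) would call demo_pvt_ref before it starts
and demo_pvt_unref when it finishes, instead of calling free directly. It
only helps if every path follows that rule, which is why a locking audit
would still be needed.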

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2009-08-13 11:30 davidw         Note Added: 0109008                          
======================================================================



