[asterisk-bugs] [Asterisk 0016936]: [patch] Qualify frequency has big pauses. Asterisk stops sending SIP OPTIONS to keep NAT alive

Asterisk Bug Tracker noreply at bugs.digium.com
Thu May 6 08:58:09 CDT 2010


A NOTE has been added to this issue. 
====================================================================== 
https://issues.asterisk.org/view.php?id=16936 
====================================================================== 
Reported By:                ib2
Assigned To:                russell
====================================================================== 
Project:                    Asterisk
Issue ID:                   16936
Category:                   Channels/chan_sip/General
Reproducibility:            sometimes
Severity:                   major
Priority:                   normal
Status:                     assigned
Asterisk Version:           1.6.2.4 
JIRA:                       SWP-993 
Regression:                 No 
Reviewboard Link:            
SVN Branch (only for SVN checkouts, not tarball releases): N/A 
SVN Revision (number only!):  
Request Review:              
====================================================================== 
Date Submitted:             2010-03-01 13:54 CST
Last Modified:              2010-05-06 08:58 CDT
====================================================================== 
Summary:                    [patch] Qualify frequency has big pauses. Asterisk
stops sending SIP OPTIONS to keep NAT alive
Description: 
We have several SIP phone peers that have been going UNREACHABLE since
we upgraded to Asterisk 1.6.2.x.

[10:08:44] chan_sip.c: Peer '202_117' is now UNREACHABLE!  Last qualify: 100
[10:11:25] chan_sip.c: Peer '202_117' is now Reachable. (86ms / 2000ms)
[11:59:03] chan_sip.c: Peer '202_117' is now UNREACHABLE!  Last qualify: 91
[12:11:27] chan_sip.c: Peer '202_117' is now Reachable. (85ms / 2000ms)
[13:17:21] chan_sip.c: Peer '202_117' is now UNREACHABLE!  Last qualify: 90
[13:41:27] chan_sip.c: Peer '202_117' is now Reachable. (92ms / 2000ms)

The phone stays UNREACHABLE until it registers again. The phone does not know
that it is UNREACHABLE.
Asterisk reports the phone as UNREACHABLE after a long pause in sending the SIP
OPTIONS requests that keep the NAT mapping alive. The NAT table entry is
therefore lost, and Asterisk cannot receive the SIP OK reply from the phone.

The typical interval between occurrences is shown in the log above.
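
To put numbers on the log excerpt above, here is a small sketch that computes how long each outage lasted and how far apart the UNREACHABLE events were. It assumes that all six timestamps fall on the same day (the log lines carry no date) and the names `LOG`, `outages` and `gaps` are made up for this illustration.

```python
from datetime import datetime

# (timestamp, state) pairs copied from the chan_sip.c log lines above.
LOG = [
    ("10:08:44", "UNREACHABLE"),
    ("10:11:25", "Reachable"),
    ("11:59:03", "UNREACHABLE"),
    ("12:11:27", "Reachable"),
    ("13:17:21", "UNREACHABLE"),
    ("13:41:27", "Reachable"),
]


def seconds_between(start, end):
    """Seconds from start to end, assuming both are on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


down = [t for t, state in LOG if state == "UNREACHABLE"]
up = [t for t, state in LOG if state == "Reachable"]

# How long the peer stayed unreachable each time.
outages = [seconds_between(d, u) for d, u in zip(down, up)]
# Time from one UNREACHABLE event to the next.
gaps = [seconds_between(a, b) for a, b in zip(down, down[1:])]

print("outage lengths (s):", outages)
print("gaps between UNREACHABLE events (s):", gaps)
```

For this excerpt the outages last between roughly 3 and 24 minutes, and the failures are more than an hour apart.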
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
related to          0017277 [patch] The heap data structure can't c...
====================================================================== 

---------------------------------------------------------------------- 
 (0121465) svnbot (reporter) - 2010-05-06 08:58
 https://issues.asterisk.org/view.php?id=16936#c121465 
---------------------------------------------------------------------- 
Repository: asterisk
Revision: 261496

U   trunk/main/heap.c

------------------------------------------------------------------------
r261496 | russell | 2010-05-06 08:58:07 -0500 (Thu, 06 May 2010) | 40 lines

Fix handling of removing nodes from the middle of a heap.

This bug surfaced in 1.6.2 and does not affect code in any other released
version of Asterisk.  It manifested itself as SIP qualify not happening when
it should, causing peers to go unreachable.  This was debugged down to scheduler
entries sometimes not getting executed when they were supposed to, which was in
turn caused by an error in the heap code.

The problem only sometimes occurs, and it is due to the logic for removing an entry
in the heap from an arbitrary location (not just popping off the top).  The scheduler
performs this operation frequently when entries are removed before they run (when
ast_sched_del() is used).

In a normal pop off of the top of the heap, a node is taken off the bottom,
placed at the top, and then bubbled down until the max heap property is restored
(see max_heapify()).  This same logic was used for removing an arbitrary node
from the middle of the heap.  Unfortunately, that logic is full of fail.  This
patch fixes that by fully restoring the max heap property when a node is thrown
into the middle of the heap.  Instead of just pushing it down as appropriate, it
first pushes it up as high as it will go, and _then_ pushes it down.
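
The difference between the two removal strategies can be sketched on a plain array-backed max heap. This is an illustration in Python, not the code from main/heap.c: the function names `bubble_up`, `push_down`, `remove_at` and `is_max_heap` and the `fixed` flag are hypothetical and exist only to show both behaviours side by side.

```python
def bubble_up(heap, i):
    """Move heap[i] up while it is larger than its parent; return its new index."""
    while i > 0:
        parent = (i - 1) // 2
        if heap[parent] >= heap[i]:
            break
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent
    return i


def push_down(heap, i):
    """Move heap[i] down while one of its children is larger."""
    n = len(heap)
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            return
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest


def remove_at(heap, i, fixed=True):
    """Remove the node at index i by moving the last node into its slot.

    fixed=False only pushes the moved node down (the old behaviour as
    described in the commit message); fixed=True pushes it up first.
    """
    removed = heap[i]
    last = heap.pop()
    if i < len(heap):  # the removed node was not the last one
        heap[i] = last
        if fixed:
            i = bubble_up(heap, i)
        push_down(heap, i)
    return removed


def is_max_heap(heap):
    """True if every parent is greater than or equal to its children."""
    return all(heap[(c - 1) // 2] >= heap[c] for c in range(1, len(heap)))


start = [100, 20, 90, 10, 15, 80, 85]

old = list(start)
remove_at(old, 3, fixed=False)   # 85 lands under 20 and stays there
print(old, is_max_heap(old))

new = list(start)
remove_at(new, 3, fixed=True)    # 85 is pushed up above 20 first
print(new, is_max_heap(new))
```

Removing the node with value 10 moves 85 from the bottom of the right subtree into the left subtree, where it is larger than its new parent 20. Pushing down alone cannot repair that, because the violation is above the node rather than below it.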

Lastly, fix a minor problem in ast_heap_verify(), which is only used for
debugging.  If a parent and child node have the same value, that is not an
error.  The only error is if a parent's value is less than its children.
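
A verification pass with that rule might look like the sketch below. It is written in Python for brevity and does not reproduce the signature or output of ast_heap_verify(); the name `heap_violations` is an assumption made for this example.

```python
def heap_violations(heap):
    """Return (parent_index, child_index) pairs that break the max heap property.

    A parent equal to its child is fine; only parent < child is an error.
    """
    bad = []
    for child in range(1, len(heap)):
        parent = (child - 1) // 2
        if heap[parent] < heap[child]:
            bad.append((parent, child))
    return bad


print(heap_violations([5, 5, 3]))  # equal parent and child: no violation
print(heap_violations([5, 6, 3]))  # child 6 is larger than parent 5
```

A check written with `<=` instead of `<` would flag the first heap as broken even though it is valid.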

A huge thanks goes out to cappucinoking for debugging this down to the scheduler,
and then producing an ast_heap test case that demonstrated the breakage.  That
made it very easy for me to focus on the heap logic and produce a fix.  Open source
projects are awesome.

(closes issue https://issues.asterisk.org/view.php?id=16936)
Reported by: ib2
Tested by: cappucinoking, crjw

(closes issue https://issues.asterisk.org/view.php?id=17277)
Reported by: cappucinoking
Patches:
      heap-fix.rev2.diff uploaded by russell (license 2)
Tested by: cappucinoking, russell

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=261496 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2010-05-06 08:58 svnbot         Checkin                                      
2010-05-06 08:58 svnbot         Note Added: 0121465                          
======================================================================



