[asterisk-bugs] [Asterisk 0012101]: SIP channel hung due to CANCEL ReliableXmit (ReTX)

noreply at bugs.digium.com noreply at bugs.digium.com
Fri May 30 15:23:34 CDT 2008


A NOTE has been added to this issue. 
====================================================================== 
http://bugs.digium.com/view.php?id=12101 
====================================================================== 
Reported By:                MVF
Assigned To:                
====================================================================== 
Project:                    Asterisk
Issue ID:                   12101
Category:                   Channels/chan_sip/General
Reproducibility:            always
Severity:                   minor
Priority:                   normal
Status:                     confirmed
Asterisk Version:           1.4.18 
SVN Branch (only for SVN checkouts, not tarball releases): N/A 
SVN Revision (number only!):  
Disclaimer on File?:        N/A 
Request Review:              
====================================================================== 
Date Submitted:             02-28-2008 13:23 CST
Last Modified:              05-30-2008 15:23 CDT
====================================================================== 
Summary:                    SIP channel hung due to CANCEL ReliableXmit (ReTX)
Description: 
I've detected a problem that makes Asterisk keep SIP channels up forever,
even after the call has been disconnected. The hung channels hold UDP ports
and memory, so Asterisk starts dropping new calls as the number of hung SIP
channels increases.

Reviewing the "sip show history" of one of these hung channels, I see that
the call is cancelled by the calling party while it is in the ringing state.
Asterisk then sends a CANCEL to the peer but does not get an answer (maybe
because of network problems). After that I get some bad responses from the
peer (maybe also due to network problems), which Asterisk answers with an
ACK, but the history then shows a new ReliableXmit timeout message every 30
seconds.

(originating peer)
  * SIP Call
1. TxReqRel        INVITE / 102 INVITE - -UNKNOWN-
2. Rx              SIP/2.0 / 102 INVITE / 100 Trying
3. Rx              SIP/2.0 / 102 INVITE / 100 Trying
4. Rx              SIP/2.0 / 102 INVITE / 180 Ringing
5. Rx              SIP/2.0 / 102 INVITE / 180 Ringing
6. Cancel          Cause Normal Clearing
7. SchedDestroy    32000 ms
8. TxReqRel        CANCEL / 102 CANCEL - -UNKNOWN-
9. SchedDestroy    32000 ms
10. ReTx            1000 CANCEL sip:17898702397 at 20.20.20.25 SIP/2.0
11. ReTx            2000 CANCEL sip:17898702397 at 20.20.20.25 SIP/2.0
12. Rx              SIP/2.0 / 102 CANCEL / 100 Trying
13. ReliableXmit    timeout
14. Rx              SIP/2.0 / 102 INVITE / 180 Ringing
15. Rx              SIP/2.0 / 102 INVITE / 486 Busy Here
16. TxReq           ACK / 102 ACK - -UNKNOWN-
17. ReliableXmit    timeout
18. ReliableXmit    timeout
19. ReliableXmit    timeout
20. ReliableXmit    timeout
21. ReliableXmit    timeout
22. ReliableXmit    timeout
23. ReliableXmit    timeout
24. ReliableXmit    timeout
25. ReliableXmit    timeout
26. ReliableXmit    timeout
27. ReliableXmit    timeout
28. ReliableXmit    timeout
29. ReliableXmit    timeout
30. ReliableXmit    timeout
31. ReliableXmit    timeout
32. ReliableXmit    timeout
33. ReliableXmit    timeout
34. ReliableXmit    timeout
35. ReliableXmit    timeout
36. ReliableXmit    timeout
37. ReliableXmit    timeout
38. ReliableXmit    timeout
39. ReliableXmit    timeout
40. ReliableXmit    timeout
41. ReliableXmit    timeout
42. ReliableXmit    timeout
43. ReliableXmit    timeout
44. ReliableXmit    timeout
45. ReliableXmit    timeout
46. ReliableXmit    timeout
47. ReliableXmit    timeout
48. ReliableXmit    timeout
49. ReliableXmit    timeout
50. ReliableXmit    timeout

I got a capture of the SIP transaction messages that produce these hung
SIP channels; please take a look at the attached file, which includes two
captures (20080227_sip_channel_hung_at_cancel_ReTX.txt). Note that no SIP
messages are sent after the last ACK. The "ReliableXmit timeout" messages
keep appearing in the call history, filling the 50-slot buffer and even
pushing out the older history entries. The "sip show channels" command
shows the channel hung forever.

20.20.20.25   1789870239  185d051b629  00102/00000  0x0 (nothing)    No (d)  Tx: ACK
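
The pattern described above (a history that ends in an unbroken run of "ReliableXmit timeout" entries) can be spotted mechanically. The following is only a rough sketch, assuming the "sip show history" output has been saved as text in the numbered "N. Event  detail" layout shown in this report. The function names and the threshold of 3 are made up for illustration and are not part of Asterisk.

```python
import re

# Matches numbered history lines such as "13. ReliableXmit    timeout".
HISTORY_LINE = re.compile(r"^\s*(\d+)\.\s+(\S+)\s*(.*)$")


def parse_history(text):
    """Return a list of (number, event, detail) tuples from saved history text."""
    entries = []
    for line in text.splitlines():
        match = HISTORY_LINE.match(line)
        if match:
            number, event, detail = match.groups()
            entries.append((int(number), event, detail.strip()))
    return entries


def trailing_timeouts(entries):
    """Count the consecutive 'ReliableXmit timeout' entries at the end of the history."""
    count = 0
    for _, event, detail in reversed(entries):
        if event == "ReliableXmit" and detail == "timeout":
            count += 1
        else:
            break
    return count


def looks_hung(entries, threshold=3):
    """Heuristic only: flag a dialog whose history ends in a long run of timeouts.

    The threshold is an arbitrary assumption. The CANCEL entry is not required,
    because the 50-entry history buffer may already have dropped it.
    """
    return trailing_timeouts(entries) >= threshold


# A shortened excerpt of the history from this report.
SAMPLE = """\
8. TxReqRel        CANCEL / 102 CANCEL - -UNKNOWN-
13. ReliableXmit    timeout
16. TxReq           ACK / 102 ACK - -UNKNOWN-
17. ReliableXmit    timeout
18. ReliableXmit    timeout
19. ReliableXmit    timeout
"""

if __name__ == "__main__":
    history = parse_history(SAMPLE)
    print(trailing_timeouts(history), looks_hung(history))
```

A check like this only reports the symptom; it says nothing about why the retransmission timer keeps firing.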
====================================================================== 

---------------------------------------------------------------------- 
 aragon - 05-30-08 15:23  
---------------------------------------------------------------------- 
Possibly related to bugs 12603 and 12584? Those are fixed in the official
1.4.20 release.
Give 1.4.20 a try.

------------------------------------------------------------------------
r116039 | russell | 2008-05-13 16:14:28 -0500 (Tue, 13 May 2008) | 32 lines

Merged revisions 116038 via svnmerge from
https://origsvn.digium.com/svn/asterisk/branches/1.4

........
r116038 | russell | 2008-05-13 16:17:23 -0500 (Tue, 13 May 2008) | 24 lines

Fix a deadlock involving channel autoservice and chan_local that was debugged
and fixed by mmichelson and me.

We observed a system that had a bunch of threads stuck in ast_autoservice_stop().
The reason these threads were waiting around is because this function waits to
ensure that the channel list in the autoservice thread gets rebuilt before the
stop() function returns. However, the autoservice thread was also locked, so
the autoservice channel list was never getting rebuilt.

The autoservice thread was stuck waiting for the channel lock on a local channel.
However, the local channel was locked by a thread that was stuck in the autoservice
stop function.

It turned out that the issue came down to the local_queue_frame() function in
chan_local. This function assumed that one of the channels passed in as an
argument was locked when called. However, that was not always the case. There
were multiple cases in which this channel was not locked when the function was
called. We fixed up chan_local to indicate to this function whether this channel
was locked or not. The previous assumption had caused local_queue_frame() to
improperly return with the channel locked, where it would then never get
unlocked.

(closes issue 0012584)
(related to issue 0012603) 
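
The idea behind that fix, telling the function whether the caller already holds the channel lock so that it only re-acquires a lock it actually released, can be illustrated with a small sketch. This is not the chan_local code: it is a simplified Python stand-in, and queue_frame, chan_lock, frames and us_locked are hypothetical names chosen for the example.

```python
import threading


def queue_frame(chan_lock, frames, frame, us_locked):
    """Queue a frame, temporarily dropping the caller's channel lock if held.

    us_locked tells the function whether the caller holds chan_lock. The lock
    is released and re-acquired only in that case, so the function returns
    with the lock in the same state it was called with.
    """
    if us_locked:
        # Drop the caller's lock while doing work that might block elsewhere.
        chan_lock.release()
    try:
        frames.append(frame)
    finally:
        if us_locked:
            # Restore only what the caller actually held. Re-acquiring
            # unconditionally would leave the lock held with no owner to
            # release it, which is the kind of stuck lock described above.
            chan_lock.acquire()


if __name__ == "__main__":
    lock = threading.Lock()
    queued = []

    # Caller does not hold the lock: it must still be free afterwards.
    queue_frame(lock, queued, "frame-1", us_locked=False)
    print(lock.locked())

    # Caller holds the lock: it must still be held afterwards.
    with lock:
        queue_frame(lock, queued, "frame-2", us_locked=True)
        print(lock.locked())
```

Passing the lock state explicitly keeps the function's behaviour symmetric for both kinds of caller, at the cost of every call site having to state correctly what it holds.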

Issue History 
Date Modified   Username       Field                    Change               
====================================================================== 
05-30-08 15:23  aragon         Note Added: 0087576                          
======================================================================



