[asterisk-dev] [Code Review] Do not use FILE handles when doing SIP TCP reads

Mark Michelson reviewboard at asterisk.org
Mon Oct 8 15:48:21 CDT 2012


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/2123/
-----------------------------------------------------------

(Updated Oct. 8, 2012, 3:48 p.m.)


Review request for Asterisk Developers.


Changes
-------

This should take care of Walter's final scenario.

May this issue die a horrible death.


Summary
-------

The reporter of issue ASTERISK-20212 found that Asterisk would lock up after running for a few days. The backtraces showed that the problematic thread was the SIP TCP thread, which was blocked in a call to fgets() while holding a lock that the SIP monitor thread was trying to acquire. Once the SIP monitor thread was stuck waiting for that lock, no SIP traffic could be received.

While the reason the fgets() call blocked was never made explicitly clear, it certainly seemed odd that a successful poll() would lead to an fgets() that blocked forever. The obvious oddity was that we were polling on a file descriptor but then reading from a corresponding FILE handle, which, in the general opinion of everyone, is "stupid": poll() only reports that the descriptor has some data ready, while fgets() reads through stdio's own buffer and keeps reading until it sees a newline, fills the buffer, or hits end-of-file, so it can still block on a partial line. I supplied the reporter with a patch that uses recv() instead of fgets() for TCP SIP connections, hoping this would fix the problem.
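A minimal sketch of the recv() approach, assuming a POSIX system. read_available() and demo_partial_line() are hypothetical names used only for illustration; they are not functions from chan_sip.c or tcptls.c. The demo writes a SIP request line with no trailing CRLF into one end of a socketpair(); poll() reports the other end readable and a single recv() returns those bytes at once, whereas fgets() on a FILE wrapping the same descriptor would keep waiting for the line terminator.

```c
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not the chan_sip.c code): wait up to timeout_ms for
 * the descriptor to become readable, then read whatever bytes are available
 * with a single recv(). Unlike fgets(), it never waits for a newline.
 * Returns bytes read (> 0), 0 if the peer closed, -1 on error,
 * -2 on timeout. */
static ssize_t read_available(int fd, char *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int res = poll(&pfd, 1, timeout_ms);

    if (res == 0) {
        return -2;          /* timed out: nothing to read */
    }
    if (res < 0) {
        return -1;          /* poll() itself failed */
    }
    return recv(fd, buf, len, 0);
}

/* Demonstration: write a partial SIP request line (no CRLF) into one end of
 * a socketpair and read it from the other end. The bytes come back
 * immediately even though no line terminator was sent. */
static ssize_t demo_partial_line(char *out, size_t outlen)
{
    int sv[2];
    const char *partial = "INVITE sip:bob@example.com SIP/2.0";
    ssize_t n;

    if (outlen == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        return -1;
    }
    if (write(sv[0], partial, strlen(partial)) < 0) {
        n = -1;
    } else {
        n = read_available(sv[1], out, outlen - 1, 1000);
    }
    if (n >= 0) {
        out[n] = '\0';      /* keep the result printable */
    }
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

Called from main() with a 64-byte buffer, demo_partial_line() should return 34 and leave the unterminated request line in the buffer. Returning a distinct value on timeout lets a caller give up and release any locks it holds instead of sleeping inside the read.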

As it turns out, the patch has been in use for over three weeks with no issues, so it appears to be a good fix. The patch specifically targets TCP connections, not TLS. TLS connections were not reported as having the issue, and changing TLS would be a much more invasive operation.
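A reader built on recv() rather than fgets() cannot rely on line-at-a-time reads, so it has to find SIP message boundaries itself. The following is a sketch under stated assumptions, not the code in this patch: read_sip_message() and content_length() are hypothetical names, and the framing follows the SIP rule for stream transports, where a message ends at the blank line closing the headers plus Content-Length bytes of body (RFC 3261, section 18.3). The sketch ignores the compact "l:" header form and does not keep bytes of a following message for the next call.

```c
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: return the Content-Length value found in the header
 * lines between start and end (end points at the "\r\n\r\n" that closes
 * the headers), or 0 if the header is absent. The compact form "l:" is
 * not handled in this sketch. */
static long content_length(const char *start, const char *end)
{
    const char *p = start;

    while ((p = strstr(p, "\r\n")) != NULL && p < end) {
        p += 2; /* start of the next header line */
        if (strncasecmp(p, "Content-Length:", 15) == 0) {
            return strtol(p + 15, NULL, 10); /* strtol skips leading spaces */
        }
    }
    return 0;
}

/* Hypothetical sketch: read one complete SIP message from a stream socket
 * using only poll() and recv(). Every wait is bounded by timeout_ms, so a
 * peer that stops sending mid-message makes the call fail instead of
 * blocking forever. buf is kept NUL-terminated. Returns the message length,
 * or -1 on timeout, error, peer close, or a message larger than cap - 1.
 * Bytes of a following message received in the same recv() are left after
 * the returned length and are not preserved for the next call. */
static ssize_t read_sip_message(int fd, char *buf, size_t cap, int timeout_ms)
{
    size_t used = 0;
    size_t total = 0; /* full message length; 0 until the headers are complete */

    if (cap < 2) {
        return -1;
    }
    buf[0] = '\0';
    while (total == 0 || used < total) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        ssize_t n;

        if (used >= cap - 1) {
            return -1; /* no room left for the rest of the message */
        }
        if (poll(&pfd, 1, timeout_ms) <= 0) {
            return -1; /* timed out or poll() failed */
        }
        n = recv(fd, buf + used, cap - 1 - used, 0);
        if (n <= 0) {
            return -1; /* peer closed the connection or recv() failed */
        }
        used += (size_t)n;
        buf[used] = '\0';

        if (total == 0) {
            char *eoh = strstr(buf, "\r\n\r\n");

            if (eoh != NULL) {
                long clen = content_length(buf, eoh);

                if (clen < 0) {
                    return -1;
                }
                total = (size_t)(eoh - buf) + 4 + (size_t)clen;
            }
        }
    }
    return (ssize_t)total;
}
```

The design choice here is that no single call can wait longer than timeout_ms: a half-sent message produces an error the caller can handle, rather than a thread parked in the kernel while it holds a lock.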

In my opinion, we should remove the use of FILE handles altogether in the TCP/TLS code, but such a task would be better suited for Asterisk trunk instead of a released version. For now, fixing the problems as they are reported is the best option.

Note that the reporter filed the issue against Asterisk 10, but this review is made against Asterisk 1.8. Asterisk 1.8 retrieves TCP data the same way, so I believe the issue exists there as well.

When reviewing my changes, pay particular attention to the TLS code to make sure I did not introduce any subtle logic changes. The sip_tls_read() function is essentially a copy and paste of the code that existed before, so I am hopeful that I have not introduced anything undesirable there.


This addresses bug ASTERISK-20212.
    https://issues.asterisk.org/jira/browse/ASTERISK-20212


Diffs (updated)
-----

  /branches/1.8/channels/chan_sip.c 373848 
  /branches/1.8/include/asterisk/tcptls.h 373847 
  /branches/1.8/main/tcptls.c 373847 

Diff: https://reviewboard.asterisk.org/r/2123/diff


Testing
-------

In the reporter's words:

"we have had the first patch in since my last comment with ZERO failures. I think at this point it is safe to say that fix will work (and is working)."


Thanks,

Mark
