[asterisk-dev] [Code Review]: Do not use FILE handles when doing SIP TCP reads

Mark Michelson reviewboard at asterisk.org
Tue Sep 25 13:25:03 CDT 2012



> On Sept. 23, 2012, 4:48 p.m., wdoekes wrote:
> > > pay particular attention to the TLS code to ensure I did not introduce any subtle logic changes
> > 
> > I think you got this covered.
> > 
> > > In my opinion, we should remove the use of FILE handles altogether in the TCP/TLS code
> > 
> > I don't see why the tls code couldn't be changed at one fell swoop? It'd be equally invasive as the tcp bit, no?
> > 
> > But, I'm not sure that tossing the Content-Length checks is at all safe. Subsequent messages being packed together in a single read will become message bodies by accident, unless checks are added at the end of sip_tcp_read().

What I had meant was to change the code in main/tcptls.c to not use fopencookie() at all. This would have wide-ranging effects since manager and http use the tcptls API. More properly, the API should be changed to be a bit more encapsulated so that details such as file descriptors aren't even known by code in chan_sip or manager. That's why it's a lot more of a wide-sweeping change than this one.

I've added content-length checking to the TCP check.


> On Sept. 23, 2012, 4:48 p.m., wdoekes wrote:
> > /branches/1.8/channels/chan_sip.c, line 2521
> > <https://reviewboard.asterisk.org/r/2123/diff/2/?file=31342#file31342line2521>
> >
> >     'content_length', instead of 'cl' would be nice.

changed


> On Sept. 23, 2012, 4:48 p.m., wdoekes wrote:
> > /branches/1.8/channels/chan_sip.c, line 2545
> > <https://reviewboard.asterisk.org/r/2123/diff/2/?file=31342#file31342line2545>
> >
> >     Remove the "!tcptls_session->ssl", it's always false since we only got here if ssl was true.

Changed here and everywhere else the 'ssl' field would change operations in sip_tls_read()


> On Sept. 23, 2012, 4:48 p.m., wdoekes wrote:
> > /branches/1.8/channels/chan_sip.c, line 2644
> > <https://reviewboard.asterisk.org/r/2123/diff/2/?file=31342#file31342line2644>
> >
> >     I presume you were aiming for 65535? ;)

Yes....yes I was :)
Fixed.


> On Sept. 23, 2012, 4:48 p.m., wdoekes wrote:
> > /branches/1.8/channels/chan_sip.c, line 2647
> > <https://reviewboard.asterisk.org/r/2123/diff/2/?file=31342#file31342line2647>
> >
> >     If ast_wait_for_input returns <= 0, doing a recv() seems superfluous. I'm not sure if the if-checks below are below the recv() by accident, or that they're the recv() checks.
> >     
> >     In the latter case, return value 0 would mean "an orderly shutdown". In all cases the current debug messages are wrong.

Moved the return value checks up to after ast_wait_for_input(). Added new debug messages for return value checks after recv().


> On Sept. 23, 2012, 4:48 p.m., wdoekes wrote:
> > /branches/1.8/channels/chan_sip.c, line 2649
> > <https://reviewboard.asterisk.org/r/2123/diff/2/?file=31342#file31342line2649>
> >
> >     Unnecessary parens around the (sizeof(readbuf)).
> >     
> >     But... more interestingly.. who is terminating readbuf? Some readbuf[res]=0 might be in order.

Added explicit null-termination and removed parens.


> On Sept. 23, 2012, 4:48 p.m., wdoekes wrote:
> > /branches/1.8/channels/chan_sip.c, lines 2805-2806
> > <https://reviewboard.asterisk.org/r/2123/diff/2/?file=31342#file31342line2805>
> >
> >     This now reads:
> >     
> >     if (ssl) {
> >         if (read_ssl() < 0) {
> >             goto cleanup;
> >         }
> >     } else if (read_tcp() < 0) {
> >         goto cleanup;
> >     }
> >     
> >     whereas this might be prettier:
> >     
> >     if (ssl) {
> >         res = read_ssl();
> >     } else {
> >         res = read_tcp();
> >     }
> >     if (res < 0) {
> >         goto cleanup;
> >     }

Changed.


> On Sept. 23, 2012, 4:48 p.m., wdoekes wrote:
> > /branches/1.8/channels/chan_sip.c, lines 2807-2808
> > <https://reviewboard.asterisk.org/r/2123/diff/2/?file=31342#file31342line2807>
> >
> >     Making this if(ssl) only is a regression, I think.

The check about authenticated still occurs at the beginning of the for loop in _sip_tcp_helper_thread(), so it can still occur for TCP sockets.

What I was seeing was that this was checked as data was read on the socket when using a FILE handle. When using recv() directly, there are no incremental places where I can do such a check.
I've added it after ast_wait_for_input() just to be safe, though. With the content-length checking stuff, it could actually be triggered.


- Mark


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/2123/#review7142
-----------------------------------------------------------


On Sept. 20, 2012, 2:41 p.m., Mark Michelson wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/2123/
> -----------------------------------------------------------
> 
> (Updated Sept. 20, 2012, 2:41 p.m.)
> 
> 
> Review request for Asterisk Developers.
> 
> 
> Summary
> -------
> 
> The reporter of issue ASTERISK-20213 had an issue where Asterisk would lock up after being used for a few days. When looking at backtraces, it was apparent that the problematic thread was the SIP TCP thread. It was blocked in a call to fgets(). This blocked thread was holding a lock that the SIP monitor thread was trying to lock. Once the SIP monitor thread was stuck trying to grab the lock, it meant that no SIP traffic could be received.
> 
> While the reason why the fgets() call blocked was not ever made explicitly clear, it certainly seemed odd that a successful poll() would result in an fgets() that would block forever. The obvious oddness was that we were polling on a file descriptor but then trying to read from a corresponding FILE handle. This, in the general opinion of everyone, is "stupid". I supplied a patch to the reporter that uses recv() instead of fgets() for TCP SIP connections, hoping this would work.
> 
> As it turns out, the patch has been in use for over three weeks with no issues, so it appears to be a good fix. The patch specifically targets TCP connections and not TLS. TLS connections were not reported as having the issue, plus changing TLS would be a much more invasive operation.
> 
> In my opinion, we should remove the use of FILE handles altogether in the TCP/TLS code, but such a task would be better suited for Asterisk trunk instead of a released version. For now, fixing the problems as they are reported is the best option.
> 
> Note that the reporter reported his issue against Asterisk 10 but this review is made against Asterisk 1.8. This is because the same method of retrieving TCP data is used in 1.8 so I believe the issue must exist there as well.
> 
> While viewing my changes, pay particular attention to the TLS code to ensure I did not introduce any subtle logic changes. The sip_tls_read() function is pretty much a copy and paste of the code that existed before, so I am hopeful that I have not introduced anything undesirable there.
> 
> 
> This addresses bug ASTERISK-20212.
>     https://issues.asterisk.org/jira/browse/ASTERISK-20212
> 
> 
> Diffs
> -----
> 
>   /branches/1.8/channels/chan_sip.c 372950 
> 
> Diff: https://reviewboard.asterisk.org/r/2123/diff
> 
> 
> Testing
> -------
> 
> In the reporter's words:
> 
> "we have had the first patch in since my last comment with ZERO failures. I think at this point it is safe to say that fix will work (and is working)."
> 
> 
> Thanks,
> 
> Mark
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20120925/8d40e877/attachment-0001.htm>


More information about the asterisk-dev mailing list