[asterisk-bugs] [JIRA] (ASTERISK-18345) sips connection dropped by asterisk with a large INVITE

Alex Khokhlov (JIRA) noreply at issues.asterisk.org
Fri Oct 11 02:47:03 CDT 2013


    [ https://issues.asterisk.org/jira/browse/ASTERISK-18345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=210878#comment-210878 ] 

Alex Khokhlov edited comment on ASTERISK-18345 at 10/11/13 2:45 AM:
--------------------------------------------------------------------

I also have this issue on my system, and it is 100% reproducible in the following environment:
Server: CentOS 6.4, OpenSSL 1.0.0-27.el6_4.2, Asterisk 11.5.1
Client: Bria, Android 4.3, Samsung Galaxy Nexus
Steps to reproduce: connect with TLS enabled, enable all codecs in the client, register with the server, and try to make a call.

The server closes the connection because it receives SSL_ERROR_WANT_READ from OpenSSL and then immediately returns -1 from ssl_read(). However, SSL_ERROR_WANT_READ does not indicate a failure; it only means that the read should be repeated later (see https://www.openssl.org/docs/ssl/SSL_read.html ).
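To illustrate why a -1 from the read callback is enough to drop the connection, here is a minimal sketch that does not use OpenSSL or Asterisk code at all. It assumes glibc's fopencookie(); `struct fake_tls`, `fake_ssl_read()` and `demo_fgets_after_want_read()` are hypothetical names, and the SIP request line is made up. The read hook returns -1 once (standing in for the SSL_ERROR_WANT_READ case) and fgets() reports that as NULL, exactly as it would for a hard error.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* fopencookie() is a GNU extension */
#endif
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for a TLS session (not an Asterisk structure). */
struct fake_tls {
    int want_read_pending;     /* 1 = next read reports "no data yet" */
    const char *data;          /* payload handed out once data "arrives" */
    size_t off;                /* bytes already delivered */
};

/* Read hook: returns -1 once, the way the report says ssl_read() does
 * when OpenSSL answers SSL_ERROR_WANT_READ, then delivers the payload. */
static ssize_t fake_ssl_read(void *cookie, char *buf, size_t size)
{
    struct fake_tls *t = cookie;
    size_t left, n;

    if (t->want_read_pending) {
        t->want_read_pending = 0;
        return -1;             /* stdio sees this as a plain read error */
    }
    left = strlen(t->data) - t->off;
    n = left < size ? left : size;
    memcpy(buf, t->data + t->off, n);
    t->off += n;
    return (ssize_t)n;         /* 0 signals end of file */
}

/* Returns 0 if the first fgets() fails and the retry succeeds. */
static int demo_fgets_after_want_read(char *line, int len)
{
    struct fake_tls t = { 1, "INVITE sip:100@example.com SIP/2.0\r\n", 0 };
    cookie_io_functions_t io = { .read = fake_ssl_read };
    int first_failed, retry_ok;
    FILE *f;

    assert(line != NULL && len > 1);
    f = fopencookie(&t, "r", io);
    if (!f)
        return -1;

    first_failed = (fgets(line, len, f) == NULL); /* caused by the -1 above */
    clearerr(f);                                  /* reset the error flag */
    retry_ok = (fgets(line, len, f) != NULL);     /* data is there now */

    fclose(f);
    return (first_failed && retry_ok) ? 0 : -1;
}
```

Calling demo_fgets_after_want_read() with a buffer of 128 bytes leaves the request line in the buffer after the second fgets(). A caller that treats the first NULL as fatal never gets that far.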

At the network level this happens because of TCP packet fragmentation. Inspecting the traffic with Wireshark shows that the client sends one large TLS record that is split across two or more TCP packets. The server receives the first TCP packet and immediately decides to close the connection (because of the -1 from ssl_read()). This happens just before the second packet reaches the server, at which point OpenSSL would have been able to process and decode the TLS record.
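The sequence can be modelled without OpenSSL. The following is only a sketch under stated assumptions: a one-byte length prefix stands in for a TLS record header, a socketpair stands in for the TCP connection, and `record_read()`, `REC_WANT_READ` and the other names are hypothetical stand-ins for SSL_read() and SSL_ERROR_WANT_READ, not real OpenSSL or Asterisk identifiers. The point it shows is that "record not complete yet" calls for waiting on the socket and retrying, not for closing the connection.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum rec_status { REC_OK, REC_WANT_READ, REC_ERROR };

struct rec_reader {
    int fd;                    /* non-blocking socket */
    unsigned char buf[256];    /* 1 length byte + up to 255 payload bytes */
    size_t have;               /* bytes buffered so far */
};

/* Try to complete one length-prefixed record. This sketch handles a
 * single record at a time and resets its buffer after delivering it. */
static enum rec_status record_read(struct rec_reader *r, char *out, size_t outlen)
{
    for (;;) {
        ssize_t got;

        if (r->have >= 1 && r->have >= 1u + r->buf[0]) {
            size_t n = r->buf[0];
            if (n >= outlen)
                return REC_ERROR;
            memcpy(out, r->buf + 1, n);
            out[n] = '\0';
            r->have = 0;
            return REC_OK;
        }
        got = read(r->fd, r->buf + r->have, sizeof r->buf - r->have);
        if (got > 0) {
            r->have += (size_t)got;
            continue;
        }
        if (got < 0 && errno == EINTR)
            continue;
        if (got < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return REC_WANT_READ;   /* not an error: rest not here yet */
        return REC_ERROR;           /* EOF or hard error */
    }
}

/* Sends one record in two pieces; returns 0 if the reader sees
 * REC_WANT_READ after the first piece and REC_OK after the second. */
static int demo_fragmented_record(char *out, size_t outlen)
{
    const char *payload = "INVITE with many codecs";
    size_t plen = strlen(payload);
    unsigned char rec[64];
    struct rec_reader r;
    int sv[2], ok = 0;

    assert(plen >= 10 && plen + 1 <= sizeof rec);
    rec[0] = (unsigned char)plen;
    memcpy(rec + 1, payload, plen);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    r.fd = sv[0];
    r.have = 0;

    if (fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK) == 0 &&
        write(sv[1], rec, 10) == 10 &&                    /* first fragment */
        record_read(&r, out, outlen) == REC_WANT_READ) {
        struct pollfd p = { .fd = sv[0], .events = POLLIN };
        size_t rest = 1 + plen - 10;

        if (write(sv[1], rec + 10, rest) == (ssize_t)rest && /* second fragment */
            poll(&p, 1, 1000) == 1 &&                        /* wait, then retry */
            record_read(&r, out, outlen) == REC_OK)
            ok = 1;
    }
    close(sv[0]);
    close(sv[1]);
    return ok ? 0 : -1;
}
```

In this model, closing the socket on the first REC_WANT_READ would discard a record whose second half arrives moments later, which matches what the packet capture shows.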

This is definitely a bug on the Asterisk side.
                
> sips connection dropped by asterisk with a large INVITE
> -------------------------------------------------------
>
>                 Key: ASTERISK-18345
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-18345
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_sip/TCP-TLS
>    Affects Versions: SVN, 1.8.4, 11.4.0, 11.5.0
>            Reporter: Stephane Chazelas
>         Attachments: tls_read_fix_try1_1.8.11.1.diff, tls_read_fix_try2_1.8.11.1.diff, tls_read_fix_try3_1.8.11.1.diff, tls_read.patch
>
>
> When using jitsi (http://jitsi.org) (the debian amd64 one) as a sip-tls extension, one can see the SSL connection to asterisk being dropped (abnormally, but that seems to be due to ASTERISK-18342) during registration, and placing calls doesn't work.
> I first thought it was an SSL method issue, as jitsi doesn't seem to support SSLv3 or TLSv1, and I was able to make it work by using a MitM that proxied the connection through socat: jitsi was able to talk to socat OK and socat to asterisk OK.
> But it looks more like a timing/nondeterministic issue. I then had a look at the code, added a little logging and found out that the connection was closed because of fgets() returning NULL in _sip_tcp_helper_thread().
> I then added logging to ssl_read() to see if SSL_read() ever failed, but it doesn't, so I don't understand how that fgets() could return EOF/error in that case. Then, I had a hard time understanding that business of need_poll/after_poll.
> If I understand correctly, tcptls_session->fd is the network socket that carries the encrypted data and other ssl out-of-band stuff and has been made non-blocking, and tcptls_session->f is a stream created with funopen(tcptls_session->ssl, ssl_read, ssl_write, NULL, ssl_close) (or the fopencookie Linux equivalent). Polls are made on the fd before doing the fgets() calls that eventually call SSL_read(). That sounds to me like a recipe for catastrophe, deadlocks and the like, but I have to admit I have not understood/seen the design fully.
> I still don't get how fgets() can return NULL here, but I tried to take the need_poll/after_poll trick further by doing:
> {code}
> @@ -2659,7 +2637,7 @@ static void *_sip_tcp_helper_thread(stru
>                                  * TLS layer */
>                                 if (!tcptls_session->ssl || need_poll) {
>                                         need_poll = 0;
> -                                       after_poll = 1;
> +                                       after_poll++;
>                                         res = ast_wait_for_input(tcptls_session->fd, timeout);
>                                         if (res < 0) {
>                                                 ast_debug(2, "SIP TCP server :: ast_wait_for_input returned %d\n", res);
> @@ -2674,7 +2654,7 @@ static void *_sip_tcp_helper_thread(stru
>                                 ast_mutex_lock(&tcptls_session->lock);
>                                 if (!fgets(buf, sizeof(buf), tcptls_session->f)) {
>                                         ast_mutex_unlock(&tcptls_session->lock);
> -                                       if (after_poll) {
> +                                       if (after_poll > 1) {
>                                                 goto cleanup;
>                                         } else {
>                                                 need_poll = 1;
> {code}
> and it fixed the issue.
> So, there's definitely something wrong, though I couldn't tell exactly what.



