[asterisk-bugs] [JIRA] (ASTERISK-18345) [patch] sips connection dropped by asterisk with a large INVITE

Elazar Broad (JIRA) noreply at issues.asterisk.org
Thu Jul 31 11:50:56 CDT 2014


     [ https://issues.asterisk.org/jira/browse/ASTERISK-18345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elazar Broad updated ASTERISK-18345:
------------------------------------

    Attachment: tcptls_pollv2.diff

This version removes the separate while loop for the read and instead continues the main while loop, which already implements the timeout.

> [patch] sips connection dropped by asterisk with a large INVITE
> ---------------------------------------------------------------
>
>                 Key: ASTERISK-18345
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-18345
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_sip/TCP-TLS
>    Affects Versions: SVN, 1.8.4, 11.4.0, 11.5.0
>            Reporter: Stephane Chazelas
>         Attachments: tcptls_poll.diff, tcptls_pollv2.diff, tlsBigSDPdebug.patch, tlsBigSDP.patch, tls_read_fix_try1_1.8.11.1.diff, tls_read_fix_try2_1.8.11.1.diff, tls_read_fix_try3_1.8.11.1.diff, tls_read.patch
>
>
> When using jitsi (http://jitsi.org; the Debian amd64 build) as a SIP-TLS extension, one can see the SSL connection to asterisk being dropped during registration (abnormally, but that seems to be due to ASTERISK-18342), and placing calls doesn't work.
> I first thought it was an SSL method issue, as jitsi doesn't seem to support SSLv3 or TLSv1, and I was able to make it work by using a MitM that proxied the connection through socat: jitsi was able to talk to socat OK, and socat to asterisk OK.
> But it looks more like a timing/non-deterministic issue. I then had a look at the code, added a little logging, and found that the connection was closed because fgets() returned NULL in _sip_tcp_helper_thread().
> I then added logging to ssl_read() to see whether SSL_read() ever failed, but it doesn't, so I don't understand how that fgets() could return EOF/error in that case. Then I had a hard time understanding the need_poll/after_poll logic.
> If I understand correctly, tcptls_session->fd is the network socket that carries the encrypted data and other SSL out-of-band traffic and has been made non-blocking, and tcptls_session->f is a funopen(tcptls_session->ssl, ssl_read, ssl_write, NULL, ssl_close) stream (or the fopencookie equivalent on Linux). The fd is polled before each fgets(), which eventually calls SSL_read(). That sounds to me like a recipe for catastrophe, deadlocks and the like, but I have to admit I have not fully understood the design.
> I still don't get how fgets() can return NULL here, but I tried to take the need_poll/after_poll trick further by doing:
> {code}
> @@ -2659,7 +2637,7 @@ static void *_sip_tcp_helper_thread(stru
>                                  * TLS layer */
>                                 if (!tcptls_session->ssl || need_poll) {
>                                         need_poll = 0;
> -                                       after_poll = 1;
> +                                       after_poll++;
>                                         res = ast_wait_for_input(tcptls_session->fd, timeout);
>                                         if (res < 0) {
>                                                 ast_debug(2, "SIP TCP server :: ast_wait_for_input returned %d\n", res);
> @@ -2674,7 +2654,7 @@ static void *_sip_tcp_helper_thread(stru
>                                 ast_mutex_lock(&tcptls_session->lock);
>                                 if (!fgets(buf, sizeof(buf), tcptls_session->f)) {
>                                         ast_mutex_unlock(&tcptls_session->lock);
> -                                       if (after_poll) {
> +                                       if (after_poll > 1) {
>                                                 goto cleanup;
>                                         } else {
>                                                 need_poll = 1;
> {code}
> and it fixed the issue.
> So there is definitely something wrong, though I couldn't tell exactly what.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


