[asterisk-bugs] [JIRA] (ASTERISK-18345) [patch] sips connection dropped by asterisk with a large INVITE

Matt Jordan (JIRA) noreply at issues.asterisk.org
Tue Jul 29 12:31:59 CDT 2014


    [ https://issues.asterisk.org/jira/browse/ASTERISK-18345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=221041#comment-221041 ] 

Matt Jordan commented on ASTERISK-18345:
----------------------------------------

It did go up for review:

https://reviewboard.asterisk.org/r/3653/

Unfortunately, this patch does not fix the root cause of the problem. As mentioned on that review, there is an appropriate way to fix this that would fix it for good, instead of merely making the problem highly unlikely to occur.

It would be great if a patch were put together that fixes the root cause of the problem, as recommended on that code review.

> [patch] sips connection dropped by asterisk with a large INVITE
> ---------------------------------------------------------------
>
>                 Key: ASTERISK-18345
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-18345
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_sip/TCP-TLS
>    Affects Versions: SVN, 1.8.4, 11.4.0, 11.5.0
>            Reporter: Stephane Chazelas
>         Attachments: tlsBigSDPdebug.patch, tlsBigSDP.patch, tls_read_fix_try1_1.8.11.1.diff, tls_read_fix_try2_1.8.11.1.diff, tls_read_fix_try3_1.8.11.1.diff, tls_read.patch
>
>
> When using Jitsi (http://jitsi.org, the Debian amd64 build) as a SIP-TLS extension, the SSL connection to Asterisk is dropped during registration (abnormally, but that seems to be due to ASTERISK-18342), and placing calls doesn't work.
> I first thought it was an SSL method issue, as Jitsi doesn't seem to support SSLv3 or TLSv1, and I was able to make it work by using a MitM that proxied the connection through socat: Jitsi was able to talk to socat OK and socat to Asterisk OK.
> But it looks more like a timing/nondeterministic issue. I then had a look at the code, added a little logging, and found out that the connection was closed because fgets() returned NULL in _sip_tcp_helper_thread().
> I then added logging to ssl_read() to see if SSL_read() ever failed, but it doesn't, so I don't understand how that fgets() could return EOF/error in that case. I also had a hard time understanding the need_poll/after_poll logic.
> If I understand correctly, tcptls_session->fd is the network socket that carries the encrypted data and other SSL out-of-band traffic and has been made non-blocking, while tcptls_session->f is a funopen(tcptls_session->ssl, ssl_read, ssl_write, NULL, ssl_close) stream (or the fopencookie Linux equivalent). Polls are made on the fd before calling fgets(), which eventually calls SSL_read(). That sounds to me like a recipe for catastrophe, deadlocks and the like, but I have to admit I have not understood/seen the design fully.
> I still don't get how fgets() can return NULL here, but I tried to take the need_poll/after_poll trick further by doing:
> {code}
> @@ -2659,7 +2637,7 @@ static void *_sip_tcp_helper_thread(stru
>                                  * TLS layer */
>                                 if (!tcptls_session->ssl || need_poll) {
>                                         need_poll = 0;
> -                                       after_poll = 1;
> +                                       after_poll++;
>                                         res = ast_wait_for_input(tcptls_session->fd, timeout);
>                                         if (res < 0) {
>                                                 ast_debug(2, "SIP TCP server :: ast_wait_for_input returned %d\n", res);
> @@ -2674,7 +2654,7 @@ static void *_sip_tcp_helper_thread(stru
>                                 ast_mutex_lock(&tcptls_session->lock);
>                                 if (!fgets(buf, sizeof(buf), tcptls_session->f)) {
>                                         ast_mutex_unlock(&tcptls_session->lock);
> -                                       if (after_poll) {
> +                                       if (after_poll > 1) {
>                                                 goto cleanup;
>                                         } else {
>                                                 need_poll = 1;
> {code}
> and it fixed the issue.
> So there's definitely something wrong, though I couldn't tell exactly what.





