[asterisk-bugs] [JIRA] (ASTERISK-18345) [patch] sips connection dropped by asterisk with a large INVITE

Elazar Broad (JIRA) noreply at issues.asterisk.org
Thu Jul 31 01:16:56 CDT 2014


     [ https://issues.asterisk.org/jira/browse/ASTERISK-18345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elazar Broad updated ASTERISK-18345:
------------------------------------

    Attachment: tcptls_poll.diff

Hi All,
 It's been a while. I have run into this same issue and attached a patch (tcptls_poll.diff) against trunk, which starts the work of replacing sip_tls_read/write() and resolves the issue (see the discussion on the previous reviewboard link). My only concern is that the while loop which handles the EAGAIN error has the potential to block indefinitely. This patch was tested with CSipSimple using TLS (with full SIP headers, multiple codecs and SRTP enabled) and TCP.
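
 For reference, here is a minimal sketch of one way such an EAGAIN loop could be bounded. It is not taken from tcptls_poll.diff: read_with_deadline(), set_nonblocking() and now_ms() are hypothetical names, and plain read() stands in for the TLS read, so it only illustrates the idea of giving up once a deadline has passed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: current time in milliseconds on the monotonic clock. */
long long now_ms(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Hypothetical helper: switch a descriptor to non-blocking mode. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0) {
		return -1;
	}
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: read from a non-blocking fd, retrying on
 * EAGAIN/EINTR, but never waiting longer than timeout_ms in total.
 * Returns the byte count (0 on EOF), or -1 with errno set
 * (ETIMEDOUT when the deadline expired).
 */
ssize_t read_with_deadline(int fd, void *buf, size_t len, int timeout_ms)
{
	long long deadline = now_ms() + timeout_ms;

	assert(buf != NULL);
	for (;;) {
		ssize_t n = read(fd, buf, len);
		if (n >= 0) {
			return n;
		}
		if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR) {
			return -1; /* hard error, errno already set by read() */
		}

		long long remaining = deadline - now_ms();
		if (remaining <= 0) {
			errno = ETIMEDOUT;
			return -1;
		}

		/* Sleep until the fd is readable or the remaining time runs out. */
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int res = poll(&pfd, 1, (int)remaining);
		if (res == 0) {
			errno = ETIMEDOUT;
			return -1;
		}
		if (res < 0 && errno != EINTR) {
			return -1;
		}
	}
}
```

 The deadline is computed once, before the loop, so repeated EAGAIN or EINTR results cannot extend the total wait beyond timeout_ms.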

Thanks,
 Elazar 

> [patch] sips connection dropped by asterisk with a large INVITE
> ---------------------------------------------------------------
>
>                 Key: ASTERISK-18345
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-18345
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_sip/TCP-TLS
>    Affects Versions: SVN, 1.8.4, 11.4.0, 11.5.0
>            Reporter: Stephane Chazelas
>         Attachments: tcptls_poll.diff, tlsBigSDPdebug.patch, tlsBigSDP.patch, tls_read_fix_try1_1.8.11.1.diff, tls_read_fix_try2_1.8.11.1.diff, tls_read_fix_try3_1.8.11.1.diff, tls_read.patch
>
>
> When using Jitsi (http://jitsi.org) (the Debian amd64 build) as a SIP-TLS extension, the SSL connection to Asterisk is dropped during registration (abnormally, but that seems to be due to ASTERISK-18342), and placing calls doesn't work.
> I first thought it was an SSL method issue, as Jitsi doesn't seem to support SSLv3 or TLSv1, and I was able to make it work by using a MitM that proxied the connection through socat: Jitsi was able to talk to socat OK and socat to Asterisk OK.
> But it looks more like a timing/nondeterministic issue. I then had a look at the code, added a little logging and found out that the connection was closed because of fgets() returning NULL in _sip_tcp_helper_thread().
> I then added logging to ssl_read() to see if SSL_read() ever failed, but it doesn't, so I don't understand how that fgets() could return EOF/error in that case. Then, I had a hard time understanding the need_poll/after_poll logic.
> If I understand correctly, tcptls_session->fd is the network socket that carries the encrypted data and other SSL out-of-band traffic and has been made non-blocking, and tcptls_session->f is the result of funopen(tcptls_session->ssl, ssl_read, ssl_write, NULL, ssl_close) (or the fopencookie Linux equivalent). Polls are made on the fd before calling fgets(), which eventually calls SSL_read(). That sounds to me like a recipe for catastrophe, deadlocks and the like, but I have to admit I have not understood/seen the design fully.
> I still don't get how fgets() can return NULL here, but I tried to take the need_poll/after_poll trick further by doing:
> {code}
> @@ -2659,7 +2637,7 @@ static void *_sip_tcp_helper_thread(stru
>                                  * TLS layer */
>                                 if (!tcptls_session->ssl || need_poll) {
>                                         need_poll = 0;
> -                                       after_poll = 1;
> +                                       after_poll++;
>                                         res = ast_wait_for_input(tcptls_session->fd, timeout);
>                                         if (res < 0) {
>                                                 ast_debug(2, "SIP TCP server :: ast_wait_for_input returned %d\n", res);
> @@ -2674,7 +2654,7 @@ static void *_sip_tcp_helper_thread(stru
>                                 ast_mutex_lock(&tcptls_session->lock);
>                                 if (!fgets(buf, sizeof(buf), tcptls_session->f)) {
>                                         ast_mutex_unlock(&tcptls_session->lock);
> -                                       if (after_poll) {
> +                                       if (after_poll > 1) {
>                                                 goto cleanup;
>                                         } else {
>                                                 need_poll = 1;
> {code}
> and it fixed the issue.
> So, there's something definitely wrong though I couldn't tell exactly what.
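
As background to the fgets() question in the description above, here is a minimal sketch of a general stdio behaviour: on a stream layered over a non-blocking descriptor, fgets() can return NULL simply because no data is available yet, with the stream's error flag set and errno left at EAGAIN. This is not a diagnosis of this issue and does not use Asterisk code. It assumes a POSIX system, uses fdopen() on a plain descriptor in place of the funopen()/fopencookie stream over SSL_read(), and the names open_nonblocking_stream(), try_read_line() and enum line_result are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h> /* for callers that create the descriptor with pipe() etc. */

/* Hypothetical outcome codes for one attempt to read a line. */
enum line_result { LINE_OK, LINE_RETRY, LINE_CLOSED };

/* Hypothetical helper: make fd non-blocking and wrap it in a read stream. */
FILE *open_nonblocking_stream(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
		return NULL;
	}
	return fdopen(fd, "r");
}

/*
 * Hypothetical helper: try to read one line and tell "no data yet"
 * apart from "peer closed or hard error".
 */
enum line_result try_read_line(FILE *f, char *buf, int size)
{
	assert(f != NULL && buf != NULL && size > 0);

	errno = 0;
	if (fgets(buf, size, f)) {
		return LINE_OK;
	}
	if (ferror(f) && (errno == EAGAIN || errno == EWOULDBLOCK)) {
		/* Nothing to read right now: clear the sticky error flag so a
		 * later fgets() on the same stream can succeed after a poll. */
		clearerr(f);
		return LINE_RETRY;
	}
	return LINE_CLOSED; /* EOF or an unrecoverable error */
}
```

A caller in the style of the need_poll/after_poll loop would wait on the descriptor again after LINE_RETRY and clean up only on LINE_CLOSED.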



--
This message was sent by Atlassian JIRA
(v6.2#6252)


