[asterisk-bugs] [JIRA] (ASTERISK-27001) res_pjsip: TLS connection not stable

Ian Gilmour (JIRA) noreply at issues.asterisk.org
Wed May 31 05:58:57 CDT 2017


    [ https://issues.asterisk.org/jira/browse/ASTERISK-27001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=237218#comment-237218 ] 

Ian Gilmour commented on ASTERISK-27001:
----------------------------------------

I did a SIPp test over a long weekend with a slightly modified version of the pjproject-2.6.patch in place (the only change was to modify the patch to count the number of times it ignores the SSL BIO error).

i.e.
{noformat}
		/* SSL might just return SSL_ERROR_WANT_READ in 
		 * re-negotiation.
		 */
		if (err != SSL_ERROR_NONE && err != SSL_ERROR_WANT_READ)
		{
#define EXPERIMENTAL 1
#if EXPERIMENTAL
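		    /* Experimental: count, log and ignore unexplained BIO errors. */
		    if (err == SSL_ERROR_SYSCALL && size_ == -1
			&& ERR_peek_error() == 0 && errno == 0) {
			/* One counter shared by every TLS socket in the process. */
			static int count = 0;
			status = STATUS_FROM_SSL_ERR2("Read", ssock, size_, err, len);
			PJ_LOG(2,("SSL", "BIO error: %d", count));
			count++;
			/* Ignore the error and carry on to do_handshake(). */
		    } else {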
		        /* Reset SSL socket state, then return PJ_FALSE */
		        status = STATUS_FROM_SSL_ERR2("Read", ssock, size_, err, len);
		        reset_ssl_sock_state(ssock);
		        goto on_error;
		    }
#else
		    status = STATUS_FROM_SSL_ERR2("Read", ssock, size_, err, len);
		    reset_ssl_sock_state(ssock);
		    goto on_error;
#endif
		}

		status = do_handshake(ssock);
{noformat}

In my test configuration I only have 1 TLS connection to a SIP server, so the single static counter above suffices.
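
With more than one TLS connection a process-wide static counter would mix the connections together. A minimal sketch of a per-connection variant is below. It assumes a hypothetical `struct conn_stats` attached to each socket, and it redefines the value of OpenSSL's `SSL_ERROR_SYSCALL` locally so that it builds without the OpenSSL headers. None of these names are pjproject API.

```c
#include <stdbool.h>

/* Mirrors OpenSSL's SSL_ERROR_SYSCALL (5); redefined here so the sketch
 * builds without the OpenSSL headers. */
enum { SKETCH_SSL_ERROR_SYSCALL = 5 };

/* Hypothetical per-connection statistics; not a pjproject structure. */
struct conn_stats {
    unsigned ignored_bio_errors;
};

/* Same test as the experimental branch: a syscall-level failure where
 * neither the OpenSSL error queue nor errno explains what went wrong. */
static bool is_unexplained_bio_error(int ssl_err, int ret,
                                     unsigned long queued_err, int saved_errno)
{
    return ssl_err == SKETCH_SSL_ERROR_SYSCALL && ret == -1 &&
           queued_err == 0 && saved_errno == 0;
}

/* Returns true (and counts the event against this connection only)
 * when the error matches the pattern that the patch ignores. */
static bool note_bio_error(struct conn_stats *stats, int ssl_err, int ret,
                           unsigned long queued_err, int saved_errno)
{
    if (!is_unexplained_bio_error(ssl_err, ret, queued_err, saved_errno))
        return false;
    stats->ignored_bio_errors++;
    return true;
}
```

The caller would pass in the values of `SSL_get_error()`, the read return code, `ERR_peek_error()` and `errno` that the patched code already inspects.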

The test ran for 4 days and completed 100,000+ loopback calls. Over those 4 days Asterisk closed the existing TLS connection and opened a new one to the SIP server a total of 5 times. 3 of these were due to the SIP server being restarted, so not an Asterisk issue. The other 2 reconnections were reported, thanks to the extra SSL logging in the pjproject-2.6.patch, as being due to:

{noformat}
WARNING: pjproject: SSL SSL_ERROR_SSL (Read): Level: 0 err: <336151548> <SSL routines-SSL3_READ_BYTES-sslv3 alert bad record mac> len: 6000
{noformat}
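
The number in angle brackets in the warning above is OpenSSL's packed error code. As a sketch, assuming the pre-3.0 OpenSSL layout (library number in the top 8 bits, followed by 12 bits of function code and 12 bits of reason code, as used by the `ERR_GET_LIB`, `ERR_GET_FUNC` and `ERR_GET_REASON` macros in the 1.0.x and 1.1.x series), it can be unpacked without the OpenSSL headers. The struct and function names are made up for illustration.

```c
#include <stdio.h>

/* Illustrative container for the three fields of a packed error code. */
struct ssl_err_parts {
    unsigned lib;    /* library number, e.g. 20 for the SSL library */
    unsigned func;   /* function code within that library */
    unsigned reason; /* reason code */
};

/* Split a packed error code using the pre-3.0 OpenSSL bit layout. */
static struct ssl_err_parts unpack_ssl_error(unsigned long e)
{
    struct ssl_err_parts p;
    p.lib    = (unsigned)((e >> 24) & 0xFFUL);
    p.func   = (unsigned)((e >> 12) & 0xFFFUL);
    p.reason = (unsigned)(e & 0xFFFUL);
    return p;
}

/* Print the code in the hex form OpenSSL tools use, plus its fields. */
static void print_ssl_error(unsigned long e)
{
    struct ssl_err_parts p = unpack_ssl_error(e);
    printf("%08lX lib=%u func=%u reason=%u\n", e, p.lib, p.func, p.reason);
}
```

For 336151548 this gives 140943FC: library 20, function 148 and reason 1020. Reason codes of 1000 and above are 1000 plus a TLS alert number, and alert 20 is bad_record_mac, which matches the text in the log line.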

The SSL BIO error count was 75 by the end of the 4-day test. In other words, without the pjproject-2.6.patch applied, Asterisk would have closed and reopened the TLS connection a further 75 times.


> res_pjsip: TLS connection not stable
> ------------------------------------
>
>                 Key: ASTERISK-27001
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-27001
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: pjproject/pjsip
>    Affects Versions: 13.15.0
>         Environment: centos 6.8(64-bit)
>            Reporter: Ian Gilmour
>            Assignee: Unassigned
>         Attachments: output.tgz, pjproject-2.6.patch
>
>
> Hi,
> I have a development Asterisk 13.15.0 test setup (uses the bundled pjsip-2.6).
> On startup Asterisk registers 1 Asterisk user with a remote OpenSIPS server, over TLS, using the PJSIP stack. As part of the test this Asterisk PJSIP user is reregistered with the OpenSIPS server every couple of minutes.
> All outgoing/incoming pjsip call media is encrypted using SRTP and via an external RTPPROXY running alongside the external OpenSIPS Server.
> Asterisk is additionally configured to use PJSIP on 127.0.0.1:5060 to allow calls from a locally run SIPp process. All SIPp calls are TCP+RTP.
> I use SIPp to run multiple concurrent loopback calls (calls vary in duration) through Asterisk to the OpenSIPS server and back to an echo() service running on the same Asterisk.
> i.e.
> {noformat}
>   SIPp <-TCP/RTP-> Asterisk <-TLS/SRTP-> OpenSIPS server (+ rtpproxy) <-TLS/SRTP-> Asterisk (echo service).
> {noformat}
> With no calls running the PJSIP TLS connection stays up and I see it reregistering the user every ~2mins.
> When I start to run the SIPp test I start seeing the PJSIP stack having TLS issues and closing the current port as a result. In this state outgoing SIPp calls obviously start failing. A few seconds later Asterisk (PJSIP) opens a new port, reregisters with the OpenSIPS server, and the calls continue. With SIPp running, the connection is reestablished every ~10-20 minutes due to TLS issues.
> If I switch Asterisk to use the chan_sip stack rather than the PJSIP stack for the TLS connection to the OpenSIPS server the connection stays up with no call failures.
> I patched a couple of PJSIP files to help me see what's going on and I have played with the PJSIP TLS code. I can improve the reliability of the connection by ignoring a specific OpenSSL error condition (see the code within #if EXPERIMENTAL...#endif in the attached patch). In the original code this error causes >90% of the connection failures I see. With this mod in place the TLS connection stays up for hours rather than minutes at a time, on the same outgoing port, and calls work fine. I doubt this mod is the proper fix though.


