[asterisk-bugs] [JIRA] (ASTERISK-27001) res_pjsip: TLS connection not stable
Ian Gilmour (JIRA)
noreply at issues.asterisk.org
Wed May 31 05:58:57 CDT 2017
[ https://issues.asterisk.org/jira/browse/ASTERISK-27001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=237218#comment-237218 ]
Ian Gilmour commented on ASTERISK-27001:
----------------------------------------
I ran a SIPp test over a long weekend with a slightly modified version of pjproject-2.6.patch in place (the only change to the patch was to count the number of times it ignores the SSL BIO error).
i.e.
{noformat}
/* SSL might just return SSL_ERROR_WANT_READ in
 * re-negotiation.
 */
if (err != SSL_ERROR_NONE && err != SSL_ERROR_WANT_READ)
{
#define EXPERIMENTAL 1
#if EXPERIMENTAL
    // experimental...
    if (err == SSL_ERROR_SYSCALL && size_ == -1
        && ERR_peek_error() == 0 && errno == 0) {
        static int count=0;
        status = STATUS_FROM_SSL_ERR2("Read", ssock, size_, err, len);
        PJ_LOG(2,("SSL", "BIO error: %d", count));
        count++;
        // ignore these errors
        ;
    } else {
        /* Reset SSL socket state, then return PJ_FALSE */
        status = STATUS_FROM_SSL_ERR2("Read", ssock, size_, err, len);
        reset_ssl_sock_state(ssock);
        goto on_error;
    }
#else
    status = STATUS_FROM_SSL_ERR2("Read", ssock, size_, err, len);
    reset_ssl_sock_state(ssock);
    goto on_error;
#endif
}
status = do_handshake(ssock);
{noformat}
In my test config I have only one TLS connection to a SIP server, so the single static counter above suffices.
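The decision the patched read path makes can be restated as a small stand-alone function. This is only a sketch for illustration, not part of the patch: `classify_read_error`, the `read_action` enum and the `SKETCH_` constants are hypothetical names, and the constants are local stand-ins whose values mirror OpenSSL's `SSL_ERROR_NONE`, `SSL_ERROR_WANT_READ` and `SSL_ERROR_SYSCALL` so that the block compiles without the OpenSSL headers.

```c
/* Local stand-ins for the SSL_get_error() result codes used by the patch.
 * The numeric values mirror <openssl/ssl.h>; the names are hypothetical. */
enum {
    SKETCH_SSL_ERROR_NONE      = 0,
    SKETCH_SSL_ERROR_WANT_READ = 2,
    SKETCH_SSL_ERROR_SYSCALL   = 5
};

/* What the read path does next (hypothetical enum for this sketch). */
typedef enum {
    READ_CONTINUE, /* no error or WANT_READ: carry on to the handshake     */
    READ_IGNORE,   /* the "SSL BIO error" case: count it, keep the socket  */
    READ_RESET     /* anything else: reset the SSL socket state            */
} read_action;

/* err:         result of SSL_get_error()
 * ret:         return value of the read call (size_ in the patch)
 * queued_err:  result of ERR_peek_error()
 * saved_errno: errno as seen right after the read */
static read_action classify_read_error(int err, int ret,
                                       unsigned long queued_err,
                                       int saved_errno)
{
    if (err == SKETCH_SSL_ERROR_NONE || err == SKETCH_SSL_ERROR_WANT_READ)
        return READ_CONTINUE;

    /* SSL_ERROR_SYSCALL with nothing in the OpenSSL error queue and
     * errno == 0: the condition the experimental branch ignores. */
    if (err == SKETCH_SSL_ERROR_SYSCALL && ret == -1 &&
        queued_err == 0 && saved_errno == 0)
        return READ_IGNORE;

    return READ_RESET;
}
```

In the patch itself both the READ_CONTINUE and READ_IGNORE cases fall through to do_handshake(ssock); only the READ_RESET case calls reset_ssl_sock_state(ssock) and jumps to on_error.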
The test ran for 4 days and completed 100,000+ loopback calls. Over those 4 days Asterisk closed the existing TLS connection and opened a new one to the SIP server a total of 5 times. 3 of these were due to the SIP server being restarted, so not an Asterisk issue. The other 2 reconnections were reported, thanks to the extra SSL logging in pjproject-2.6.patch, as being due to:
{noformat}
WARNING: pjproject: SSL SSL_ERROR_SSL (Read): Level: 0 err: <336151548> <SSL routines-SSL3_READ_BYTES-sslv3 alert bad record mac> len: 6000
{noformat}
The SSL BIO error count was 75 by the end of the 4-day test, i.e. without pjproject-2.6.patch applied, Asterisk would have closed and reopened the TLS connection a further 75 times.
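For reference, the numeric code 336151548 in the warning above can be split into its packed fields. The sketch below assumes the pre-3.0 OpenSSL error layout (library in bits 24-31, function in bits 12-23, reason in bits 0-11, as used by the ERR_GET_LIB/ERR_GET_FUNC/ERR_GET_REASON macros of that era). The OpenSSL version is not stated in this report, so treat that layout as an assumption. `packed_err` and `unpack_err` are hypothetical names.

```c
/* Fields of a packed OpenSSL error code (hypothetical struct for this sketch). */
typedef struct {
    unsigned lib;    /* library number  */
    unsigned func;   /* function number */
    unsigned reason; /* reason number   */
} packed_err;

/* Split a packed error code, assuming the pre-3.0 OpenSSL layout:
 * lib << 24 | func << 12 | reason. */
static packed_err unpack_err(unsigned long code)
{
    packed_err e;
    e.lib    = (unsigned)((code >> 24) & 0xFFUL);
    e.func   = (unsigned)((code >> 12) & 0xFFFUL);
    e.reason = (unsigned)(code & 0xFFFUL);
    return e;
}
```

Under that assumption, unpack_err(336151548UL) gives lib 20, func 148 and reason 1020. Reason 1020 is 1000 plus TLS alert number 20 (bad_record_mac), which agrees with the "sslv3 alert bad record mac" text in the log line, i.e. the alert was sent by the peer.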
> res_pjsip: TLS connection not stable
> ------------------------------------
>
> Key: ASTERISK-27001
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-27001
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: pjproject/pjsip
> Affects Versions: 13.15.0
> Environment: centos 6.8(64-bit)
> Reporter: Ian Gilmour
> Assignee: Unassigned
> Attachments: output.tgz, pjproject-2.6.patch
>
>
> Hi,
> I have a development Asterisk 13.15.0 test setup (uses the bundled pjsip-2.6).
> On startup Asterisk registers one Asterisk user with a remote OpenSIPS server, over TLS, using the PJSIP stack. As part of the test this Asterisk PJSIP user is re-registered with the OpenSIPS server every couple of minutes.
> All outgoing/incoming pjsip call media is encrypted using SRTP and via an external RTPPROXY running alongside the external OpenSIPS Server.
> Asterisk is additionally configured to use PJSIP on 127.0.0.1:5060 to allow calls from a locally run SIPp process. All SIPp calls are TCP+RTP.
> I use SIPp to run multiple concurrent loopback calls (calls vary in duration) through Asterisk to the OpenSIPS server and back to an echo() service running on the same Asterisk.
> i.e.
> {noformat}
> SIPp <-TCP/RTP-> Asterisk <-TLS/SRTP-> OpenSIPS server (+ rtpproxy) <-TLS/SRTP-> Asterisk (echo service).
> {noformat}
> With no calls running the PJSIP TLS connection stays up and I see it reregistering the user every ~2mins.
> When I start to run the SIPp test I start seeing the PJSIP stack having TLS issues and closing the current port as a result. In this state outgoing SIPp calls obviously start failing. A few seconds later Asterisk (PJSIP) opens a new port, re-registers with the OpenSIPS server, and the calls continue. With SIPp running, the connection is re-established every ~10-20 minutes due to TLS issues.
> If I switch Asterisk to use the chan_sip stack rather than the PJSIP stack for the TLS connection to the OpenSIPS server the connection stays up with no call failures.
> I patched a couple of PJSIP files to help me see what's going on and I have played with the PJSIP TLS code. I can improve the reliability of the connection by ignoring a specific OpenSSL error condition (see the code within #if EXPERIMENTAL...#endif in the attached patch). In the original code this error causes >90% of the connection failures I see. With this mod in place the TLS connection stays up for hours rather than minutes at a time, on the same outgoing port, and calls work fine. I doubt this mod is the proper fix though.
--
This message was sent by Atlassian JIRA
(v6.2#6252)