[asterisk-bugs] [JIRA] (ASTERISK-24832) DTLS-crashes within openssl
Stefan Engström (JIRA)
noreply at issues.asterisk.org
Mon Mar 2 03:09:34 CST 2015
[ https://issues.asterisk.org/jira/browse/ASTERISK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=225218#comment-225218 ]
Stefan Engström edited comment on ASTERISK-24832 at 3/2/15 3:08 AM:
--------------------------------------------------------------------
No crashes for a while now, but we do get DTLS failures; see the DTLSFailure6... log (with comments in it). At first I thought this suggested that we also need more concurrency handling between the calls to dtls_srtp_check_pending from __rtp_recvfrom and from dtls_srtp_handle_timeout, as the Russian commenter said, but I'm not sure. Increasing the timeout for dtls_srtp_check_pending from the old value (999 ms) to 3 * (old value) seemed to decrease failures...
Adding mutexes around all calls to dtls_srtp_check_pending seemed to lead to another issue, possibly a deadlock...
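One way a deadlock like this can arise (an assumption, not something the logs confirm) is a caller that already holds the lock re-entering dtls_srtp_check_pending on the same thread. The sketch below uses stand-in names only: dtls_stub, stub_check_pending and stub_handle_timeout are hypothetical and are not Asterisk code. It shows a recursive pthread mutex letting the timeout path call the check function while holding the lock, with a second thread playing the receive path.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

#define ITERATIONS 1000

/* Hypothetical stand-in for per-session DTLS state; not Asterisk's real struct. */
struct dtls_stub {
    pthread_mutex_t lock;   /* recursive, so nested locking on one thread is safe */
    int pending_flushes;    /* counts simulated "check pending" operations */
};

int dtls_stub_init(struct dtls_stub *d)
{
    pthread_mutexattr_t attr;
    int rc;

    d->pending_flushes = 0;
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) {
        rc = pthread_mutex_init(&d->lock, &attr);
    }
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Plays the role of dtls_srtp_check_pending: touches shared state under the lock. */
void stub_check_pending(struct dtls_stub *d)
{
    pthread_mutex_lock(&d->lock);
    d->pending_flushes++;
    pthread_mutex_unlock(&d->lock);
}

/* Plays the role of the timeout handler: it already holds the lock when it
 * calls stub_check_pending. A non-recursive mutex may self-deadlock here. */
void stub_handle_timeout(struct dtls_stub *d)
{
    pthread_mutex_lock(&d->lock);
    stub_check_pending(d);
    pthread_mutex_unlock(&d->lock);
}

static void *receiver_thread(void *arg)
{
    struct dtls_stub *d = arg;
    for (int i = 0; i < ITERATIONS; i++) {
        stub_check_pending(d);
    }
    return NULL;
}

static void *timer_thread(void *arg)
{
    struct dtls_stub *d = arg;
    for (int i = 0; i < ITERATIONS; i++) {
        stub_handle_timeout(d);
    }
    return NULL;
}

/* Runs both paths concurrently; returns the final counter, or -1 on setup failure. */
int run_demo(void)
{
    struct dtls_stub d;
    pthread_t rx, timer;
    int total;

    if (dtls_stub_init(&d) != 0) {
        return -1;
    }
    if (pthread_create(&rx, NULL, receiver_thread, &d) != 0) {
        return -1;
    }
    if (pthread_create(&timer, NULL, timer_thread, &d) != 0) {
        pthread_join(rx, NULL);
        return -1;
    }
    pthread_join(rx, NULL);
    pthread_join(timer, NULL);
    total = d.pending_flushes;
    pthread_mutex_destroy(&d.lock);
    return total;
}
```

If the real deadlock instead comes from two different locks taken in opposite order on the two paths, a recursive mutex would not help; a consistent lock order would be the thing to check.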
was (Author: stefaneng86):
The DTLSFailure6... log (with comments in it) seems to show that we also need more concurrency handling between the calls to dtls_srtp_check_pending from __rtp_recvfrom and from dtls_srtp_handle_timeout, just as the Russian commenter had predicted. It doesn't produce a crash, but I think it led to an unnecessary DTLS failure. I'm not sure what to do, because adding mutexes around all calls to dtls_srtp_check_pending seemed to lead to another issue, possibly a deadlock...
> DTLS-crashes within openssl
> ----------------------------
>
> Key: ASTERISK-24832
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-24832
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Resources/res_rtp_asterisk
> Affects Versions: 13.1.0
> Environment: Fedora 20 x86_64, openssl-1.0.1e-41.fc20.x86_64, Asterisk 13.1.0, Chrome SIPML5 chan_sip peers with transport WSS
> Reporter: Stefan Engström
> Assignee: Rusty Newton
> Attachments: crash1.txt, crash2.txt, crash3.txt, crash4.txt, crash5.extralog, crash5.txt, CUSTOMERRORDEBUGLOG, DTLSfailure6ErrorlogNocrashUsingNewPatch.txt, DTLSREVIEWME.patch, SIPCONF.txt, TESTDTLS.patch, TESTDTLS.patch.workingcopy
>
>
> I'm using 4 chan_sip peers with transport WSS. They all use Chrome SIPml5 WebRTC. 2 of them call a queue and the other 2 answer. Every 100-1000 calls or so, Asterisk crashes with a segmentation fault or abort signal within openssl.
> Since it's load-related it's hard to provide enough information, but I'll try to add more as I go.
> ISSUE-0
> The first thing I noticed was that dtls_perform_handshake was called too many times, but that was fixed by compensating for ASTERISK-24830.
> From code inspection and tracing logs, it looks like the crashes mostly occur for dtls->ssl instances where Asterisk has the server role (SSL_set_accept_state(dtls->ssl) has been called).
> EDIT -- This JIRA is getting a little bigger. It seems there are many sub-problems which are all related to DTLS though... not all sub-issues below may be real issues, some are just me asking questions about code. I'd be happy if a developer took a look at it and answered questions or discussed some of the issues and possible fixes.
> ISSUE-1 - crash3 seems to prove a concurrency issue:
> Thread 5, leaving Asterisk code at dtls_perform_handshake, is performing ssl3_clear on the same ssl struct that thread 1 passes to ssl_read from __rtp_recvfrom.
> ISSUE-2: I'm curious about the behavior of ast_rtp_on_ice_complete() {
>     ...
>     dtls_perform_handshake(instance, &rtp->dtls, 0);
>     if (rtp->rtcp) {
>         dtls_perform_handshake(instance, &rtp->rtcp->dtls, 1);
>     }
>     ...
> }
> chan_sip seems to call process_sdp, which eventually calls dtls_set_setup in res_rtp_asterisk, which ultimately calls SSL_set_connect_state(ssl) or SSL_set_accept_state(ssl) on both (RTP and RTCP) SSL sessions. But this races with the call to dtls_perform_handshake(instance, &rtp->dtls, 0) from ast_rtp_on_ice_complete. I'm not sure if this is a problem, but in my latest crash (crash4), ast_rtp_on_ice_complete fired before dtls_set_setup, which I have never seen during calls that did not crash.
> The big question is: why is dtls_perform_handshake() called at all if we are passive? After I added a check in ast_rtp_on_ice_complete to do nothing if we are passive, it seems to crash a lot less.
> Possibly related to ASTERISK-24651
> Requires patch from ASTERISK-24711
> Requires patch from ASTERISK-24830 (the obvious fix of replacing USE_PJPROJECT with HAVE_PJPROJECT...)
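
The passive-role check described under ISSUE-2 can be sketched roughly as follows. This is an illustration only: the enum, struct and function names (stub_dtls_role, stub_dtls, stub_on_ice_complete) are hypothetical and do not come from res_rtp_asterisk, and the real patch attached to the issue may look different. The idea is that the ICE-complete callback only starts a handshake when the local side is known to be the active (client) side.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical role values; Asterisk's real enum and names may differ. */
enum stub_dtls_role {
    STUB_ROLE_UNSET,    /* SDP not processed yet, role unknown */
    STUB_ROLE_ACTIVE,   /* we act as DTLS client and send the first flight */
    STUB_ROLE_PASSIVE   /* we act as DTLS server and wait for the peer */
};

/* Hypothetical stand-in for the per-stream DTLS state. */
struct stub_dtls {
    enum stub_dtls_role role;
    int handshakes_started;   /* how many times a handshake was kicked off */
};

/* Called when ICE completes. Returns 1 if a handshake was started,
 * 0 if it was skipped because we are not (yet known to be) active. */
int stub_on_ice_complete(struct stub_dtls *dtls)
{
    assert(dtls != NULL);

    if (dtls->role != STUB_ROLE_ACTIVE) {
        /* Passive: the peer initiates. Unset: the role is not decided yet. */
        return 0;
    }

    dtls->handshakes_started++;
    return 1;
}
```

Note that with this guard, a callback that fires before the role is set (the ordering seen in crash4) also skips the handshake, so an active endpoint would then need some other trigger once the role becomes known. Whether that is acceptable is an open question here, not a claim about how Asterisk behaves.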
--
This message was sent by Atlassian JIRA
(v6.2#6252)