[asterisk-bugs] [JIRA] (ASTERISK-24832) DTLS-crashes within openssl

Stefan Engström (JIRA) noreply at issues.asterisk.org
Sun Mar 1 08:07:34 CST 2015


     [ https://issues.asterisk.org/jira/browse/ASTERISK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Engström updated ASTERISK-24832:
---------------------------------------

    Attachment: DTLSREVIEWME.patch

Uploading a new draft of the patch. It was built to compensate for issues found in the crashes. Parts of the patch from ASTERISK-24651 were used, but not all. It still has unresolved TODOs, and it includes the fix for ASTERISK-24830.

> DTLS-crashes within openssl 
> ----------------------------
>
>                 Key: ASTERISK-24832
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-24832
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_rtp_asterisk
>    Affects Versions: 13.1.0
>         Environment: Fedora 20 x86_64, openssl-1.0.1e-41.fc20.x86_64, Asterisk 13.1.0, Chrome SIPML5 chan_sip peers with transport WSS
>            Reporter: Stefan Engström
>            Assignee: Rusty Newton
>         Attachments: crash1.txt, crash2.txt, crash3.txt, crash4.txt, crash5.extralog, crash5.txt, CUSTOMERRORDEBUGLOG, DTLSREVIEWME.patch, SIPCONF.txt, TESTDTLS.patch, TESTDTLS.patch.workingcopy
>
>
> I'm using 4 chan_sip peers with transport WSS. They all use Chrome with SIPml5 (WebRTC). 2 of them call a queue and the other 2 answer. Every 100-1000 calls or so, Asterisk crashes with a segmentation fault or an abort signal inside OpenSSL.
> Since it's load-related, it's hard to provide enough information, but I'll try to add more as I go.
> ISSUE-0
> The first thing I noticed was that dtls_perform_handshake was called too many times, but that was fixed by compensating for ASTERISK-24830.
> From code inspection and trace logs, it looks like the crashes mostly occur for dtls->ssl instances where Asterisk has the server role (SSL_set_accept_state(dtls->ssl) has been called).
> EDIT: This JIRA is getting a little bigger. There seem to be many sub-problems, all of them related to DTLS. Not all of the sub-issues below may be real issues; some are just questions about the code. I'd be happy if a developer could take a look, answer the questions, or discuss some of the issues and possible fixes.
> ISSUE-1: crash3 seems to show a concurrency issue:
> thread 5, leaving Asterisk code at dtls_perform_handshake, is performing ssl3_clear on the same SSL struct that __rtp_recvfrom in thread 1 is passing to SSL_read.
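To illustrate the kind of serialization ISSUE-1 points toward, here is a minimal sketch in which the receive path and the reset path take the same per-session mutex, so a reset can never run while a read is in progress on the same object. It does not use OpenSSL: `struct dtls_session`, `dtls_session_feed`, `dtls_session_read` and `dtls_session_clear` are hypothetical stand-ins for an SSL object, SSL_read() and SSL_clear(). Which lock Asterisk itself should use is not addressed here and would be an assumption.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one DTLS session (not OpenSSL's SSL).
 * Every function that touches the session takes the same lock. */
struct dtls_session {
    pthread_mutex_t lock;
    char pending[64];   /* bytes received but not yet read */
    size_t pending_len;
    unsigned clears;    /* how many times the session was reset */
};

void dtls_session_init(struct dtls_session *s)
{
    assert(s != NULL);
    pthread_mutex_init(&s->lock, NULL);
    s->pending_len = 0;
    s->clears = 0;
}

/* Feed received bytes; returns 0 on success, -1 if they do not fit. */
int dtls_session_feed(struct dtls_session *s, const char *data, size_t len)
{
    int rc = -1;

    pthread_mutex_lock(&s->lock);
    if (len <= sizeof(s->pending) - s->pending_len) {
        memcpy(s->pending + s->pending_len, data, len);
        s->pending_len += len;
        rc = 0;
    }
    pthread_mutex_unlock(&s->lock);
    return rc;
}

/* Receive path (plays the role of SSL_read() in __rtp_recvfrom). */
size_t dtls_session_read(struct dtls_session *s, char *out, size_t cap)
{
    size_t n;

    pthread_mutex_lock(&s->lock);
    n = s->pending_len < cap ? s->pending_len : cap;
    memcpy(out, s->pending, n);
    /* Shift any unread bytes to the front of the buffer. */
    memmove(s->pending, s->pending + n, s->pending_len - n);
    s->pending_len -= n;
    pthread_mutex_unlock(&s->lock);
    return n;
}

/* Reset path (plays the role of SSL_clear() in dtls_perform_handshake). */
void dtls_session_clear(struct dtls_session *s)
{
    pthread_mutex_lock(&s->lock);
    s->pending_len = 0;
    s->clears++;
    pthread_mutex_unlock(&s->lock);
}
```

The design point is that both paths go through the same lock; locking only the handshake path would leave the race between the two threads in place.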
> ISSUE-2: I'm curious about the behavior of ast_rtp_on_ice_complete() {
> ...
>         dtls_perform_handshake(instance, &rtp->dtls, 0);
>         if (rtp->rtcp) {
>                 dtls_perform_handshake(instance, &rtp->rtcp->dtls, 1);
>         }
> ...
> }
> chan_sip seems to call process_sdp -> process_sdp_a_dtls -> res_rtp_asterisk's dtls_set_setup, which ultimately calls either SSL_set_connect_state(ssl) or SSL_set_accept_state(ssl) on both the RTP and RTCP SSL sessions. But this races with the call to dtls_perform_handshake(instance, &rtp->dtls, 0); from ast_rtp_on_ice_complete. I'm not sure whether this is a problem, but in my most recent crash (crash4), ast_rtp_on_ice_complete fired before dtls_set_setup, which I have never noticed in calls that did not crash.
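One way to remove the ordering dependency described in ISSUE-2 is to start the handshake only once both events have happened, in whichever callback arrives last, and at most once. This is a minimal sketch with hypothetical names (`struct handshake_gate`, `gate_on_ice_complete`, `gate_on_setup_applied`); the `handshakes` counter stands in for the two dtls_perform_handshake() calls and is not Asterisk code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-call state: the handshake may start only after ICE
 * has completed AND the SDP a=setup role has been applied. */
struct handshake_gate {
    pthread_mutex_t lock;
    bool ice_complete;
    bool setup_applied;
    bool started;
    unsigned handshakes; /* counts simulated handshake starts */
};

void gate_init(struct handshake_gate *g)
{
    assert(g != NULL);
    pthread_mutex_init(&g->lock, NULL);
    g->ice_complete = false;
    g->setup_applied = false;
    g->started = false;
    g->handshakes = 0;
}

/* Must be called with g->lock held. Starts the handshake at most once. */
static void gate_maybe_start(struct handshake_gate *g)
{
    if (g->ice_complete && g->setup_applied && !g->started) {
        g->started = true;
        g->handshakes++; /* stand-in for the RTP and RTCP handshake calls */
    }
}

/* Called from the ICE-complete callback (cf. ast_rtp_on_ice_complete). */
void gate_on_ice_complete(struct handshake_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->ice_complete = true;
    gate_maybe_start(g);
    pthread_mutex_unlock(&g->lock);
}

/* Called once the role has been set from SDP (cf. dtls_set_setup). */
void gate_on_setup_applied(struct handshake_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->setup_applied = true;
    gate_maybe_start(g);
    pthread_mutex_unlock(&g->lock);
}
```

Whichever of the two callbacks runs first, the handshake fires exactly once, in the second one; repeated ICE-complete events do not trigger it again.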
> ISSUE-3 
> Why is SSL_do_handshake(dtls->ssl) called at all if we are passive?
> I added a debug check in dtls_perform_handshake() so that SSL_do_handshake is only called when we are not passive (dtls->dtls_setup != AST_RTP_DTLS_SETUP_PASSIVE), and it seems to do no harm.
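The role logic behind the ISSUE-3 check can be sketched as follows, assuming the usual SDP semantics (RFC 4145 `a=setup`, applied to DTLS-SRTP by RFC 5763): the `active` endpoint is the DTLS client and initiates the handshake, while the `passive` endpoint is the DTLS server and waits for the peer's ClientHello. The enum and function names are hypothetical. This version is stricter than the `!= AST_RTP_DTLS_SETUP_PASSIVE` check described in ISSUE-3, since it also declines to initiate for `actpass` and `holdconn`; whether that suits Asterisk is an assumption. As an inference only: assuming memory BIOs, a server-side SSL_do_handshake() with no ClientHello yet would normally return -1 with SSL_get_error() reporting SSL_ERROR_WANT_READ, which may be why skipping it looked harmless.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the SDP a=setup values (RFC 4145); names hypothetical. */
enum dtls_setup { SETUP_ACTIVE, SETUP_PASSIVE, SETUP_ACTPASS, SETUP_HOLDCONN };

enum dtls_role { ROLE_UNDECIDED, ROLE_CLIENT, ROLE_SERVER };

/* active initiates the connection (DTLS client, cf. SSL_set_connect_state);
 * passive waits for it (DTLS server, cf. SSL_set_accept_state). */
enum dtls_role role_for_setup(enum dtls_setup setup)
{
    switch (setup) {
    case SETUP_ACTIVE:
        return ROLE_CLIENT;
    case SETUP_PASSIVE:
        return ROLE_SERVER;
    default:
        /* actpass is resolved by the answer; holdconn means no connection yet. */
        return ROLE_UNDECIDED;
    }
}

/* Only the client side kicks off the handshake; a server simply
 * waits for the peer's first flight. */
bool should_initiate_handshake(enum dtls_setup setup)
{
    return role_for_setup(setup) == ROLE_CLIENT;
}
```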
> ISSUE-4
> Continuing from ISSUE-2: SSL_is_init_finished(&rtp->dtls) can be false when dtls_perform_handshake(instance, &rtp->dtls, 0) is called, while an instant later, when dtls_perform_handshake(instance, &rtp->rtcp->dtls, 1) is called, SSL_is_init_finished(&rtp->rtcp->dtls) is true. That causes a clear on rtp->rtcp->dtls->ssl but not on rtp->dtls->ssl. Could this lead to problems?
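For ISSUE-4, one option is to make a single decision for the RTP/RTCP pair instead of one per session, so the two sessions cannot end up in different states. This is a minimal sketch: `struct dtls_state` and `reset_pair_if_needed` are hypothetical, `finished` mirrors SSL_is_init_finished(), and `resets` marks where SSL_clear() would go. Resetting both when either has finished is just one consistent policy; resetting neither would be another, and which is right for Asterisk is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one DTLS session. */
struct dtls_state {
    bool finished;   /* mirrors SSL_is_init_finished() */
    unsigned resets; /* counts simulated SSL_clear() calls */
};

/* Decide once for the RTP/RTCP pair (rtcp may be NULL, e.g. with rtcp-mux)
 * and apply the same decision to both, so one side cannot be reset while
 * the other keeps its old state. The caller is assumed to hold the lock
 * protecting both sessions. Returns true if a reset was performed. */
bool reset_pair_if_needed(struct dtls_state *rtp, struct dtls_state *rtcp)
{
    bool any_finished;

    assert(rtp != NULL);
    any_finished = rtp->finished || (rtcp != NULL && rtcp->finished);
    if (!any_finished) {
        return false;
    }
    rtp->finished = false;
    rtp->resets++;
    if (rtcp != NULL) {
        rtcp->finished = false;
        rtcp->resets++;
    }
    return true;
}
```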
> ISSUE-5
> See crash5.txt. In __rtp_recvfrom (thread 4) we call dtls_srtp_setup if SSL_is_init_finished(dtls->ssl) is true for rtp->dtls; we don't seem to check whether SSL_is_init_finished(rtp->rtcp->dtls->ssl) is true as well. In any case, this results in a call to ast_srtp_create, after which thread 1 executes ast_srtp_protect, which fails for some reason. Q: Why is this? I attached a special log, crash5.extralog, showing function calls up to 30 seconds before the crash; it was generated with the debug patch TESTDTLS.patch. According to the log, the call to dtls_srtp_setup started at 01:10:09 and had still not returned about 25 seconds later, when the crash happened (the relevant instance was 0x7fdc00dc6958).
> Possibly related to ASTERISK-24651
> Requires patch from ASTERISK-24711
> Requires patch from ASTERISK-24830 (the obvious fix of replacing USE_PJPROJECT with HAVE_PJPROJECT...)
> EDIT: TESTDTLS.patch contains some experimental code as well as a great deal of error-log spam. With it, Asterisk seems to crash less (so far only a different kind of crash, crash5), but the patch is far from a fix. I first tried merging the ASTERISK-24651 patch into Asterisk 13, but either I made a mistake or it had a performance penalty, because I couldn't place calls at all with it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


