[asterisk-bugs] [JIRA] (ASTERISK-24832) DTLS-crashes within openssl
Stefan Engström (JIRA)
noreply at issues.asterisk.org
Thu Feb 26 13:27:34 CST 2015
[ https://issues.asterisk.org/jira/browse/ASTERISK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan Engström updated ASTERISK-24832:
---------------------------------------
Description:
I'm using 4 chan_sip peers with transport WSS. They all use Chrome with SIPml5 WebRTC. Two of them call a queue and the other two answer. Every 100-1000 calls or so, Asterisk crashes with a segmentation fault or abort signal inside OpenSSL.
Since it's load-related, it's hard to provide enough information, but I'll try to add more as I go.
The first thing I noticed was that dtls_perform_handshake was called too many times, but that was fixed with https://issues.asterisk.org/jira/browse/ASTERISK-24830
I have no prior experience with OpenSSL and little experience with Asterisk and C, so debugging is challenging.
From code inspection and log tracing, it looks like the crashes only occur for dtls->ssl instances where Asterisk has the server role (that is, SSL_set_accept_state(dtls->ssl) has been called).
I'm not sure how to debug further, other than somehow logging all calls to libssl to see whether any calls are out of order just before the crash.
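A minimal sketch of the call-logging idea, assuming a small in-memory ring buffer that records which thread is about to make which libssl call on which object. Every name here (trace_call, trace_thread_count, trace_dump, TRACE_SLOTS) is hypothetical and is not part of Asterisk or OpenSSL; the call labels are plain strings supplied by the caller.

```c
#include <pthread.h>
#include <stdio.h>

#define TRACE_SLOTS 64  /* hypothetical size; older entries are overwritten */

/* One recorded call: sequence number, calling thread, label, object pointer. */
struct trace_entry {
    unsigned long seq;
    pthread_t thread;
    const char *call;   /* label such as "SSL_read"; pass a string literal */
    const void *obj;    /* object being operated on, e.g. the dtls->ssl pointer */
};

static struct trace_entry trace_ring[TRACE_SLOTS];
static unsigned long trace_next_seq;
static pthread_mutex_t trace_lock = PTHREAD_MUTEX_INITIALIZER;

/* Record a call; intended to be invoked just before the real libssl call. */
static void trace_call(const char *call, const void *obj)
{
    pthread_mutex_lock(&trace_lock);
    struct trace_entry *e = &trace_ring[trace_next_seq % TRACE_SLOTS];
    e->seq = trace_next_seq++;
    e->thread = pthread_self();
    e->call = call;
    e->obj = obj;
    pthread_mutex_unlock(&trace_lock);
}

/* Count the distinct threads that touched obj among the retained entries. */
static int trace_thread_count(const void *obj)
{
    int count = 0;
    pthread_mutex_lock(&trace_lock);
    unsigned long n = trace_next_seq < TRACE_SLOTS ? trace_next_seq : TRACE_SLOTS;
    for (unsigned long i = 0; i < n; i++) {
        if (trace_ring[i].obj != obj)
            continue;
        int seen = 0;
        for (unsigned long j = 0; j < i; j++) {
            if (trace_ring[j].obj == obj &&
                pthread_equal(trace_ring[j].thread, trace_ring[i].thread)) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            count++;
    }
    pthread_mutex_unlock(&trace_lock);
    return count;
}

/* Print the retained entries, oldest first. */
static void trace_dump(FILE *out)
{
    pthread_mutex_lock(&trace_lock);
    unsigned long start =
        trace_next_seq > TRACE_SLOTS ? trace_next_seq - TRACE_SLOTS : 0;
    for (unsigned long seq = start; seq < trace_next_seq; seq++) {
        const struct trace_entry *e = &trace_ring[seq % TRACE_SLOTS];
        /* Casting pthread_t to an integer is an assumption that holds on Linux/glibc. */
        fprintf(out, "%lu thread=%lu %s obj=%p\n", e->seq,
                (unsigned long)e->thread, e->call, (void *)e->obj);
    }
    pthread_mutex_unlock(&trace_lock);
}
```

In this sketch, a call such as trace_call("SSL_read", dtls->ssl) would sit just before each libssl call, and trace_dump(stderr) could be run from a debugger or signal path after a crash. A thread count above 1 for a single object suggests that the object is being shared across threads.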
EDIT - the last core dump, crash3, seems to prove a concurrency issue:
Thread 5, which left Asterisk code at dtls_perform_handshake, is performing ssl3_clear on the same SSL struct that thread 1 passes to ssl_read from __rtp_recvfrom.
Possibly related to ASTERISK-24651. Definitely related to ASTERISK-24711, since that patch is required for my OpenSSL version.
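To illustrate the kind of race described above, here is a minimal sketch of serializing two threads on one shared session object with a mutex. It is an illustration only, not the Asterisk fix: struct fake_session and every function below are hypothetical stand-ins, and no OpenSSL calls are made. One thread plays the handshake path (which resets state) and the other plays the read path.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared SSL session; not an OpenSSL type. */
struct fake_session {
    pthread_mutex_t lock;
    long state;        /* touched by both the handshake and the read path */
    long operations;   /* total number of completed operations */
};

static void session_init(struct fake_session *s)
{
    int rc = pthread_mutex_init(&s->lock, NULL);
    assert(rc == 0);
    (void)rc;
    s->state = 0;
    s->operations = 0;
}

/* Stand-in for the handshake path: resets session state under the lock. */
static void session_handshake(struct fake_session *s)
{
    pthread_mutex_lock(&s->lock);
    s->state = 0;
    s->operations++;
    pthread_mutex_unlock(&s->lock);
}

/* Stand-in for the read path: advances session state under the lock. */
static void session_read(struct fake_session *s)
{
    pthread_mutex_lock(&s->lock);
    s->state++;
    s->operations++;
    pthread_mutex_unlock(&s->lock);
}

struct worker_args {
    struct fake_session *s;
    int iterations;
    int is_handshake;  /* 1: handshake path, 0: read path */
};

static void *worker(void *p)
{
    struct worker_args *a = p;
    for (int i = 0; i < a->iterations; i++) {
        if (a->is_handshake)
            session_handshake(a->s);
        else
            session_read(a->s);
    }
    return NULL;
}

/* Run one reader and one handshaker against the same session. */
static long run_demo(int iterations)
{
    struct fake_session s;
    pthread_t reader_thread, handshake_thread;
    struct worker_args reader = { &s, iterations, 0 };
    struct worker_args handshaker = { &s, iterations, 1 };
    int rc;

    session_init(&s);
    rc = pthread_create(&reader_thread, NULL, worker, &reader);
    assert(rc == 0);
    rc = pthread_create(&handshake_thread, NULL, worker, &handshaker);
    assert(rc == 0);
    (void)rc;
    pthread_join(reader_thread, NULL);
    pthread_join(handshake_thread, NULL);
    pthread_mutex_destroy(&s.lock);
    return s.operations;
}
```

Because every access to the shared fields happens under the same lock, the operation count returned by run_demo is exactly twice the iteration count. Without the lock, increments could be lost, which is a mild analogue of two threads mutating the same SSL struct at once.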
was:
I'm using 4 chan_sip peers with transport WSS. They all use Chrome with SIPml5 WebRTC. Two of them call a queue and the other two answer. Every 100-1000 calls or so, Asterisk crashes with a segmentation fault or abort signal inside OpenSSL.
Since it's load-related, it's hard to provide enough information, but I'll try to add more as I go.
The first thing I noticed was that dtls_perform_handshake was called too many times, but that was fixed with https://issues.asterisk.org/jira/browse/ASTERISK-24830
I have no prior experience with OpenSSL and little experience with Asterisk and C, so debugging is challenging.
From code inspection and log tracing, it looks like the crashes only occur for dtls->ssl instances where Asterisk has the server role (that is, SSL_set_accept_state(dtls->ssl) has been called).
I'm not sure how to debug further, other than somehow logging all calls to libssl to see whether any calls are out of order just before the crash.
EDIT - the last core dump, crash3, seems to prove a concurrency issue:
Thread 5, which left Asterisk code at dtls_perform_handshake, is performing ssl3_clear on the same SSL struct that thread 1 passes to ssl_read from __rtp_recvfrom.
> DTLS-crashes within openssl
> ----------------------------
>
> Key: ASTERISK-24832
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-24832
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Resources/res_rtp_asterisk
> Affects Versions: 13.1.0
> Environment: Fedora 20 x86_64, openssl-1.0.1e-41.fc20.x86_64, Asterisk 13.1.0, Chrome SIPML5 chan_sip peers with transport WSS
> Reporter: Stefan Engström
> Assignee: Stefan Engström
> Attachments: crash1.txt, crash2.txt, crash3.txt, CUSTOMERRORDEBUGLOG, SIPCONF.txt, TESTDTLS.patch.workingcopy
--
This message was sent by Atlassian JIRA
(v6.2#6252)