[asterisk-bugs] [JIRA] (ASTERISK-24832) DTLS-crashes within openssl

Stefan Engström (JIRA) noreply at issues.asterisk.org
Thu Feb 26 14:23:35 CST 2015


    [ https://issues.asterisk.org/jira/browse/ASTERISK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=225179#comment-225179 ] 

Stefan Engström edited comment on ASTERISK-24832 at 2/26/15 2:22 PM:
---------------------------------------------------------------------

I will recompile with MALLOC_DEBUG, DONT_OPTIMIZE and BETTER_BACKTRACES, but there is no telling when the next crash will occur.

I attach the sip.conf for the relevant peers -- I'm unable to provide exact steps for reproducing the SIPml5 Chrome WebRTC calling robots that call in the same pattern as mine, because they run in an environment that's not mine (or public). I'll see if I'm allowed to at least acquire pcap files. Maybe I can find a way to reproduce the crash using only the SIPml5 demo page and manual calling, but basically it should be 2 WebRTC callers always calling the same queue and hanging up after a random time of less than 3 minutes, while the 2 agents in the queue attempt to answer calls instantly and then hang up after a random time of less than 3 minutes.


I realize that at the moment it's hard for anyone to reproduce my environment -- I'm sort of hoping to find the root of the issue myself.


> DTLS-crashes within openssl 
> ----------------------------
>
>                 Key: ASTERISK-24832
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-24832
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_rtp_asterisk
>    Affects Versions: 13.1.0
>         Environment: Fedora 20 x86_64, openssl-1.0.1e-41.fc20.x86_64, Asterisk 13.1.0, Chrome SIPML5 chan_sip peers with transport WSS
>            Reporter: Stefan Engström
>            Assignee: Stefan Engström
>         Attachments: crash1.txt, crash2.txt, crash3.txt, CUSTOMERRORDEBUGLOG, SIPCONF.txt, TESTDTLS.patch.workingcopy
>
>
> I'm using 4 chan_sip peers with transport WSS. They all use Chrome SIPml5 WebRTC. 2 of them call a queue and the other 2 answer. Every 100-1000 calls or so, Asterisk crashes with a segmentation fault or abort signal within openssl.
> Since it's load-related, it's hard to provide enough information, but I'll try to add more continuously.
> The first thing I noticed was that dtls_perform_handshake was called too many times, but that was fixed with https://issues.asterisk.org/jira/browse/ASTERISK-24830
> I have no prior experience of using openssl and little experience of Asterisk and C, so debugging is challenging.
> From code inspection and tracing logs, it looks like the crashes only occur for dtls->ssl instances where Asterisk has the server role (SSL_set_accept_state(dtls->ssl) has been called).
> I'm not sure how to debug further, other than trying to somehow log all calls to libssl and see whether any calls are out of order just before the crash.
> EDIT - the last coredump, crash3, seems to prove a concurrency issue:
> thread 5, leaving Asterisk code at dtls_perform_handshake, is performing ssl3_clear on the same ssl struct that is passed to ssl_read from __rtp_recvfrom in thread 1
> Possibly related to 
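
To illustrate the kind of concurrency issue described above, here is a minimal sketch of two threads operating on one shared per-session object, with every operation serialized by a mutex. It is not Asterisk or OpenSSL code: struct dtls_session, session_op and run_demo are hypothetical names made up for this example, and session_op merely stands in for calls such as the handshake or the read on the shared ssl struct. Whether a per-session lock like this is the right fix for the crash is an assumption, not something the coredumps confirm.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for per-session DTLS state (not Asterisk's struct). */
struct dtls_session {
    pthread_mutex_t lock; /* guards every field below */
    int in_call;          /* 1 while a thread is inside the shared object */
    int overlaps;         /* times a second thread entered while in_call was set */
    long ops;             /* completed operations */
};

/* Stand-in for one call on the shared object (e.g. a handshake step or a read). */
static void session_op(struct dtls_session *s)
{
    pthread_mutex_lock(&s->lock);
    if (s->in_call) {
        s->overlaps++;    /* cannot happen while the lock is held */
    }
    s->in_call = 1;
    s->ops++;
    s->in_call = 0;
    pthread_mutex_unlock(&s->lock);
}

struct worker_arg {
    struct dtls_session *s;
    int iterations;
};

static void *worker(void *p)
{
    struct worker_arg *a = p;
    for (int i = 0; i < a->iterations; i++) {
        session_op(a->s);
    }
    return NULL;
}

/* Runs two threads against one session and returns the number of overlaps. */
static int run_demo(int iterations, long *ops_out)
{
    struct dtls_session s = { .in_call = 0, .overlaps = 0, .ops = 0 };
    struct worker_arg a = { &s, iterations };
    pthread_t t1, t2;
    int rc;

    rc = pthread_mutex_init(&s.lock, NULL);
    assert(rc == 0);
    rc = pthread_create(&t1, NULL, worker, &a);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker, &a);
    assert(rc == 0);
    (void)rc;

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    *ops_out = s.ops;
    int overlaps = s.overlaps;
    pthread_mutex_destroy(&s.lock);
    return overlaps;
}
```

Calling run_demo(1000, &ops) from a main function and building with -pthread should report no overlaps and 2000 completed operations. Removing the lock/unlock pair in session_op turns the updates into a data race, which is the situation the crash3 backtrace appears to show for the real ssl struct.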





