[asterisk-bugs] [JIRA] (ASTERISK-24832) DTLS-crashes within openssl

Stefan Engström (JIRA) noreply at issues.asterisk.org
Fri Feb 27 08:43:34 CST 2015


    [ https://issues.asterisk.org/jira/browse/ASTERISK-24832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=225194#comment-225194 ] 

Stefan Engström commented on ASTERISK-24832:
--------------------------------------------

I followed the instructions at https://wiki.asterisk.org/wiki/display/AST/MALLOC_DEBUG+Compiler+Flag and configured the build with:

menuselect/menuselect --disable-category MENUSELECT_CORE_SOUNDS --disable-category MENUSELECT_EXTRA_SOUNDS --disable-category MENUSELECT_MOH --enable-category MENUSELECT_ADDONS --enable app_meetme --enable app_page --disable chan_mgcp --disable chan_mobile --enable OPTIONAL_API --enable MALLOC_DEBUG --enable DONT_OPTIMIZE

I also ran the command asterisk -rx "memory atexit list on".

However, after the core was dumped, mmlog.txt contained only three lines with no useful information, of the form "1421693682 - New session". I suspected this was because mmlog was owned by root, so I ran chown asterisk:asterisk /var/log/asterisk/mmlog.

Which "memory atexit" commands are required?




> DTLS-crashes within openssl 
> ----------------------------
>
>                 Key: ASTERISK-24832
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-24832
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_rtp_asterisk
>    Affects Versions: 13.1.0
>         Environment: Fedora 20 x86_64, openssl-1.0.1e-41.fc20.x86_64, Asterisk 13.1.0, Chrome SIPML5 chan_sip peers with transport WSS
>            Reporter: Stefan Engström
>            Assignee: Stefan Engström
>         Attachments: crash1.txt, crash2.txt, crash3.txt, CUSTOMERRORDEBUGLOG, SIPCONF.txt, TESTDTLS.patch.workingcopy
>
>
> I'm using 4 chan_sip peers with transport WSS. They all use Chrome SIPml5 WebRTC. 2 of them call a queue and the other 2 answer. Every 100-1000 calls or so, Asterisk crashes with a segmentation fault or an abort signal inside OpenSSL.
> Since it's load-related, it's hard to provide enough information, but I'll try to add more as I go.
> The first thing I noticed was that dtls_perform_handshake was called too many times, but that was fixed with https://issues.asterisk.org/jira/browse/ASTERISK-24830 
> I have no prior experience with OpenSSL and little experience with Asterisk and C, so debugging is challenging.
> Based on code inspection and trace logs, it looks like the crashes only occur for dtls->ssl instances where Asterisk has the server role (i.e. SSL_set_accept_state(dtls->ssl) has been called).
> I'm not sure how to debug this further. Should I somehow log all calls to libssl and check whether any of them are out of order just before the crash?
> EDIT - the last core dump, crash3, appears to show a concurrency issue:
> thread 5, leaving Asterisk code at dtls_perform_handshake, is performing ssl3_clear on the same SSL struct that is passed to ssl_read from __rtp_recvfrom in thread 1.
> EDIT2 - I have about 20 other old core dumps from before I compiled with MALLOC_DEBUG. Each is slightly different, so there may be several independent issues causing crashes, but one common failure is the assertion:
> OpenSSLDie (file=file@entry=0x7f19a8915db8 "d1_both.c", line=line@entry=1210, assertion=assertion@entry=0x7f19a8915ee0 "s->d1->w_msg_hdr.msg_len + DTLS1_HM_HEADER_LENGTH == (unsigned int)s->init_num") at cryptlib.c:919
> This looks to me like a timing issue, i.e. some other thread has written to s->d1 or s->init_num for unknown reasons.
> I'm curious about 
> Possibly related to ASTERISK-24651
> Requires patch from ASTERISK-24711
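If the crash3 analysis is right, one way to rule out this class of crash is to make sure no two threads ever touch the same SSL object at the same time, since OpenSSL does not support concurrent use of a single SSL object from multiple threads. Below is a minimal sketch of that pattern, assuming a hypothetical `struct dtls_session` that carries its own mutex. It is not Asterisk's code. Plain fields stand in for `s->d1->w_msg_hdr.msg_len` and `s->init_num`, and the hypothetical `handshake_step()` and `read_step()` stand in for the ssl3_clear path in dtls_perform_handshake and the ssl_read path in __rtp_recvfrom, so the sketch builds without libssl (compile with -pthread).

```c
#include <pthread.h>

#define ITERATIONS 100000
#define HM_HEADER_LENGTH 12 /* same value as OpenSSL's DTLS1_HM_HEADER_LENGTH */

/* Hypothetical per-call DTLS state. In Asterisk the real object is the
 * SSL * stored in dtls->ssl; plain fields stand in for its internals. */
struct dtls_session {
    pthread_mutex_t lock;  /* guards every access to the fields below */
    unsigned int msg_len;  /* stands in for s->d1->w_msg_hdr.msg_len */
    unsigned int init_num; /* stands in for s->init_num */
    long violations;       /* times the reader saw the fields out of sync */
};

/* Hypothetical stand-in for a handshake step: reset the state (loosely
 * like ssl3_clear), then rebuild it. The intermediate state breaks the
 * invariant, so it must never be visible to another thread. */
static void handshake_step(struct dtls_session *s)
{
    pthread_mutex_lock(&s->lock);
    s->msg_len = 0;
    s->init_num = 0;
    s->msg_len = 100;
    s->init_num = 100 + HM_HEADER_LENGTH;
    pthread_mutex_unlock(&s->lock);
}

/* Hypothetical stand-in for the read path: checks the same invariant
 * as the d1_both.c assertion, but counts failures instead of aborting. */
static void read_step(struct dtls_session *s)
{
    pthread_mutex_lock(&s->lock);
    if (s->msg_len + HM_HEADER_LENGTH != s->init_num)
        s->violations++;
    pthread_mutex_unlock(&s->lock);
}

static void *handshake_thread(void *arg)
{
    for (int i = 0; i < ITERATIONS; i++)
        handshake_step(arg);
    return NULL;
}

static void *read_thread(void *arg)
{
    for (int i = 0; i < ITERATIONS; i++)
        read_step(arg);
    return NULL;
}

/* Runs both threads against one session and returns the number of
 * invariant violations seen (-1 if a thread could not be started). */
static long run_demo(void)
{
    struct dtls_session s = { .msg_len = 100,
                              .init_num = 100 + HM_HEADER_LENGTH,
                              .violations = 0 };
    pthread_t t1, t2;

    pthread_mutex_init(&s.lock, NULL);
    if (pthread_create(&t1, NULL, handshake_thread, &s) != 0) {
        pthread_mutex_destroy(&s.lock);
        return -1;
    }
    if (pthread_create(&t2, NULL, read_thread, &s) != 0) {
        pthread_join(t1, NULL);
        pthread_mutex_destroy(&s.lock);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_mutex_destroy(&s.lock);
    return s.violations;
}
```

With both functions holding the lock, run_demo() returns 0. Deleting the lock/unlock pairs can let read_step() observe the half-reset state that handshake_step() passes through, which is the same kind of interleaving the d1_both.c assertion catches (and is itself a data race).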



--
This message was sent by Atlassian JIRA
(v6.2#6252)


