[asterisk-bugs] [JIRA] (ASTERISK-25127) DTLS crashes following "Unable to cancel schedule ID" in dtls_srtp_check_pending

Dade Brandon (JIRA) noreply at issues.asterisk.org
Mon May 25 20:06:32 CDT 2015


    [ https://issues.asterisk.org/jira/browse/ASTERISK-25127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=226345#comment-226345 ] 

Dade Brandon commented on ASTERISK-25127:
-----------------------------------------

Re core #1 - I inspected the packet capture and nothing looks abnormal about it. I was able to confirm that the channel/leg undergoing sip_hangup within the pbx thread at the time of the segfault, the one which had !instance->engine, was not a DTLS channel; it was the ITSP leg. No signalling to end the call was received from that leg. The other leg, which was using DTLS, was the one that logged the "Unable to cancel schedule ID" warning.

I can't see anywhere that would set instance->engine to NULL, or any reason why the dialog would have its ->rtp set to non-NULL if the engine didn't initialize, so I suspect some sort of memory corruption or double free happening in the DTLS code, in a way that affects the bridged channel.
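
To make the failure mode concrete, here is a minimal sketch of the kind of guard our extra debug logging sits in front of. It is not the Asterisk source: toy_instance, toy_engine, toy_frame and toy_instance_write are hypothetical stand-ins that model only the engine pointer and a write callback, and the message text simply mirrors our own debug lines.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real Asterisk types; only the fields
 * needed for this illustration are modeled. */
struct toy_frame {
    int len;
};

struct toy_instance;

struct toy_engine {
    int (*write)(struct toy_instance *instance, struct toy_frame *frame);
};

struct toy_instance {
    struct toy_engine *engine;
};

/* Trivial engine callback: pretends to send the frame, returns its length. */
static int toy_engine_write(struct toy_instance *instance, struct toy_frame *frame)
{
    (void)instance;
    return frame->len;
}

/* Guarded dispatch: logs and returns -1 instead of dereferencing a NULL
 * instance, a NULL engine, or a NULL write callback. */
static int toy_instance_write(struct toy_instance *instance, struct toy_frame *frame)
{
    if (!instance || !instance->engine || !instance->engine->write) {
        fprintf(stderr, "NULL INSTANCE ENGINE for RTP instance '%p'\n",
                (void *)instance);
        return -1;
    }
    return instance->engine->write(instance, frame);
}
```

A guard like this only avoids the immediate NULL dereference. If the engine pointer is being cleared by memory corruption or a double free, as suspected above, the guard hides the symptom and does not address the cause.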

Hopefully this helps in some way.

> DTLS crashes following "Unable to cancel schedule ID" in dtls_srtp_check_pending
> --------------------------------------------------------------------------------
>
>                 Key: ASTERISK-25127
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-25127
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_rtp_asterisk
>    Affects Versions: 11.18.0
>         Environment: Linux kernel "3.13.0-24-generic"
> Ubuntu 14.04,
> Asterisk 11.18.0-rc1, 
> Compiler flags: DONT_OPTIMIZE, LOADABLE_MODULES, BETTER_BACKTRACES, BUILD_NATIVE, G711_NEW_ALGORITHM
> Openssl: 1.0.1f-1ubuntu2.11
> libuuid1: 2.20.1-5.1ubuntu20.4
> SIP Realtime: Module loaded & not in use
> Timer: res_timing_timerfd    (res_timing_pthread also loaded)
> See attached 'environment.txt' for output of 'core show settings' and 'module show'
> All calls are:
>    [Peer] <->ch1<-> Asterisk <->ch2<-> [Misc ITSPs]
> ch1 transport is always SIP over WSS using sipjs on Chrome (stable/M42/M43 and/or Canary M45) with ulaw codec.  Peer is almost always NATd, Asterisk is never NATd,
> ch2 transport is always Plain Old SIP (5060, no TLS) with RTP (no [d]TLS)
>            Reporter: Dade Brandon
>            Severity: Critical
>         Attachments: core-1.txt, core-2.txt, environment.txt
>
>
> h2. Preface
> First, I want to say that I am very familiar with the other DTLS crash issues on Jira. If this is related, I believe it is probably a precursor to the crashes that produce the later segfaults: since upgrading to trunk I haven't had a core dump, but have continued to experience crashes (asterisk restarting via safe_asterisk, with an unknown signal since there's no core). This is likely better suited as a parent issue to some of the other DTLS crash issues.
> We get about 5-10 crashes per production day (across 35 servers). I only did the latest update Friday evening, so I will probably know whether there are useful core dumps by the end of Tuesday, due to the US holiday on Monday. We service businesses, so the volume is extremely low right now due to the long weekend in the US market.
> Also, we notice that the crashes seem to target certain servers, and that there appears to be a correlation between the affected servers and the latency of the peers connected to those servers.
> h2. Details:
> Preceding an asterisk crash, we receive "Unable to cancel schedule ID nnnnnn.   This is probably a bug (res_rtp_asterisk.c: dtls_srtp_check_pending, line NNNN)"
> (The line number is 1811 on trunk; we have other patches applied above it, mostly logging related, which make our line number less relevant. The line is "AST_SCHED_DEL_UNREF(rtp->sched, rtp->dtlstimerid, ao2_ref(instance, -1));")
> Asterisk does not die immediately after. In the messages log, anywhere from 2 seconds to a full minute passes before each crash. Note the timing in the example debug logs below.
> On servers that we've added extra logging to, each of these incidents is accompanied by output from an ast_debug call we inserted in main/rtp_engine.c, for the same thread and call, indicating that the RTP instance->engine is NULL:
> h3. Example 1
> {noformat}
> [14:55:16] WARNING[11973][C-0000e496] res_rtp_asterisk.c: Unable to cancel schedule ID 539829.  This is probably a bug (res_rtp_asterisk.c: dtls_srtp_check_pending, line 1834).
> [14:55:55] DEBUG[11973][C-0000e496] rtp_engine.c: XWSDEBUG4.2 ast_rtp_instance_set_remote_address-- NULL INSTANCE ENGINE for RTP instance '0x7f7c50076da8'
>     - this debug message is placed in ast_rtp_instance_set_remote_address, after ast_sockaddr_copy, and is called if (!instance->engine).
> {noformat}
> h3. Example 2
> {noformat}
> [12:18:54] WARNING[10203][C-0000a76c] res_rtp_asterisk.c: Unable to cancel schedule ID 436177.  This is probably a bug (res_rtp_asterisk.c: dtls_srtp_check_pending, line 1834).
> [12:18:59] DEBUG[23236][C-0000a76c] rtp_engine.c: XWSDEBUG4.2 ast_rtp_instance_set_remote_address-- NULL INSTANCE ENGINE for RTP instance '0x7f6492f03a98'
>     - this debug message is placed in ast_rtp_instance_set_remote_address, after ast_sockaddr_copy, and is called if (!instance->engine).
> [12:18:59] DEBUG[10203][C-0000a76c] rtp_engine.c: XWSDEBUG1.2 ast_rtp_instance_write-- NULL INSTANCE ENGINE for RTP instance '0x7f6492f03a98'
>    - this debug line asserts (instance && !instance->engine) before instance->engine->write(instance, frame) 
> [12:18:59] VERBOSE[11061][C-0000a76c] app_verbose.c: Caller party hung up MinDur running on SIP/Pir1-0000fadf -- Answered time :atime 1431112732:talk 7 -- Route 7503 -- MinDur request 0
> [12:18:59] DEBUG[10203][C-0000a76c] rtp_engine.c: XWSDEBUG29.2 -- NULL INSTANCE ENGINE for Instance '0x7f6492f03a98'
>   - this asserts (instance && !instance->engine) before instance->engine->stop(instance)
> {noformat}
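
For readers unfamiliar with the warning quoted in the issue description, the following toy model sketches one way a "cancel, then give up and warn" pattern can produce it: the ID being cancelled is no longer pending, for example because it already fired or was cancelled elsewhere. Everything here (toy_sched, toy_sched_add, toy_sched_del, toy_cancel, the retry limit of 10) is hypothetical and written only for illustration; it does not reproduce the AST_SCHED_DEL_UNREF macro or the Asterisk scheduler, and it leaves out the reference-count handling entirely.

```c
#include <stdio.h>

#define TOY_MAX_ENTRIES 8
#define TOY_CANCEL_RETRIES 10   /* assumed retry limit, for illustration only */

/* Hypothetical miniature scheduler: a fixed table of pending IDs. */
struct toy_sched {
    int pending[TOY_MAX_ENTRIES];
    int count;
};

/* Marks an ID as pending; silently ignores the request if the table is full. */
static void toy_sched_add(struct toy_sched *s, int id)
{
    if (s->count < TOY_MAX_ENTRIES) {
        s->pending[s->count++] = id;
    }
}

/* Removes a pending ID. Returns 0 if it was found, -1 if it was not pending. */
static int toy_sched_del(struct toy_sched *s, int id)
{
    for (int i = 0; i < s->count; i++) {
        if (s->pending[i] == id) {
            s->pending[i] = s->pending[--s->count];
            return 0;
        }
    }
    return -1;
}

/* Tries to cancel *id a bounded number of times, warns if every attempt
 * failed, and always resets *id to -1 afterwards.
 * Returns 0 if the entry was cancelled (or nothing was scheduled), -1 otherwise. */
static int toy_cancel(struct toy_sched *s, int *id)
{
    int tries = 0;
    int res = 0;

    if (*id < 0) {
        return 0;   /* nothing scheduled */
    }
    while ((res = toy_sched_del(s, *id)) != 0 && ++tries < TOY_CANCEL_RETRIES) {
        /* a real implementation might sleep briefly between attempts */
    }
    if (res != 0) {
        fprintf(stderr, "Unable to cancel schedule ID %d\n", *id);
    }
    *id = -1;
    return res;
}
```

In this model, cancelling an ID that is still pending succeeds quietly, while cancelling the same ID a second time exhausts the retries and prints the warning. Whether the production warning has the same cause is not established by this sketch.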



--
This message was sent by Atlassian JIRA
(v6.2#6252)


