[asterisk-bugs] [JIRA] (ASTERISK-25103) Roundup - investigate Asterisk DTLS crashes

Tue Jul 7 11:44:33 CDT 2015

    [ https://issues.asterisk.org/jira/browse/ASTERISK-25103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=226815#comment-226815 ] 

Dade Brandon commented on ASTERISK-25103:
-----------------------------------------

We know it isn't ICE; it took a lot of work to get to that point.  Chrome's debugging tools show that ICE is fully connected during the delay periods.  Our WebRTC phone was also set up to show a 'purple line indicator' during ICE connectivity establishment, which is especially useful for troubleshooting this issue: when using early media, we and our customers can see ICE complete while the call is still ringing.

Since we know all of our servers are publicly accessible, we've implemented a substantial number of obviously-not-recommended patches to speed up the ICE process:
 - Adjusted pjproject to always send a max-value ICE tie-breaker
 - Adjusted Asterisk to always be ICE_CONTROLLING (confirmed via pcaps; we also confirmed that Asterisk is running triggered checks as defined in RFC 5245, section 2.3)
 - Implemented a 'bool should_skip_candidiate_address()' that returns true for IPv6 addresses and for IPv4 10.x.x.x and 192.168.x.x addresses:
    -  placed this in process_sdp_a_ice to skip these candidates and prevent the triggered checks that were happening in our datacenters
    -  placed this in add_ice_to_sdp to prevent internal interfaces from being sent as candidates
          - additionally, in add_ice_to_sdp, we send only the first otherwise-valid candidate per candidate ID, so only the first public WAN interface is advertised.  This is the one that media will flow over on servers with multiple IPs, since we aren't using media_address in sip.conf
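
A minimal sketch of the address filter described in the list above, not the actual patch.  The function name skip_candidate_address, its string argument, and the choice to skip unparsable addresses are all assumptions made for illustration; the real helper's signature is not shown in this comment.  It mirrors only the ranges named above (IPv6, 10.x.x.x, 192.168.x.x), so 172.16.0.0/12 is not filtered.

```c
#include <arpa/inet.h>   /* inet_pton, ntohl */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>      /* strchr */
#include <sys/socket.h>  /* AF_INET */

/*
 * Hypothetical stand-in for the helper described above.
 * Returns true when a candidate address should be ignored.
 */
bool skip_candidate_address(const char *addr)
{
    struct in_addr v4;
    uint32_t ip;

    /* Any textual address containing ':' is treated as IPv6 and skipped. */
    if (strchr(addr, ':') != NULL) {
        return true;
    }

    /* Assumption: an address that does not parse as IPv4 is skipped too. */
    if (inet_pton(AF_INET, addr, &v4) != 1) {
        return true;
    }

    ip = ntohl(v4.s_addr);  /* host byte order for the mask comparisons */

    if ((ip & 0xFF000000u) == 0x0A000000u) {
        return true;        /* 10.0.0.0/8 */
    }
    if ((ip & 0xFFFF0000u) == 0xC0A80000u) {
        return true;        /* 192.168.0.0/16 */
    }

    return false;           /* anything else is kept as a candidate */
}
```

With this sketch, "10.1.2.3", "192.168.0.5" and "2001:db8::1" would be skipped, while a public address such as "203.0.113.7" would be kept.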

So yeah, it may be DTLS.  I'm unsure whether it's that, or whether Chrome may be using some sort of RTP probation: if we use early media and generate ringing, we can hear the Asterisk-generated ringing, but the delay occurs at the start of an external call being bridged (with Asterisk in the media path).  That suggests to me that either DTLS is triggering a renegotiation of sorts on the source update, even though the media path hasn't changed, or the source update flag in the RTP being sent to the Chrome WebRTC peer is causing Chrome to initiate probation checks.  Nothing odd appears in the Asterisk debug logs as the bridge between the WebRTC and remote peer occurs.  We've tested calls to internal extensions set up to mimic similar conditions (ring via early media, then bridge after set or random delays and playback() media), and these test calls do not have the delay, suggesting to me that it's only an issue when there is a source update.  If it is DTLS, is there any area I can look into to make it more aggressive in its negotiation on the Asterisk side?
(Restricted to JIRA Users group)
> Roundup - investigate Asterisk DTLS crashes
> -------------------------------------------
>
>                 Key: ASTERISK-25103
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-25103
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_rtp_asterisk
>         Environment: Asterisk 11, 13, Master
>            Reporter: Rusty Newton
>            Assignee: Joshua Colp
>
> An issue for an investigation into the various DTLS crashes currently hanging about.
> I'll link the issues currently on the tracker to this issue rather than linking them all to each other.

--
This message was sent by Atlassian JIRA
(v6.2#6252)