[asterisk-dev] [Code Review] 3256: res_rtp_asterisk: Undo regression from ASTERISK-23213 while working around PJNATH assertion abort from ASTERISK-22911

Jonathan Rose reviewboard at asterisk.org
Tue Feb 25 11:41:52 CST 2014



> On Feb. 24, 2014, 4:51 p.m., Joshua Colp wrote:
> > /branches/11/res/res_rtp_asterisk.c, lines 554-560
> > <https://reviewboard.asterisk.org/r/3256/diff/1/?file=54416#file54416line554>
> >
> >     Agreed, and pjnath does not provide a mechanism to do just that without destroying/re-creating as Matt says.
> >     
> >     What would also be useful is to further look at the SDPs involved - are the candidates really changing? Do we really need to restart the ICE negotiation?
> 
> Jonathan Rose wrote:
>     No, the candidates offered aren't changing. I think the reason it tries to add them anyway is because we never reached a point where the ICE session was tracked as having started (because the check list compliation is failing).
>     
>     
>     First Invite (SIPML5 client to desk phone)
>     
>     <--- SIP read from WS:10.24.16.82:60366 --->
>     INVITE sip:1201 at 10.24.18.246 SIP/2.0
>     Via: SIP/2.0/WS df7jal23ls0d.invalid;branch=z9hG4bKMKg6LeUkDugGr9B1CnyHICANfn8JwRVR;rport
>     From: "sipml_bot"<sip:sipml_bot at 10.24.18.246>;tag=W42MEkWLdFTUiIaaw8iy
>     To: <sip:1201 at 10.24.18.246>
>     Contact: "sipml_bot"<sip:sipml_bot at df7jal23ls0d.invalid;rtcweb-breaker=no;click2call=no;transport=ws>;+g.oma.sip-im;+sip.ice;language="en,fr"
>     Call-ID: 97982d1a-15e5-cd55-9122-07b3cc20cb7d
>     CSeq: 20907 INVITE
>     Content-Type: application/sdp
>     Content-Length: 1840
>     Max-Forwards: 70
>     User-Agent: IM-client/OMA1.0 sipML5-v1.2014.01.27
>     Organization: Doubango Telecom
>     
>     v=0
>     o=- 3044420497219628500 2 IN IP4 127.0.0.1
>     s=Doubango Telecom - chrome
>     t=0 0
>     a=group:BUNDLE audio
>     a=msid-semantic: WMS nqy2BUfoo7yn1YeWLNBeED2rqMGSb1y8Ypv3
>     m=audio 56984 RTP/SAVPF 111 103 104 0 8 106 105 13 126
>     c=IN IP4 216.207.245.1
>     a=rtcp:56984 IN IP4 216.207.245.1
>     a=candidate:474352566 1 udp 2113937151 10.24.16.82 56984 typ host generation 0
>     a=candidate:474352566 2 udp 2113937151 10.24.16.82 56984 typ host generation 0
>     a=candidate:3038348387 1 udp 1845501695 216.207.245.1 56984 typ srflx raddr 10.24.16.82 rport 56984 generation 0
>     a=candidate:3038348387 2 udp 1845501695 216.207.245.1 56984 typ srflx raddr 10.24.16.82 rport 56984 generation 0
>     a=candidate:1388705606 1 tcp 1509957375 10.24.16.82 0 typ host generation 0
>     a=candidate:1388705606 2 tcp 1509957375 10.24.16.82 0 typ host generation 0
>     a=ice-ufrag:LeGzZojC7gQNvl7o
>     a=ice-pwd:VfqsY2VVKr3xg/mgvdPtcxhp
>     a=ice-options:google-ice
>     a=fingerprint:sha-256 3A:BA:F5:44:53:CD:99:6C:D1:32:9F:80:53:D4:B5:BA:AE:CF:98:54:71:7F:D6:CB:14:7F:D8:94:30:98:89:62
>     a=setup:actpass
>     a=mid:audio
>     a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
>     a=sendrecv
>     a=rtcp-mux
>     a=crypto:0 AES_CM_128_HMAC_SHA1_32 inline:ZQQ2MjApfbPlEAmWe52FRkbKUVH4rXt5p5QnpObB
>     a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:JljeDW3NiakHNYn+bu+vje3cD77nNPytUW79J2Vz
>     a=rtpmap:111 opus/48000/2
>     a=fmtp:111 minptime=10
>     a=rtpmap:103 ISAC/16000
>     a=rtpmap:104 ISAC/32000
>     a=rtpmap:0 PCMU/8000
>     a=rtpmap:8 PCMA/8000
>     a=rtpmap:106 CN/32000
>     a=rtpmap:105 CN/16000
>     a=rtpmap:13 CN/8000
>     a=rtpmap:126 telephone-event/8000
>     a=maxptime:60
>     a=ssrc:3822531142 cname:YE/Lm1aHvDjHxwH0
>     a=ssrc:3822531142 msid:nqy2BUfoo7yn1YeWLNBeED2rqMGSb1y8Ypv3 b50d1acc-19eb-455c-8a4b-40263a2d1a9b
>     a=ssrc:3822531142 mslabel:nqy2BUfoo7yn1YeWLNBeED2rqMGSb1y8Ypv3
>     a=ssrc:3822531142 label:b50d1acc-19eb-455c-8a4b-40263a2d1a9b
>     
>     
>     
>     
>     second invite (SIPML5 client putting the call on hold)
>     
>     INVITE sip:1201 at 10.24.18.246:5060;transport=WS SIP/2.0
>     Via: SIP/2.0/WS df7jal23ls0d.invalid;branch=z9hG4bK6QhlVcXuVSUNyTewJBtU0KIZzL9mMqCE;rport
>     From: "sipml_bot"<sip:sipml_bot at 10.24.18.246>;tag=W42MEkWLdFTUiIaaw8iy
>     To: <sip:1201 at 10.24.18.246>;tag=as269c229d
>     Contact: "sipml_bot"<sip:sipml_bot at df7jal23ls0d.invalid;rtcweb-breaker=no;click2call=no;transport=ws>;+g.oma.sip-im;+sip.ice;language="en,fr"
>     Call-ID: 97982d1a-15e5-cd55-9122-07b3cc20cb7d
>     CSeq: 20908 INVITE
>     Content-Type: application/sdp
>     Content-Length: 1840
>     Max-Forwards: 70
>     User-Agent: IM-client/OMA1.0 sipML5-v1.2014.01.27
>     Organization: Doubango Telecom
>     
>     v=0
>     o=- 3044420497219628500 3 IN IP4 127.0.0.1
>     s=Doubango Telecom - chrome
>     t=0 0
>     a=group:BUNDLE audio
>     a=msid-semantic: WMS nqy2BUfoo7yn1YeWLNBeED2rqMGSb1y8Ypv3
>     m=audio 56984 RTP/SAVPF 111 103 104 0 8 106 105 13 126
>     c=IN IP4 216.207.245.1
>     a=rtcp:56984 IN IP4 216.207.245.1
>     a=candidate:474352566 1 udp 2113937151 10.24.16.82 56984 typ host generation 0
>     a=candidate:474352566 2 udp 2113937151 10.24.16.82 56984 typ host generation 0
>     a=candidate:3038348387 1 udp 1845501695 216.207.245.1 56984 typ srflx raddr 10.24.16.82 rport 56984 generation 0
>     a=candidate:3038348387 2 udp 1845501695 216.207.245.1 56984 typ srflx raddr 10.24.16.82 rport 56984 generation 0
>     a=candidate:1388705606 1 tcp 1509957375 10.24.16.82 0 typ host generation 0
>     a=candidate:1388705606 2 tcp 1509957375 10.24.16.82 0 typ host generation 0
>     a=ice-ufrag:LeGzZojC7gQNvl7o
>     a=ice-pwd:VfqsY2VVKr3xg/mgvdPtcxhp
>     a=ice-options:google-ice
>     a=fingerprint:sha-256 3A:BA:F5:44:53:CD:99:6C:D1:32:9F:80:53:D4:B5:BA:AE:CF:98:54:71:7F:D6:CB:14:7F:D8:94:30:98:89:62
>     a=setup:actpass
>     a=mid:audio
>     a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
>     a=sendonly
>     a=rtcp-mux
>     a=crypto:0 AES_CM_128_HMAC_SHA1_32 inline:ZQQ2MjApfbPlEAmWe52FRkbKUVH4rXt5p5QnpObB
>     a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:JljeDW3NiakHNYn+bu+vje3cD77nNPytUW79J2Vz
>     a=rtpmap:111 opus/48000/2
>     a=fmtp:111 minptime=10
>     a=rtpmap:103 ISAC/16000
>     a=rtpmap:104 ISAC/32000
>     a=rtpmap:0 PCMU/8000
>     a=rtpmap:8 PCMA/8000
>     a=rtpmap:106 CN/32000
>     a=rtpmap:105 CN/16000
>     a=rtpmap:13 CN/8000
>     a=rtpmap:126 telephone-event/8000
>     a=maxptime:60
>     a=ssrc:3822531142 cname:YE/Lm1aHvDjHxwH0
>     a=ssrc:3822531142 msid:nqy2BUfoo7yn1YeWLNBeED2rqMGSb1y8Ypv3 b50d1acc-19eb-455c-8a4b-40263a2d1a9b
>     a=ssrc:3822531142 mslabel:nqy2BUfoo7yn1YeWLNBeED2rqMGSb1y8Ypv3
>     a=ssrc:3822531142 label:b50d1acc-19eb-455c-8a4b-40263a2d1a9b
>     
>     
>     
>     
>     third invite (SIMPL5 resumes the call)
>     
>     INVITE sip:1201 at 10.24.18.246:5060;transport=WS SIP/2.0
>     Via: SIP/2.0/WS df7jal23ls0d.invalid;branch=z9hG4bKGAwGP8mOJ7Kd9TuUw4Ac7T1Pc3eVTQgE;rport
>     From: "sipml_bot"<sip:sipml_bot at 10.24.18.246>;tag=W42MEkWLdFTUiIaaw8iy
>     To: <sip:1201 at 10.24.18.246>;tag=as269c229d
>     Contact: "sipml_bot"<sip:sipml_bot at df7jal23ls0d.invalid;rtcweb-breaker=no;click2call=no;transport=ws>;+g.oma.sip-im;+sip.ice;language="en,fr"
>     Call-ID: 97982d1a-15e5-cd55-9122-07b3cc20cb7d
>     CSeq: 20909 INVITE
>     Content-Type: application/sdp
>     Content-Length: 1838
>     Max-Forwards: 70
>     User-Agent: IM-client/OMA1.0 sipML5-v1.2014.01.27
>     Organization: Doubango Telecom
>     
>     v=0
>     o=- 11625312205988492 4 IN IP4 127.0.0.1
>     s=Doubango Telecom - chrome
>     t=0 0
>     a=group:BUNDLE audio
>     a=msid-semantic: WMS HpcmWGwoVidQ57oRzreLCtEzVKYzRkuGXCzS
>     m=audio 43821 RTP/SAVPF 111 103 104 0 8 106 105 13 126
>     c=IN IP4 216.207.245.1
>     a=rtcp:43821 IN IP4 216.207.245.1
>     a=candidate:474352566 1 udp 2113937151 10.24.16.82 43821 typ host generation 0
>     a=candidate:474352566 2 udp 2113937151 10.24.16.82 43821 typ host generation 0
>     a=candidate:3038348387 1 udp 1845501695 216.207.245.1 43821 typ srflx raddr 10.24.16.82 rport 43821 generation 0
>     a=candidate:3038348387 2 udp 1845501695 216.207.245.1 43821 typ srflx raddr 10.24.16.82 rport 43821 generation 0
>     a=candidate:1388705606 1 tcp 1509957375 10.24.16.82 0 typ host generation 0
>     a=candidate:1388705606 2 tcp 1509957375 10.24.16.82 0 typ host generation 0
>     a=ice-ufrag:bBLoaKGIBC2OBJK/
>     a=ice-pwd:zJD7a+9Pd54bh3WuLfQEiJRx
>     a=ice-options:google-ice
>     a=fingerprint:sha-256 3A:BA:F5:44:53:CD:99:6C:D1:32:9F:80:53:D4:B5:BA:AE:CF:98:54:71:7F:D6:CB:14:7F:D8:94:30:98:89:62
>     a=setup:actpass
>     a=mid:audio
>     a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
>     a=sendrecv
>     a=rtcp-mux
>     a=crypto:0 AES_CM_128_HMAC_SHA1_32 inline:4QyVPss3/XPkqz2AcnDioqOwLO+BLQe1T41L8POW
>     a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:dNDtygwacR+DTEppJTmZR4sjLQ/99yIDCBXbZwvJ
>     a=rtpmap:111 opus/48000/2
>     a=fmtp:111 minptime=10
>     a=rtpmap:103 ISAC/16000
>     a=rtpmap:104 ISAC/32000
>     a=rtpmap:0 PCMU/8000
>     a=rtpmap:8 PCMA/8000
>     a=rtpmap:106 CN/32000
>     a=rtpmap:105 CN/16000
>     a=rtpmap:13 CN/8000
>     a=rtpmap:126 telephone-event/8000
>     a=maxptime:60
>     a=ssrc:1421740875 cname:vyMP5//bMmdw/mZH
>     a=ssrc:1421740875 msid:HpcmWGwoVidQ57oRzreLCtEzVKYzRkuGXCzS d99b2db8-4043-427e-b5b6-6a608f186190
>     a=ssrc:1421740875 mslabel:HpcmWGwoVidQ57oRzreLCtEzVKYzRkuGXCzS
>     a=ssrc:1421740875 label:d99b2db8-4043-427e-b5b6-6a608f186190
>     
>
> 
> Matt Jordan wrote:
>     Ah ha. They do change on the third INVITE request.
>     
>     We could do this one of two ways:
>     
>     (1) Compare the candidates that were received to what we currently have. If any differ, destroy the ICE session and re-create it.
>     (2) Say damn the torpedoes, full speed ahead - and always destroy the ICE session and re-create it.
>     
>     I'd actually lean to (2), since (1) feels like an optimization you make when you're spending a lot of time destroy/re-creating things.
> 
> Joshua Colp wrote:
>     WELL... it depends if in the case of (1) the ICE negotiation was actually done again with the same candidates. If not and you destroy/re-create I'm not sure how the other side will handle the ICE negotiation.
> 
> Jonathan Rose wrote:
>     Well, the candidates aren't changing on the second INVITE which is the one that is actually causing the assertion. If nothing else, if the ice session hasn't started yet we should probably be destroying/recreating it as well.
> 
> Joshua Colp wrote:
>     Was it not started?
> 
> Jonathan Rose wrote:
>     ast_rtp_ice_start was ran on the first INVITE, but failed during pj_ice_sess_create_check_list and returned PJNATH_EICENOHOSTCAND instead of PJ_SUCCESS:
>     
>     	if (pj_ice_sess_create_check_list(rtp->ice, &ufrag, &passwd, ao2_container_count(rtp->remote_candidates), &candidates[0]) == PJ_SUCCESS) {
>     		pj_ice_sess_start_check(rtp->ice);
>     		pj_timer_heap_poll(timerheap, NULL);
>     		rtp->ice_started = 1;
>     		rtp->strict_rtp_state = STRICT_RTP_OPEN;
>     		return;
>     	}
>     
>     As I was saying before, we don't finish getting ICE started in this case. With kharwell's patch in place, we also blow out the remote candidates list in the RTP instance and that has caused the one way audio issue.
> 
> Matt Jordan wrote:
>     Right. So what needs to be determined is why pj_ice_sess_create_check_list is returning PJNATH_EICENOHOSTCAND.
>     
>     Looking at that particular code, that would occur in prune_checklist when there is no local candidate that has the same address as the SRFLX candidate's base address:
>     
>     	if (clist->checks[i].lcand->type == PJ_ICE_CAND_TYPE_SRFLX) {
>     	    /* Find the base for this candidate */
>     	    unsigned j;
>     	    for (j=0; j<ice->lcand_cnt; ++j) {
>     		pj_ice_sess_cand *host = &ice->lcand[j];
>     
>     		if (host->type != PJ_ICE_CAND_TYPE_HOST)
>     		    continue;
>     
>     		if (sockaddr_cmp(&srflx->base_addr, &host->addr) == 0) {
>     		    /* Replace this SRFLX with its BASE */
>     		    clist->checks[i].lcand = host;
>     		    break;
>     		}
>     	    }
>     
>     	    if (j==ice->lcand_cnt) {
>     		/* Host candidate not found this this srflx! */
>     		LOG4((ice->obj_name, 
>     		      "Base candidate %s:%d not found for srflx candidate %d",
>     		      pj_inet_ntoa(srflx->base_addr.ipv4.sin_addr),
>     		      pj_ntohs(srflx->base_addr.ipv4.sin_port),
>     		      GET_LCAND_ID(clist->checks[i].lcand)));
>     		return PJNATH_EICENOHOSTCAND;
>     	    }
>     	}
>     
>     When this occurs, what are your local candidates? Is there a local candidate that should match?
> 
> Jonathan Rose wrote:
>     From my debug of the local candidate list, I got the following:
>     
>     0 - host address: 10.24.18.246 (type host)
>     1 - host address: 216.207.245.1 (type srflx)
>     2 - host address: 10.24.18.246 (type host)
>     3 - host address: 216.207.245.1 (type srflx)
>     
>     And the check list addresses they are being compared to:
>     
>     0 - 10.24.18.246 (type host)
>     1 - 10.24.18.246 (type host)
>     2 - 10.24.18.246 (type host)
>     3 - 10.24.18.246 (type host)
>     4 - 216.207.245.1 (type srflx)
>     5 - 216.207.245.1 (type srflx)
>     6 - 216.207.245.1 (type srflx)
>     7 - 216.207.245.1 (type srflx)
>     8 - 10.24.18.246 (type host)
>     9 - 10.24.18.246 (type host)
>     10 - 216.207.245.1 (type srflx)
>     11 - 216.207.245.1 (type srflx)
>     
>     All of the ones that would match against the clist members are of type srflx and get ignored by:
>     		if (host->type != PJ_ICE_CAND_TYPE_HOST)
>     		    continue;
>     
>     The base_addr for each local candidate is the same as addr.

Let me ammend that data to include port numbers...

Local Candidates:
0 - host address: 10.24.18.246:3640 (type host)
1 - host address: 216.207.245.1:3896 (type srflx)
2 - host address: 10.24.18.246:3640 (type host)
3 - host address: 216.207.245.1:3896 (type srflx)

Check List:
0 - 10.24.18.246:3640 (type host)
1 - 10.24.18.246:3896 (type host)
2 - 10.24.18.246:3640 (type host)
3 - 10.24.18.246:3896 (type host)
4 - 216.207.245.1:3640 (type srflx)
5 - 216.207.245.1:3640 (type srflx)
6 - 216.207.245.1:3640 (type srflx)
7 - 216.207.245.1:3640 (type srflx)
8 - 10.24.18.246:3640 (type host)
9 - 10.24.18.246:3896 (type host)
10 - 216.207.245.1:3640 (type srflx)
11 - 216.207.245.1:3640 (type srflx)


- Jonathan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/3256/#review10937
-----------------------------------------------------------


On Feb. 24, 2014, 12:36 p.m., Jonathan Rose wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviewboard.asterisk.org/r/3256/
> -----------------------------------------------------------
> 
> (Updated Feb. 24, 2014, 12:36 p.m.)
> 
> 
> Review request for Asterisk Developers, Joshua Colp, Kevin Harwell, and Matt Jordan.
> 
> 
> Bugs: ASTERISK-22911 and ASTERISK-23213
>     https://issues.asterisk.org/jira/browse/ASTERISK-22911
>     https://issues.asterisk.org/jira/browse/ASTERISK-23213
> 
> 
> Repository: Asterisk
> 
> 
> Description
> -------
> 
> Let me start by saying this is almost certainly not the complete answer to solving these problems. This patch is simply an alternative to backing out the patch from r405234 and leaving the existing aborts in place. I've written a test to make sure the new patch (and likely a later patch which can resolve these problems with ICE more comprehensively) does not crash Asterisk and that can be view here: https://reviewboard.asterisk.org/r/3255/
> 
> In my reproduction of this regression, I noticed a few things. The first was that when starting a call with ICE from SIPML5 that Asterisk would not be able to full initialize the ICE session. Instead when creating the candidate checklist via pj_ice_sess_create_check_list, PJNATH would be unable to associate the srflx candidates with any host pairs when pruning the checklist. The SRFLX candidates would have the addresses used internally on my LAN while the addresses it would be matched up against would mirror those of my external IP (in other words, they just didn't match by address). The overall return from pj_ice_sess_create_check_list would be PJNATH_EICENOHOSTCAND and while the ICE session wouldn't get flagged as started, I still continued to receive two way audio on both ends in Asterisk 11.7
> 
> Asterisk 11.8 however would clear the candidate lists at this point and I would get one way audio instead.
> 
> The crash in 11.7 from holding was caused because upon holding the call, the SDP would contain the same list of ICE candidates, so Asterisk would attempt to start the ICE session again. This time when it entered the create check list function, the maximum ICE candidates in the session would be exhausted causing PJNATH to assert and abort (which visibly appears more or less the same as a crash). This work around patch checks to see if a create checklist call will cause that value to expand above the maximum and if it would, we abandon starting ICE back up and we clear the current candidate list on the ICE session. In my own tests this seemed to work quite well (I had two way audio, Asterisk didn't terminate suddenly, and I was able to hold and resume the call while still maintaining two way audio and music on hold for all expected ends). According to Vytis though, neither this patch nor the original patch to resolve ASTERISK-22911 by kharwell actually fixed his audio problems anyway. Kharwell's patch fixed the assertions that were occurring because the candidate list would be cleared on any error including the PJNATH_EICENOHOSTCAND which most of the people in ASTERISK-23213 were most likely experiencing as well. I draw that conclusion because when they used this patch they reported that it fixed their issues without introducing any new crashes.
> 
> At this point, I'm not entirely sure that audio actually should be allowed to work when building the ICE session check list fails like it has been in this scenario. But we need to finalize a new Asterisk 11 release soon and we can't have this apparent regression in when we do so. In the future we need to find out more comprehensively why pj_ice_sess_create_check_list is failing and how to fix that.
> 
> 
> Props to Lott Caskey for providing some improvements to the patch that I wrote.
> 
> 
> Diffs
> -----
> 
>   /branches/11/res/res_rtp_asterisk.c 408284 
> 
> Diff: https://reviewboard.asterisk.org/r/3256/diff/
> 
> 
> Testing
> -------
> 
> Tested calls and holds with a SIPML5 call to a desk phone. Confirmed two way audio and music on hold.
> Requested that the reporter and participants of ASTERISK-23213 test the patch. Everyone who tested it confirmed that it fixed their problems.
> Ran https://reviewboard.asterisk.org/r/3255/ and checked the log files to see what was happening
> 
> 
> Thanks,
> 
> Jonathan Rose
> 
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20140225/00d8250a/attachment-0001.html>


More information about the asterisk-dev mailing list