[asterisk-bugs] [JIRA] (ASTERISK-25645) res_rtp_asterisk: Lock inversion

Steve Davies (JIRA) noreply at issues.asterisk.org
Fri Dec 25 04:58:33 CST 2015


    [ https://issues.asterisk.org/jira/browse/ASTERISK-25645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=228748#comment-228748 ] 

Steve Davies commented on ASTERISK-25645:
-----------------------------------------

Dade,

Good feedback - I had not twigged that you were non-Digium because of the way you were immediately linked. I suspect we are actually in similar positions stuck between a rock and a hard place; 1) You've done some excellent work to produce your patch(es) in this area, which I am personally benefiting from, and I come along and suggest it is broken. 2) I find some apparent brokenness which can be rectified by removing part of your patch, but it is not easily reproduced elsewhere... Nasty on both counts.

>  I suggest submitting them back before opening issues without first backtracking to the release, without any customizations

Apart from the pjproject lock change, I think I am running nearly vanilla 11.21.0-rc1 (Debian patches excepted). I provided the reduced backtrace as I thought it would actually be more clean/useful - I will rectify that in the new year. I'll un-patch pjproject and compile with better debug symbols too.

> The lock acquired in pj_timer_heap_poll is not the ice->lock that's in attempt to be acquired in pj_ice_sess_send_data

Again, I'm pretty sure I saw the ice group lock being grabbed on both occasions - I will double check and provide better references after the holidays.

>> Is the attached patch 'experimental_anti_deadlock' utterly stupid?
> It reverts some locking I added in

The intention is NOT to revert your locking, rather to use it more broadly and then to briefly unlock where there is a possibility that we may call in to libpj.

> I also wouldn't add the lock in to dtls_srtp_check_pending - at least not where you added it, since you added it after the BIO_READ.

That is an UN-lock, which is re-locked 2 lines later. This is the temporary release of a lock to avoid the lock inversion that I can see.

>> I am familiar with the SEGV / 100% CPU infinite loop bug, and this is not that bug.
> Funny, because my company held our information about that privately until I posted the update to it at the same time as replying to you yesterday, and I feel like "familiarity" would be difficult to establish over a 24 hour period.

I am familiar with that issue because I spent the last 2 to 3 weeks looking at that bug myself - Sounds like your company and mine did the same thing :) I have been trying to establish how to submit bug reports/patches to the pjsip project - Not as easy as with Digium! My solution was to put a brief lock on the stun object in the 'on stun cache timeout' codepath - All other references to the stun cache list hold the same lock, so it seemed sensible

> I mentioned that locks being added on to on_cache_timeout were how the more up to date pjproject's have solved the problem

I found no evidence of it being fixed, but it did not check their SVN repo, so it may be fixed there. It is not fixed in any released versions AFAIK. I will probably post it on the pjsip mailing list in the new year unless there is evidence of a fix elsewhere.

> There is no "group lock" on Asterisk-11's embedded pjproject

Ok this I need to look into, perhaps I have caused this by using pjproject 2.4.5 with Asterisk 11 - Debian's asterisk builds dynamically link the external pjproject libs and I have been taking that very much for granted... My bad!

(Also, let us strike the references to ASTERISK-25275 from the record, I think it has simply caused some mutual confusion, particularly as the thread was re-opened and I assumed that was the post you meant.)

So... If the static pjproject in asterisk-11 does not use the same locking as pjproject 2.4.5 in the timer poll, then your patch is probably perfectly safe for Asterisk 11. Perhaps Asterisk-13 uses pjproject sufficiently differently to not cause the issue too. In that respect your original complaint about not raising bugs against customised code stands... Would you rather not be sure though ;-)

I'm off work until Jan 4th, and will delve further into this then.

(Restricted to Users role)
> res_rtp_asterisk: Lock inversion
> --------------------------------
>
>                 Key: ASTERISK-25645
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-25645
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_rtp_asterisk
>            Reporter: Joshua Colp
>         Attachments: deadlocked_threads.txt, experimental_anti_deadlock
>
>
> Reported by Steve Davies on asterisk-dev:
> commit 5e6b1476a087407a052f007d326c504cfeefebe7
> ASTERISK-25614
> 2 code paths which approximate the following will cause a lock-inversion deadlock:
> approximate call orders are:
> a)
> pj_timer_heap_poll (PJ_LOCK)
> ast_rtp_on_ice_complete
> ast_rtp_instance_set_remote_address
> remote_address_set
> ast_rtp_remote_address_set
> (DTLS_LOCK)
> ...
> b)
> ast_pbx...
> app_dial
> bridge...
> read
> rtp_read
> ...
> __rtp_recvfrom
> (DTLS_LOCK)
> dtls_srtp_check_pending
> __rtp_sendto
> pj_ice_sess_send_data
> (PJ_LOCK)



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list