[asterisk-bugs] [JIRA] (ASTERISK-25645) res_rtp_asterisk: Lock inversion

Steve Davies (JIRA) noreply at issues.asterisk.org
Thu Dec 24 04:14:33 CST 2015


    [ https://issues.asterisk.org/jira/browse/ASTERISK-25645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=228736#comment-228736 ] 

Steve Davies commented on ASTERISK-25645:
-----------------------------------------

Hi Dade,

- I am familiar with the SEGV / 100% CPU infinite loop bug, and this is not that bug. I added locks to on_cache_timeout() in pjproject, as referenced in ASTERISK-25275, to resolve that bug, but none of those locks are held when this issue deadlocks.
- I can trigger the deadlock in 4 calls out of 5, not 1 call in 10,000 like the 'cached_response_list' issue, but it does take a specific call path (below).
- If I revert the patch in ast_rtp_remote_address_set() and suffer the potential 1-second delay instead, then the issue goes away.

I will provide the backtraces of the deadlocked threads to demonstrate the issue.

I should add that our performance/load/stress test rig can make 250,000 calls without tripping over this issue, but a single call that hairpins from JsSIP via Asterisk back to the same JsSIP instance will cause this deadlock. Perhaps having the same Chrome JsSIP instance handle both ends of the call causes some type of synchronisation, and therefore a timing-dependent deadlock.

I should also add that, like yours, our JsSIP is heavily hacked. The biggest hack is to replace SDP 'pranswer' with 'answer' to make early audio work more smoothly across browsers. It is available on GitHub at davies147/JsSIP if that helps.
{noformat}
Sequence to reproduce here:
- JsSIP initiate call
- Dialplan calls-back to same JsSIP
- Rings okay
- JsSIP answer call (seems okay)
- JsSIP put both calls on hold
...deadlock...
{noformat}
Is libpj involved in putting the call on hold? Perhaps that is the trigger. The hold only happens because the front end in our app detects the conflict of two calls sharing one speaker, and tries to resolve it by putting the calls on hold.

Backtraces to follow.

> res_rtp_asterisk: Lock inversion
> --------------------------------
>
>                 Key: ASTERISK-25645
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-25645
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Resources/res_rtp_asterisk
>            Reporter: Joshua Colp
>
> Reported by Steve Davies on asterisk-dev:
> commit 5e6b1476a087407a052f007d326c504cfeefebe7
> ASTERISK-25614
> 2 code paths which approximate the following will cause a lock-inversion deadlock:
> approximate call orders are:
> a)
> pj_timer_heap_poll (PJ_LOCK)
> ast_rtp_on_ice_complete
> ast_rtp_instance_set_remote_address
> remote_address_set
> ast_rtp_remote_address_set
> (DTLS_LOCK)
> ...
> b)
> ast_pbx...
> app_dial
> bridge...
> read
> rtp_read
> ...
> __rtp_recvfrom
> (DTLS_LOCK)
> dtls_srtp_check_pending
> __rtp_sendto
> pj_ice_sess_send_data
> (PJ_LOCK)
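
The two call orders quoted above take the same pair of locks in opposite orders: (a) holds PJ_LOCK and then wants DTLS_LOCK, while (b) holds DTLS_LOCK and then wants PJ_LOCK. As a rough illustration only, the sketch below shows one common way to break that kind of inversion: path (b) releases the DTLS lock before it calls into code that needs the PJ lock. The mutexes, counters and function names (`pj_lock`, `dtls_lock`, `timer_path`, `read_path`, `run_both_paths`) are hypothetical stand-ins using plain pthreads. They are not Asterisk or pjproject code, and this is not necessarily how the issue should be or was fixed.

```c
#include <pthread.h>

/* Hypothetical stand-ins for the PJ_LOCK and DTLS_LOCK named in the traces. */
static pthread_mutex_t pj_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t dtls_lock = PTHREAD_MUTEX_INITIALIZER;

static long addr_updates; /* guarded by dtls_lock */
static long packets_sent; /* guarded by pj_lock */

/* Path (a): the timer callback holds the PJ lock, then takes the DTLS lock. */
static void *timer_path(void *arg)
{
    long n = *(const long *)arg;

    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&pj_lock);
        pthread_mutex_lock(&dtls_lock);
        addr_updates++;
        pthread_mutex_unlock(&dtls_lock);
        pthread_mutex_unlock(&pj_lock);
    }
    return NULL;
}

/* Path (b), reordered: decide what to send under the DTLS lock,
 * drop that lock, and only then take the PJ lock to send. */
static void *read_path(void *arg)
{
    long n = *(const long *)arg;

    for (long i = 0; i < n; i++) {
        int pending;

        pthread_mutex_lock(&dtls_lock);
        pending = 1; /* pretend a handshake packet is waiting */
        pthread_mutex_unlock(&dtls_lock);

        if (pending) {
            pthread_mutex_lock(&pj_lock);
            packets_sent++;
            pthread_mutex_unlock(&pj_lock);
        }
    }
    return NULL;
}

/* Run both paths concurrently; returns 0 once both threads have finished. */
static int run_both_paths(long iterations)
{
    pthread_t a, b;

    if (pthread_create(&a, NULL, timer_path, &iterations) != 0) {
        return -1;
    }
    if (pthread_create(&b, NULL, read_path, &iterations) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

With this ordering no thread ever waits for the PJ lock while holding the DTLS lock, so `run_both_paths()` should always return. If `read_path` instead took `pj_lock` while still holding `dtls_lock`, the two threads could each end up holding one lock and waiting for the other, which is the shape of the deadlock described in (a) and (b).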


