[asterisk-bugs] [JIRA] (ASTERISK-25213) Possibility of deadlock in chan_sip INVITE early Replace code

Asterisk Team (JIRA) noreply at issues.asterisk.org
Tue Jun 30 09:56:33 CDT 2015


    [ https://issues.asterisk.org/jira/browse/ASTERISK-25213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=226713#comment-226713 ] 

Asterisk Team commented on ASTERISK-25213:
------------------------------------------

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

> Possibility of deadlock in chan_sip INVITE early Replace code
> -------------------------------------------------------------
>
>                 Key: ASTERISK-25213
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-25213
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>    Affects Versions: 11.19.0
>            Reporter: Walter Doekes
>
> It seems there is a lock order problem with the following code in chan_sip:
> {code}
> static int handle_request_invite(struct sip_pvt *p, struct sip_request *req, struct ast_sockaddr *addr, uint32_t seqno, int *recount, const char *e, int *nounlock)
> ...
>                 /* This locks both refer_call pvt and refer_call pvt's owner!!!*/
>                 if (!error && ast_strlen_zero(pickup.exten) && (p->refer->refer_call = get_sip_pvt_byid_locked(replace_id, totag, fromtag)) == NULL) {
>                         ast_log(LOG_NOTICE, "Supervised transfer attempted to replace non-existent call id (%s)!\n", replace_id);
>                         transmit_response_reliable(p, "481 Call Leg Does Not Exist (Replaces)", req);
>                         error = 1;
>                 } else {
>                         refer_locked = 1;
>                 }
> ...
>                 if (!replace_id && (gotdest != SIP_GET_DEST_EXTEN_FOUND)) {     /* No matching extension found */
> ...
>                 } else {
> ...
>                         /* First invitation - create the channel.  Allocation
>                          * failures are handled below. */
>                         c = sip_new(p, AST_STATE_DOWN, S_OR(p->peername, NULL), NULL, p->logger_callid);
> {code}
> The above code:
> - locks a channel
> - then calls sip_new, which eventually locks the channel list (through __ast_channel_alloc)
> This order is wrong, because most of the other code does the inverse:
> - locks the channel list
> - locks a channel
> This is a problem when:
> - one thread piece of code iterates over the channel list; like say a MASTER_CHANNEL() function call
> - another thread enters the shown code and obtains the channel lock before thread one has iterated over the locked channel
> Now the two threads start waiting on each other.
> Reproducing, see the attached files, starting with {{callpickup}}:
> {noformat}
> $ ls /etc/asterisk/ -1
> asterisk.conf
> callpickup-alice.xml
> callpickup-bob-waits-and-tells-charlie.xml
> callpickup-charlie.xml
> callpickup.sh
> extensions.conf
> logger.conf
> modules.conf
> sip.conf
> {noformat}
> Run like this:
> {noformat}
> shell1# ./callpickup.sh
> shell2# /usr/local/bin/sipp -s devil -ap devil2 -i 127.0.0.1 -nostdin \
>   -nd -default_behaviors bye -sf /etc/asterisk/callpickup-alice.xml \
>   -p 5073 -key tel devil 127.0.0.1
> shell3# ulimit -n 2048; ulimit -c unlimited; asterisk -c
> {noformat}
> With DEBUG_THREADS, DONT_OPTIMIZE and DEADLOCK_DETECT, on my system this takes less than a minute to trigger a deadlock (which may result in a crash, see ASTERISK-25212).
> Sometimes it doesn't crash, in that case, it shows something like:
> {noformat}
> # grep locked /tmp/deadlock-detect.txt 
> [Jun 30 16:34:43] ERROR[30200][C-00000226]: lock.c:295 __ast_pthread_mutex_lock: chan_sip.c line 8976 (sip_pvt_lock_full): 'c[x]' was locked here.
> [Jun 30 16:34:43] ERROR[30195][C-00000214]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'c' was locked here.
> [Jun 30 16:34:43] ERROR[30201][C-0000021d]: lock.c:295 __ast_pthread_mutex_lock: devicestate.c line 502 (ast_devstate_changed_literal): '&(&state_changes)->lock' was locked here.
> [Jun 30 16:34:43] ERROR[30190][C-0000021d]: lock.c:295 __ast_pthread_mutex_lock: devicestate.c line 502 (ast_devstate_changed_literal): '&(&state_changes)->lock' was locked here.
> [Jun 30 16:34:43] ERROR[30199][C-0000021a]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'c' was locked here.
> [Jun 30 16:34:43] ERROR[30197][C-00000217]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'c' was locked here.
> [Jun 30 16:34:43] ERROR[30193][C-00000211]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'c' was locked here.
> [Jun 30 16:34:43] ERROR[30191][C-0000020e]: lock.c:295 __ast_pthread_mutex_lock: chan_local.c line 455 (local_queue_frame): 'chan' was locked here.
> [Jun 30 16:34:45] ERROR[29779]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'iter->c' was locked here.
> {noformat}
> I guess the refer_call should be unlocked before we enter sip_new; and then re-acquired.
> Cheers,
> Walter Doekes
> OSSO B.V.



--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list