[asterisk-bugs] [JIRA] (ASTERISK-25213) Possibility of deadlock in chan_sip INVITE early Replace code
Rusty Newton (JIRA)
noreply at issues.asterisk.org
Tue Jun 30 15:36:33 CDT 2015
[ https://issues.asterisk.org/jira/browse/ASTERISK-25213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=226728#comment-226728 ]
Rusty Newton commented on ASTERISK-25213:
-----------------------------------------
Thanks for being so thorough. I was able to reproduce using the scripts and configs you provided.
> Possibility of deadlock in chan_sip INVITE early Replace code
> -------------------------------------------------------------
>
> Key: ASTERISK-25213
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-25213
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Channels/chan_sip/General
> Affects Versions: 11.18.0, 11.19.0
> Reporter: Walter Doekes
> Attachments: callpickup-alice.xml, callpickup-bob-waits-and-tells-charlie.xml, callpickup-charlie.xml, callpickup-extensions.conf, callpickup.sh, callpickup-sip.conf, deadlock-detect.txt
>
>
> It seems there is a lock order problem with the following code in chan_sip:
> {code}
> static int handle_request_invite(struct sip_pvt *p, struct sip_request *req, struct ast_sockaddr *addr, uint32_t seqno, int *recount, const char *e, int *nounlock)
> ...
> /* This locks both refer_call pvt and refer_call pvt's owner!!!*/
> if (!error && ast_strlen_zero(pickup.exten) && (p->refer->refer_call = get_sip_pvt_byid_locked(replace_id, totag, fromtag)) == NULL) {
> ast_log(LOG_NOTICE, "Supervised transfer attempted to replace non-existent call id (%s)!\n", replace_id);
> transmit_response_reliable(p, "481 Call Leg Does Not Exist (Replaces)", req);
> error = 1;
> } else {
> refer_locked = 1;
> }
> ...
> if (!replace_id && (gotdest != SIP_GET_DEST_EXTEN_FOUND)) { /* No matching extension found */
> ...
> } else {
> ...
> /* First invitation - create the channel. Allocation
> * failures are handled below. */
> c = sip_new(p, AST_STATE_DOWN, S_OR(p->peername, NULL), NULL, p->logger_callid);
> {code}
> The above code:
> - locks a channel
> - then calls sip_new, which eventually locks the channel list (through __ast_channel_alloc)
> This order is wrong, because most of the other code does the inverse:
> - locks the channel list
> - locks a channel
> This is a problem when:
> - one thread piece of code iterates over the channel list; like say a MASTER_CHANNEL() function call
> - another thread enters the shown code and obtains the channel lock before thread one has iterated over the locked channel
> Now the two threads start waiting on each other.
> Reproducing, see the attached files, starting with {{callpickup}}:
> {noformat}
> $ ls /etc/asterisk/ -1
> asterisk.conf
> callpickup-alice.xml
> callpickup-bob-waits-and-tells-charlie.xml
> callpickup-charlie.xml
> callpickup.sh
> extensions.conf
> logger.conf
> modules.conf
> sip.conf
> {noformat}
> Run like this:
> {noformat}
> shell1# ./callpickup.sh
> shell2# /usr/local/bin/sipp -s devil -ap devil2 -i 127.0.0.1 -nostdin \
> -nd -default_behaviors bye -sf /etc/asterisk/callpickup-alice.xml \
> -p 5073 -key tel devil 127.0.0.1
> shell3# ulimit -n 2048; ulimit -c unlimited; asterisk -c
> {noformat}
> With DEBUG_THREADS, DONT_OPTIMIZE and DETECT_DEADLOCKS, on my system this takes less than a minute to trigger a deadlock (which may result in a crash, see ASTERISK-25212).
> Sometimes it doesn't crash, in that case, it shows something like:
> {noformat}
> # grep locked /tmp/deadlock-detect.txt
> [Jun 30 16:34:43] ERROR[30200][C-00000226]: lock.c:295 __ast_pthread_mutex_lock: chan_sip.c line 8976 (sip_pvt_lock_full): 'c[x]' was locked here.
> [Jun 30 16:34:43] ERROR[30195][C-00000214]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'c' was locked here.
> [Jun 30 16:34:43] ERROR[30201][C-0000021d]: lock.c:295 __ast_pthread_mutex_lock: devicestate.c line 502 (ast_devstate_changed_literal): '&(&state_changes)->lock' was locked here.
> [Jun 30 16:34:43] ERROR[30190][C-0000021d]: lock.c:295 __ast_pthread_mutex_lock: devicestate.c line 502 (ast_devstate_changed_literal): '&(&state_changes)->lock' was locked here.
> [Jun 30 16:34:43] ERROR[30199][C-0000021a]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'c' was locked here.
> [Jun 30 16:34:43] ERROR[30197][C-00000217]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'c' was locked here.
> [Jun 30 16:34:43] ERROR[30193][C-00000211]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'c' was locked here.
> [Jun 30 16:34:43] ERROR[30191][C-0000020e]: lock.c:295 __ast_pthread_mutex_lock: chan_local.c line 455 (local_queue_frame): 'chan' was locked here.
> [Jun 30 16:34:45] ERROR[29779]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'iter->c' was locked here.
> {noformat}
> I guess the refer_call should be unlocked before we enter sip_new; and then re-acquired.
> Cheers,
> Walter Doekes
> OSSO B.V.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
More information about the asterisk-bugs
mailing list