[asterisk-bugs] [JIRA] (ASTERISK-25213) Possibility of deadlock in chan_sip INVITE early Replace code

Walter Doekes (JIRA) noreply at issues.asterisk.org
Tue Jun 30 09:56:33 CDT 2015


Walter Doekes created ASTERISK-25213:
----------------------------------------

             Summary: Possibility of deadlock in chan_sip INVITE early Replace code
                 Key: ASTERISK-25213
                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-25213
             Project: Asterisk
          Issue Type: Bug
      Security Level: None
    Affects Versions: 11.19.0
            Reporter: Walter Doekes


It seems there is a lock order problem with the following code in chan_sip:

{code}
static int handle_request_invite(struct sip_pvt *p, struct sip_request *req, struct ast_sockaddr *addr, uint32_t seqno, int *recount, const char *e, int *nounlock)
...
                /* This locks both refer_call pvt and refer_call pvt's owner!!!*/
                if (!error && ast_strlen_zero(pickup.exten) && (p->refer->refer_call = get_sip_pvt_byid_locked(replace_id, totag, fromtag)) == NULL) {
                        ast_log(LOG_NOTICE, "Supervised transfer attempted to replace non-existent call id (%s)!\n", replace_id);
                        transmit_response_reliable(p, "481 Call Leg Does Not Exist (Replaces)", req);
                        error = 1;
                } else {
                        refer_locked = 1;
                }
...
                if (!replace_id && (gotdest != SIP_GET_DEST_EXTEN_FOUND)) {     /* No matching extension found */
...
                } else {
...
                        /* First invitation - create the channel.  Allocation
                         * failures are handled below. */

                        c = sip_new(p, AST_STATE_DOWN, S_OR(p->peername, NULL), NULL, p->logger_callid);
{code}

The above code:
- locks a channel
- then calls sip_new, which eventually locks the channel list (through __ast_channel_alloc)

This order is wrong, because most of the other code does the inverse:
- locks the channel list
- locks a channel

This is a problem when:
- one thread piece of code iterates over the channel list; like say a MASTER_CHANNEL() function call
- another thread enters the shown code and obtains the channel lock before thread one has iterated over the locked channel

Now the two threads start waiting on each other.

Reproducing, see the attached files, starting with {{callpickup}}:
{noformat}
$ ls /etc/asterisk/ -1
asterisk.conf
callpickup-alice.xml
callpickup-bob-waits-and-tells-charlie.xml
callpickup-charlie.xml
callpickup.sh
extensions.conf
logger.conf
modules.conf
sip.conf
{noformat}

Run like this:
{noformat}
shell1# ./callpickup.sh
shell2# /usr/local/bin/sipp -s devil -ap devil2 -i 127.0.0.1 -nostdin \
  -nd -default_behaviors bye -sf /etc/asterisk/callpickup-alice.xml \
  -p 5073 -key tel devil 127.0.0.1
shell3# ulimit -n 2048; ulimit -c unlimited; asterisk -c
{noformat}

With DEBUG_THREADS, DONT_OPTIMIZE and DEADLOCK_DETECT, on my system this takes less than a minute to trigger a deadlock (which may result in a crash, see ASTERISK-25212).
Sometimes it doesn't crash, in that case, it shows something like:

{noformat}
# grep locked /tmp/deadlock-detect.txt 
[Jun 30 16:34:43] ERROR[30200][C-00000226]: lock.c:295 __ast_pthread_mutex_lock: chan_sip.c line 8976 (sip_pvt_lock_full): 'c[x]' was locked here.
[Jun 30 16:34:43] ERROR[30195][C-00000214]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'c' was locked here.
[Jun 30 16:34:43] ERROR[30201][C-0000021d]: lock.c:295 __ast_pthread_mutex_lock: devicestate.c line 502 (ast_devstate_changed_literal): '&(&state_changes)->lock' was locked here.
[Jun 30 16:34:43] ERROR[30190][C-0000021d]: lock.c:295 __ast_pthread_mutex_lock: devicestate.c line 502 (ast_devstate_changed_literal): '&(&state_changes)->lock' was locked here.
[Jun 30 16:34:43] ERROR[30199][C-0000021a]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'c' was locked here.
[Jun 30 16:34:43] ERROR[30197][C-00000217]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'c' was locked here.
[Jun 30 16:34:43] ERROR[30193][C-00000211]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'c' was locked here.
[Jun 30 16:34:43] ERROR[30191][C-0000020e]: lock.c:295 __ast_pthread_mutex_lock: chan_local.c line 455 (local_queue_frame): 'chan' was locked here.
[Jun 30 16:34:45] ERROR[29779]: lock.c:295 __ast_pthread_mutex_lock: astobj2.c line 1095 (internal_ao2_callback): 'iter->c' was locked here.
{noformat}

I guess the refer_call should be unlocked before we enter sip_new; and then re-acquired.

Cheers,
Walter Doekes
OSSO B.V.




--
This message was sent by Atlassian JIRA
(v6.2#6252)



More information about the asterisk-bugs mailing list