[asterisk-bugs] [JIRA] (ASTERISK-21657) asterisk locks up after running traffic

Matt Jordan (JIRA) noreply at issues.asterisk.org
Fri Apr 26 09:13:38 CDT 2013


    [ https://issues.asterisk.org/jira/browse/ASTERISK-21657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=205889#comment-205889 ] 

Matt Jordan edited comment on ASTERISK-21657 at 4/26/13 9:12 AM:
-----------------------------------------------------------------

We have done some investigating, and it seems the lockup is related to a SUBSCRIBE received for a channel.
The subscription exists because we are using BLF (hints). The following analysis is from our partner "Asteria"; the thread IDs may differ, but it gives a pretty good idea of the problem. Hope this helps:

Thread 11619 (LWP 23061) in handle_statechange locks three locks:
{noformat}
                ast_rdlock_contexts();
                ao2_lock(hints);
                ao2_lock(hint);
{noformat}

Then calls into chan_sip
{noformat}
                /* For extension callbacks */
                AST_LIST_TRAVERSE(&hint->callbacks, cblist, entry) {
                        cblist->callback(hint->exten->parent->name, hint->exten->exten, state, cblist->data);
                }
{noformat}

Then tries to lock one more lock:
{noformat}
static int cb_extensionstate(char *context, char* exten, int state, void *data)
{
        struct sip_pvt *p = data;

        sip_pvt_lock(p);
{noformat}

But deadlocks because p is already locked by 23140.

At the same exact time, Thread 11583 (LWP 23140) gets an incoming SUBSCRIBE to 2008 and locks these locks:
chan_sip.c:24138
{noformat}
        /* Process request, with netlock held, and with usual deadlock avoidance */
        for (lockretry = 10; lockretry > 0; lockretry--) {
                ast_mutex_lock(&netlock);
{noformat}
chan_sip.c:24152
{noformat}
                if (!p->owner || !ast_channel_trylock(p->owner))
                        break;  /* locking succeeded */
{noformat}
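The "usual deadlock avoidance" mentioned in the comment appears to be the common try-lock-and-back-off pattern: take the outer lock, try the inner one, and release everything and retry if the inner lock is busy. The following is a generic sketch of that pattern with POSIX mutexes, not the chan_sip code; lock_both_with_backoff and its parameters are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical helper: acquire `outer`, then try `inner` without blocking.
 * If `inner` is busy, release `outer`, yield, and retry up to max_retries
 * times. Returns 0 with both locks held, or EBUSY with neither held.
 */
int lock_both_with_backoff(pthread_mutex_t *outer, pthread_mutex_t *inner,
                           int max_retries)
{
    assert(max_retries > 0);

    for (int retry = max_retries; retry > 0; retry--) {
        pthread_mutex_lock(outer);
        if (pthread_mutex_trylock(inner) == 0) {
            return 0;               /* both locks are now held */
        }
        /* Back off completely so the other holder can make progress. */
        pthread_mutex_unlock(outer);
        sched_yield();
    }
    return EBUSY;                   /* gave up; no locks are held */
}
```

A loop like this only protects the locks it covers. A blocking lock taken afterwards while those locks are still held, such as the ast_rdlock_contexts() call discussed in this report, is outside its protection.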
The same thread then attempts to get the "hint":
{noformat}
       /* If this is a subscription we actually just need to see if a hint exists for the extension */
        if (req->method == SIP_SUBSCRIBE) {
                char hint[AST_MAX_EXTENSION];
                int which = 0;
                if (ast_get_hint(hint, sizeof(hint), NULL, 0, NULL, p->context, uri) ||
                    (ast_get_hint(hint, sizeof(hint), NULL, 0, NULL, p->context, decoded_uri) && (which = 1))) {
{noformat}

Which calls this short function:
{noformat}
static struct ast_exten *ast_hint_extension(struct ast_channel *c, const char *context, const char *exten)
{
        struct ast_exten *e;
        ast_rdlock_contexts();
        e = ast_hint_extension_nolock(c, context, exten);
        ast_unlock_contexts();
        return e;
}
{noformat}

Where it deadlocks on the call to ast_rdlock_contexts() because 23061 is already holding that lock.
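Taken together, the two threads acquire the same pair of locks in opposite order. A minimal sketch of that inversion with plain pthreads follows. The names conlock, pvt_lock, worker_main and count_blocked_threads are hypothetical stand-ins and not Asterisk code, and pthread_mutex_trylock replaces the blocking lock call so that the program reports the conflict instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the context lock and the sip_pvt lock. */
static pthread_mutex_t conlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pvt_lock = PTHREAD_MUTEX_INITIALIZER;

/* Barriers force the interleaving described in the analysis. */
static pthread_barrier_t both_hold_first;
static pthread_barrier_t both_tried_second;

struct worker {
    pthread_mutex_t *first;   /* lock taken first by this thread */
    pthread_mutex_t *second;  /* lock it then needs */
    int second_rc;            /* result of trying the second lock */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    pthread_mutex_lock(w->first);
    pthread_barrier_wait(&both_hold_first);    /* both hold their first lock */
    w->second_rc = pthread_mutex_trylock(w->second);
    pthread_barrier_wait(&both_tried_second);  /* both have tried the second */
    if (w->second_rc == 0) {
        pthread_mutex_unlock(w->second);
    }
    pthread_mutex_unlock(w->first);
    return NULL;
}

/* Returns how many of the two threads found their second lock busy. */
int count_blocked_threads(void)
{
    /* State-change path: context lock first, then the pvt lock. */
    struct worker state_change = { &conlock, &pvt_lock, 0 };
    /* SUBSCRIBE path: pvt lock first, then the context lock. */
    struct worker subscribe = { &pvt_lock, &conlock, 0 };
    pthread_t t1, t2;
    int rc;

    rc = pthread_barrier_init(&both_hold_first, NULL, 2);
    assert(rc == 0);
    rc = pthread_barrier_init(&both_tried_second, NULL, 2);
    assert(rc == 0);

    rc = pthread_create(&t1, NULL, worker_main, &state_change);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker_main, &subscribe);
    assert(rc == 0);
    (void)rc;

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_barrier_destroy(&both_hold_first);
    pthread_barrier_destroy(&both_tried_second);

    return (state_change.second_rc == EBUSY) + (subscribe.second_rc == EBUSY);
}
```

In this sketch each thread finds its second lock busy. With blocking pthread_mutex_lock calls in place of the try-lock, both threads would wait on each other indefinitely, which is the situation the analysis describes.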

Despite the name, ast_rdlock_contexts just locks a mutex, so there cannot be multiple concurrent readers.

                
      was (Author: alejandro orellana): same text as above, without the noformat markup.


                  
> asterisk locks up after running traffic 
> ----------------------------------------
>
>                 Key: ASTERISK-21657
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-21657
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Applications/SLA, Channels/chan_sip/General
>    Affects Versions: 1.8.21.0
>         Environment: linux ubuntu   , asterisk-1.8.2.3
>            Reporter: Alejandro Orellana
>            Assignee: Rusty Newton
>         Attachments: asteriskLocks.txt, file
>
>
> After running traffic, Asterisk locks up.
> It takes between 2 and 5 hours.
> I am using the SLA feature. The SLA has 12 devices associated with it,
> 10 using TCP and 2 using UDP.
