[asterisk-bugs] [JIRA] (ASTERISK-21228) Asterisk lockup - dialplan reload?

Matt Jordan (JIRA) noreply at issues.asterisk.org
Tue Mar 12 12:17:01 CDT 2013


    [ https://issues.asterisk.org/jira/browse/ASTERISK-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=204140#comment-204140 ] 

Matt Jordan commented on ASTERISK-21228:
----------------------------------------

The deadlock occurs between Thread {{0x7f4d7bbb7700}} and Thread {{0x7f4d981e2700}}:

{noformat}
Thread 25 (Thread 0x7f4d981e2700 (LWP 25741)):
#0  0x00007f4ddfc9189c in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f4ddfc8d080 in _L_lock_903 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f4ddfc8cf19 in pthread_mutex_lock () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x0000000000506fe1 in __ast_pthread_mutex_lock (filename=0x5df3d4 "pbx.c", lineno=11809, func=0x5e5900 "ast_rdlock_contexts", mutex_name=0x5e574f "&conlock", t=0x828220) at lock.c:248
#4  0x0000000000554c36 in ast_rdlock_contexts () at pbx.c:11809
#5  0x000000000053b909 in pbx_extension_helper (c=0x7f4d940dbf08, con=0x0, context=0x7f4d940dcd58 "default", exten=0x7f4da1d76144 "fax", priority=1, label=0x0, callerid=0x7f4d9421d2d0 "2895620092", action=E_MATCH, found=0x0, combined_find_spawn=0) at pbx.c:4600
#6  0x000000000053f12f in ast_exists_extension (c=0x7f4d940dbf08, context=0x7f4d940dcd58 "default", exten=0x7f4da1d76144 "fax", priority=1, callerid=0x7f4d9421d2d0 "2895620092") at pbx.c:5756
#7  0x00007f4da1cd9101 in sip_read (ast=0x7f4d940dbf08) at chan_sip.c:8398
#8  0x000000000047d92c in __ast_read (chan=0x7f4d940dbf08, dropaudio=0) at channel.c:4004
#9  0x000000000047f79e in ast_read (chan=0x7f4d940dbf08) at channel.c:4358
#10 0x0000000000454ad2 in autoservice_run (ign=0x0) at autoservice.c:136
#11 0x00000000005a1a97 in dummy_start (data=0x7f4d7c001980) at utils.c:1028
#12 0x00007f4ddfc8ae9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#13 0x00007f4de0db9cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#14 0x0000000000000000 in ?? ()

Thread 14 (Thread 0x7f4d7bbb7700 (LWP 22878)):
#0  0x00007f4de0d8583d in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f4de0db3774 in usleep () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00000000004552b7 in ast_autoservice_stop (chan=0x7f4d940dbf08) at autoservice.c:288
#3  0x00000000005365a0 in pbx_find_extension (chan=0x7f4d940dbf08, bypass=0x0, q=0x7f4d7bbb2520, context=0x7f4d940dcd58 "default", exten=0x7f4d940dcda8 "8776227713", priority=8, label=0x0, callerid=0x7f4d9421d2d0 "2895620092", action=E_SPAWN) at pbx.c:3347
#4  0x000000000053b970 in pbx_extension_helper (c=0x7f4d940dbf08, con=0x0, context=0x7f4d940dcd58 "default", exten=0x7f4d940dcda8 "8776227713", priority=8, label=0x0, callerid=0x7f4d9421d2d0 "2895620092", action=E_SPAWN, found=0x7f4d7bbb6c74, combined_find_spawn=1) at pbx.c:4604
#5  0x000000000053f328 in ast_spawn_extension (c=0x7f4d940dbf08, context=0x7f4d940dcd58 "default", exten=0x7f4d940dcda8 "8776227713", priority=8, callerid=0x7f4d9421d2d0 "2895620092", found=0x7f4d7bbb6c74, combined_find_spawn=1) at pbx.c:5781
#6  0x0000000000540a9d in __ast_pbx_run (c=0x7f4d940dbf08, args=0x0) at pbx.c:6256
#7  0x000000000054258c in pbx_thread (data=0x7f4d940dbf08) at pbx.c:6586
#8  0x00000000005a1a97 in dummy_start (data=0x7f4d9403ec00) at utils.c:1028
#9  0x00007f4ddfc8ae9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007f4de0db9cbd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x0000000000000000 in ?? ()
{noformat}

Thread {{0x7f4d981e2700}}, the autoservice thread, calls {{ast_exists_extension}} from {{sip_read}}, which tries to obtain a read lock on the contexts lock. The autoservice thread blocks there, so the autoservice list is never rebuilt. Thread {{0x7f4d7bbb7700}}, which is the actual owner of the contexts lock, is sitting in {{ast_autoservice_stop}} waiting for that rebuild, so neither thread can make progress.

What's interesting here is that both threads are only holding, or trying to hold, a read lock on the contexts, so in principle neither should block the other. Unfortunately, that isn't the case: the rdlock/wrlock functions are actually wrappers around a straight call to {{ast_mutex_lock}}/{{ast_mutex_unlock}}, so a second reader blocks exactly as a writer would.
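
To illustrate the difference, here is a minimal standalone sketch using plain pthreads rather than Asterisk's lock wrappers. The names {{demo_mutex}}, {{demo_rwlock}}, {{second_reader_on_mutex}} and {{second_reader_on_rwlock}} are hypothetical and exist only for this example. It shows that when one thread holds a "read lock" that is really a mutex, a second reader is refused, whereas a true read/write lock lets it in.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for illustration; these are not Asterisk symbols. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_rwlock_t demo_rwlock = PTHREAD_RWLOCK_INITIALIZER;

/* Second "reader" when the read lock is really a mutex. */
static void *mutex_reader(void *out)
{
    int rc = pthread_mutex_trylock(&demo_mutex);
    if (rc == 0) {
        pthread_mutex_unlock(&demo_mutex);
    }
    *(int *)out = rc;
    return NULL;
}

/* Second reader when the read lock is a real rwlock. */
static void *rwlock_reader(void *out)
{
    int rc = pthread_rwlock_tryrdlock(&demo_rwlock);
    if (rc == 0) {
        pthread_rwlock_unlock(&demo_rwlock);
    }
    *(int *)out = rc;
    return NULL;
}

/* Hold the mutex as "reader one" and report what a second reader sees. */
int second_reader_on_mutex(void)
{
    pthread_t t;
    int rc = -1;
    int err;

    pthread_mutex_lock(&demo_mutex);
    err = pthread_create(&t, NULL, mutex_reader, &rc);
    assert(err == 0);
    pthread_join(t, NULL);
    pthread_mutex_unlock(&demo_mutex);
    return rc;
}

/* Hold a read lock as reader one and report what a second reader sees. */
int second_reader_on_rwlock(void)
{
    pthread_t t;
    int rc = -1;
    int err;

    pthread_rwlock_rdlock(&demo_rwlock);
    err = pthread_create(&t, NULL, rwlock_reader, &rc);
    assert(err == 0);
    pthread_join(t, NULL);
    pthread_rwlock_unlock(&demo_rwlock);
    return rc;
}
```

Here {{second_reader_on_mutex()}} returns {{EBUSY}} while {{second_reader_on_rwlock()}} returns 0. The sketch uses trylock so that it terminates; with a blocking lock call, as in the backtrace above, the second reader simply waits.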

Using a plain mutex there, in and of itself, appears to be necessary:

{noformat}
    ........
      r280982 | tilghman | 2010-08-05 02:28:33 -0500 (Thu, 05 Aug 2010) | 8 lines
      
      Change context lock back to a mutex, because functionality depends upon the lock being recursive.
      
      (closes issue #17643)
       Reported by: zerohalo
       Patches: 
             20100726__issue17643.diff.txt uploaded by tilghman (license 14)
       Tested by: zerohalo
    ........

{noformat}

ASTERISK-16365 indicates that device state callbacks will attempt to obtain the context lock recursively.

The solution here is most likely to not hold the context lock when calling {{ast_autoservice_stop}}. How to do that, however, is tricky.
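
As a rough illustration of that idea, and not a proposed patch, the sketch below uses plain pthreads to drop a recursive lock completely before a blocking wait and to retake it afterwards. Everything in it is an assumption made for the example: {{ctx_lock}} stands in for the contexts lock, {{helper}} stands in for the thread that needs the lock before it can finish, {{pthread_join}} stands in for the wait inside {{ast_autoservice_stop}}, and the caller is assumed to know its own recursion depth.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the contexts lock; not an Asterisk symbol. */
static pthread_mutex_t ctx_lock;
static int helper_done;

/* Stand-in for the thread that must take the lock before it can finish. */
static void *helper(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&ctx_lock);
    helper_done = 1;
    pthread_mutex_unlock(&ctx_lock);
    return NULL;
}

/*
 * Hold a recursive lock `depth` times, then release every level before a
 * blocking wait and retake them afterwards. Returns 1 if the helper ran.
 */
int wait_without_lock(int depth)
{
    pthread_mutexattr_t attr;
    pthread_t t;
    int i, err, done;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&ctx_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    helper_done = 0;
    for (i = 0; i < depth; i++) {
        pthread_mutex_lock(&ctx_lock);
    }

    err = pthread_create(&t, NULL, helper, NULL);
    assert(err == 0);

    /* Drop all recursion levels so the helper can get the lock. */
    for (i = 0; i < depth; i++) {
        pthread_mutex_unlock(&ctx_lock);
    }

    /* The blocking wait now happens with the lock released. */
    pthread_join(t, NULL);

    /* Retake the lock; anything looked up earlier must be revalidated here. */
    for (i = 0; i < depth; i++) {
        pthread_mutex_lock(&ctx_lock);
    }
    done = helper_done;
    for (i = 0; i < depth; i++) {
        pthread_mutex_unlock(&ctx_lock);
    }

    pthread_mutex_destroy(&ctx_lock);
    return done;
}
```

The sketch also hints at why this is tricky. Because the lock is recursive, a single unlock may not actually release it, and once it is released the protected data can change, so any pointers obtained under the lock cannot be trusted after the wait.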

                
> Asterisk lockup - dialplan reload?
> ----------------------------------
>
>                 Key: ASTERISK-21228
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-21228
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Core/PBX
>    Affects Versions: 1.8.20.1, 11.2.1
>         Environment: Linux 3.2.0-38-generic #61-Ubuntu SMP x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Dare Awktane
>         Attachments: backtrace-threads.txt, core-show-locks.txt
>
>
> We have two Asterisk machines, each handling inbound, outbound, and fax calls. We've developed a web application that people can use to create accounts (which flow through into Asterisk as contexts). We have a cron script that calls -rx "dialplan reload" to update these contexts. Both extconfig.conf and extensions.conf are loaded from a clustered MySQL (NDB) database. Roughly 2-3 times a day each Asterisk server stops taking calls and does not restart on its own. I'm not sure whether all of the lockups are related to this dump. We braved a morning of bad call quality to have the debug flags set. Our dialplan shows that it has 494197 rows, which is a reasonably large dialplan.
