[asterisk-bugs] [JIRA] (ASTERISK-21406) [patch] chan_sip deadlock on monlock between unload_module and do_monitor

Corey Farrell (JIRA) noreply at issues.asterisk.org
Wed Jul 31 02:59:03 CDT 2013


     [ https://issues.asterisk.org/jira/browse/ASTERISK-21406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Corey Farrell updated ASTERISK-21406:
-------------------------------------

    Attachment: chan_sip-unload-deadlock-backtrace.txt

The gdb backtrace is from the 1.8 branch.

Thread 5 is do_monitor() waiting for monlock.
Thread 16 is attempting to unload chan_sip.  It holds monlock and is waiting in pthread_join for do_monitor() to exit.
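
For illustration only, here is a minimal sketch of one way to avoid this pattern: request cancellation while holding the lock, but release the lock before joining.  This is not the attached patch and not the actual chan_sip code.  The names do_monitor_sketch, start_monitor_sketch, stop_monitor_sketch and monitor_running are hypothetical stand-ins, and plain pthread mutexes are used in place of ast_mutex_lock.

```c
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-ins for chan_sip's monlock and monitor thread. */
static pthread_mutex_t monlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t monitor_thread;
static int monitor_running;            /* protected by monlock */

/* Simplified monitor loop: takes monlock briefly on every pass. */
static void *do_monitor_sketch(void *arg)
{
    struct timespec pause = { 0, 1000000 };   /* 1 ms */

    (void) arg;
    for (;;) {
        pthread_mutex_lock(&monlock);   /* not a cancellation point */
        /* ... periodic work under the lock would go here ... */
        pthread_mutex_unlock(&monlock);
        nanosleep(&pause, NULL);        /* cancellation point, lock not held */
    }
    return NULL;
}

/* Returns 0 on success, otherwise the pthread_create error code. */
int start_monitor_sketch(void)
{
    int rc;

    pthread_mutex_lock(&monlock);
    rc = pthread_create(&monitor_thread, NULL, do_monitor_sketch, NULL);
    monitor_running = (rc == 0);
    pthread_mutex_unlock(&monlock);
    return rc;
}

/* Returns 0 if the monitor was cancelled and joined, -1 otherwise. */
int stop_monitor_sketch(void)
{
    pthread_t victim;
    void *result = NULL;

    pthread_mutex_lock(&monlock);
    if (!monitor_running) {
        pthread_mutex_unlock(&monlock);
        return -1;
    }
    victim = monitor_thread;
    monitor_running = 0;
    pthread_cancel(victim);
    pthread_mutex_unlock(&monlock);     /* release BEFORE joining */

    /* The monitor can still take monlock, so it reaches nanosleep and exits. */
    if (pthread_join(victim, &result) != 0) {
        return -1;
    }
    return result == PTHREAD_CANCELED ? 0 : -1;
}
```

Joining while still holding monlock would recreate the situation in the backtrace whenever the monitor is blocked in pthread_mutex_lock at the time of the cancel.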

This build had thread debugging disabled and was run under valgrind.  I have been unable to reproduce the issue with thread debugging enabled.  Thread debugging / deadlock detection adds a bunch of code to ast_mutex_lock; presumably one of the added calls reacts to pthread_cancel, which would let do_monitor() exit instead of blocking on monlock.
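
As background for the point about pthread_cancel: with the default deferred cancellation type, a pending cancel is only acted on at a cancellation point, and pthread_mutex_lock is not one under POSIX.  The sketch below is a standalone demonstration with hypothetical names (demo_lock, blocked_worker, cancel_is_deferred_past_mutex); it does not use any Asterisk code.

```c
#include <pthread.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int got_past_lock;   /* written by the worker, read after join */

static void *blocked_worker(void *arg)
{
    (void) arg;
    /* Waits here while the caller holds demo_lock; a pending cancel
     * is not acted on inside pthread_mutex_lock. */
    pthread_mutex_lock(&demo_lock);
    got_past_lock = 1;
    pthread_mutex_unlock(&demo_lock);

    pthread_testcancel();   /* explicit cancellation point: thread exits here */
    return NULL;            /* not reached when a cancel is pending */
}

/*
 * Returns 1 if the cancel request was deferred past the mutex wait and
 * then honoured at pthread_testcancel, 0 if not, -1 on setup failure.
 */
int cancel_is_deferred_past_mutex(void)
{
    pthread_t tid;
    void *result = NULL;

    got_past_lock = 0;
    pthread_mutex_lock(&demo_lock);
    if (pthread_create(&tid, NULL, blocked_worker, NULL) != 0) {
        pthread_mutex_unlock(&demo_lock);
        return -1;
    }
    pthread_cancel(tid);               /* request is queued, not yet acted on */
    pthread_mutex_unlock(&demo_lock);  /* only now can the worker proceed */

    if (pthread_join(tid, &result) != 0) {
        return -1;
    }
    return got_past_lock == 1 && result == PTHREAD_CANCELED;
}
```

If the caller joined before unlocking demo_lock, the worker would never reach pthread_testcancel and the join would not return, which is the same shape as the deadlock reported here.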
                
> [patch] chan_sip deadlock on monlock between unload_module and do_monitor
> -------------------------------------------------------------------------
>
>                 Key: ASTERISK-21406
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-21406
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: Channels/chan_sip/General
>    Affects Versions: 11.4.0
>         Environment: Ubuntu/quantal, eglibc-2.15-0ubuntu20
>            Reporter: Corey Farrell
>         Attachments: chan_sip-unload-deadlock-backtrace.txt, chan_sip-unload-testfix.patch
>
>
> unload_module cancels/joins the monitor thread while holding monlock.  If do_monitor attempts to lock monlock while unload_module already has it, they deadlock: do_monitor waits for monlock while unload_module waits for do_monitor to exit.
> I've experienced this issue a couple of times in production when attempting to shut down.  I found the cause while running valgrind tests.  I believe valgrind slowed things down so much that it caused the deadlock to occur somewhat reliably.  I could not replicate the issue with lock debugging enabled.  I added ast_log messages to unload_module and found that they stopped while monlock was held.  The valgrind testing was done with 'make samples', with no changes to /etc/asterisk.  I tried attaching gdb once the deadlock occurred, but it could not find symbols (probably because of valgrind).
