[asterisk-dev] Lockups in Asterisk 11

Matthew Jordan mjordan at digium.com
Mon Nov 16 09:59:50 CST 2015


On Mon, Nov 16, 2015 at 9:30 AM, Steve Davies <davies147 at gmail.com> wrote:

> Hi,
>
> I believe there is a known issue in Postgres where if multiple threads
> call pgsql_exec(), then it can cause threads to get stuck inside Postgres -
> This is why I suggested that Postgres patch, which puts Asterisk locks
> around the call to pgsql_exec().
>
> OTOH, I don't think you mentioned what DB you are using, so that might not
> apply...
>
>
So, this isn't a deadlock in the classic sense. Nor - based on the current
evidence - does this look like a bug with Asterisk.

Looking at the provided locks:

=======================================================================
=== 11.20.0
=== Currently Held Locks
=======================================================================
===
=== <pending> <lock#> (<file>): <lock type> <line num> <function> <lock
name> <lock addr> (times locked)
===
=== Thread ID: 0xb6cffb70 (db_sync_thread       started at [ 1018] db.c
astdb_init())
=== ---> Lock #0 (db.c): MUTEX 980 db_sync_thread &dblock 0x8214f60 (1)
        main/logger.c:1702 ast_bt_get_addresses() (0x813f3a6+19)
        main/lock.c:258 __ast_pthread_mutex_lock() (0x8138607+85)
        main/db.c:981 db_sync_thread()
        main/utils.c:1223 dummy_start()
        :0 start_thread()
        libc.so.6 clone() (0xb76e60d0+5E)


This is the db_sync_thread, holding the lock for the AstDB - that is, the
SQLite3 database. The db_sync_thread commits transactions, and rolls them
back in case of error. I would suspect that a gdb backtrace would show that
we are currently beginning, committing, or rolling back a transaction.
That's probably worth finding out.


=== -------------------------------------------------------------------
===
=== Thread ID: 0xb4fc3b70 (do_monitor           started at [29517]
chan_sip.c restart_monitor())
=== ---> Lock #0 (chan_sip.c): MUTEX 28923 handle_request_do &netlock
0xb6190440 (1)
        main/logger.c:1702 ast_bt_get_addresses() (0x813f3a6+19)
        main/lock.c:258 __ast_pthread_mutex_lock() (0x8138607+85)
        channels/chan_sip.c:28926 handle_request_do()
        channels/chan_sip.c:28885 sipsock_read()
        main/io.c:292 ast_io_wait() (0x8132d08+175)
        channels/chan_sip.c:29484 do_monitor()
        main/utils.c:1223 dummy_start()
        :0 start_thread()
        libc.so.6 clone() (0xb76e60d0+5E)
=== ---> Lock #1 (chan_sip.c): MUTEX 8959 sip_pvt_lock_full pvt 0x8391ec8
(1)
        main/logger.c:1702 ast_bt_get_addresses() (0x813f3a6+19)
        main/lock.c:258 __ast_pthread_mutex_lock() (0x8138607+85)
        main/astobj2.c:198 __ao2_lock() (0x80906ad+7C)
        channels/chan_sip.c:8960 sip_pvt_lock_full()
        channels/chan_sip.c:28939 handle_request_do()
        channels/chan_sip.c:28885 sipsock_read()
        main/io.c:292 ast_io_wait() (0x8132d08+175)
        channels/chan_sip.c:29484 do_monitor()
        main/utils.c:1223 dummy_start()
        :0 start_thread()
        libc.so.6 clone() (0xb76e60d0+5E)
=== ---> Lock #2 (chan_sip.c): MUTEX 17047 register_verify peer 0xb4e44ed8
(1)
        main/logger.c:1702 ast_bt_get_addresses() (0x813f3a6+19)
        main/lock.c:258 __ast_pthread_mutex_lock() (0x8138607+85)
        main/astobj2.c:198 __ao2_lock() (0x80906ad+7C)
        channels/chan_sip.c:17048 register_verify()
        channels/chan_sip.c:28477 handle_request_register()
        channels/chan_sip.c:28785 handle_incoming()
        channels/chan_sip.c:28953 handle_request_do()
        channels/chan_sip.c:28885 sipsock_read()
        main/io.c:292 ast_io_wait() (0x8132d08+175)
        channels/chan_sip.c:29484 do_monitor()
        main/utils.c:1223 dummy_start()
        :0 start_thread()
        libc.so.6 clone() (0xb76e60d0+5E)
=== ---> Waiting for Lock #3 (db.c): MUTEX 324 ast_db_put &dblock 0x8214f60
(1)
        main/logger.c:1702 ast_bt_get_addresses() (0x813f3a6+19)
        main/lock.c:258 __ast_pthread_mutex_lock() (0x8138607+85)
        main/db.c:325 ast_db_put() (0x80eeec4+D0)
        channels/chan_sip.c:16242 parse_register_contact()
        channels/chan_sip.c:17069 register_verify()
        channels/chan_sip.c:28477 handle_request_register()
        channels/chan_sip.c:28785 handle_incoming()
        channels/chan_sip.c:28953 handle_request_do()
        channels/chan_sip.c:28885 sipsock_read()
        main/io.c:292 ast_io_wait() (0x8132d08+175)
        channels/chan_sip.c:29484 do_monitor()
        main/utils.c:1223 dummy_start()
        :0 start_thread()
        libc.so.6 clone() (0xb76e60d0+5E)
=== --- ---> Locked Here: db.c line 980 (db_sync_thread)
=== -------------------------------------------------------------------

This sequence of locks shows that chan_sip is currently updating the SIP
registry stored in the AstDB. Since we're currently in the middle of doing
something with a transaction, it is blocked.


=== Thread ID: 0xb4485b70 (handle_tcptls_connection started at [  745]
tcptls.c ast_tcptls_server_root())
=== ---> Waiting for Lock #0 (chan_sip.c): MUTEX 28923 handle_request_do
&netlock 0xb6190440 (1)
        main/logger.c:1702 ast_bt_get_addresses() (0x813f3a6+19)
        main/lock.c:258 __ast_pthread_mutex_lock() (0x8138607+85)
        channels/chan_sip.c:28926 handle_request_do()
        channels/chan_sip.c:3045 _sip_tcp_helper_thread()
        channels/chan_sip.c:2572 sip_tcp_worker_fn()
        main/tcptls.c:694 handle_tcptls_connection()
        main/utils.c:1223 dummy_start()
        :0 start_thread()
        libc.so.6 clone() (0xb76e60d0+5E)
=== --- ---> Locked Here: chan_sip.c line 28923 (handle_request_do)
=== -------------------------------------------------------------------
===
=======================================================================

And this shows that the TCP/TLS thread is waiting on a lock held by the
monitor thread; most likely, this is the SIP pvt lock currently held by the
thread.

Things to investigate:
(1) Get a gdb backtrace when this occurs, so you can find out exactly what
the AstDB synchronization thread is doing that is blocking for a long
period of time.
(2) Find out *exactly* what is occurring on your system. If the AstDB
synchronization thread is taking a noticeable amount of time to do work,
you almost certainly have either a problem in your AstDB; have other I/O
problems on the system; or have something occurring here that is outside
the control of Asterisk.

Either way, nothing above makes me think there is a bug in Asterisk.

-- 
Matthew Jordan
Digium, Inc. | Director of Technology
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.digium.com/pipermail/asterisk-dev/attachments/20151116/afb6e4fe/attachment-0001.html>


More information about the asterisk-dev mailing list