[asterisk-bugs] [JIRA] (ASTERISK-24287) Race conditions and other problems in res_config_pgsql
Steve Davies (JIRA)
noreply at issues.asterisk.org
Thu Jun 11 08:58:32 CDT 2015
[ https://issues.asterisk.org/jira/browse/ASTERISK-24287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=226479#comment-226479 ]
Steve Davies commented on ASTERISK-24287:
-----------------------------------------
I am using Asterisk 11.18.0 and have also spent some time trying to find a deadlock in res_config_pgsql. From the debug output I've managed to grab, it appears that the call to PQexec() blocks forever on a poll() in libpq.so, and is in fact stuck somewhere in the kernel! As a result, the thread keeps holding the many locks it has already taken (listed below) and never releases them, causing chaos.
Postgres Bug#6342 from back in December 2011 seems to refer to this issue, but no resolution was ever found.
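For anyone who would rather bound the wait than block forever, the general technique is to give poll() a finite timeout. The sketch below is not libpq code and is not how res_config_pgsql works: read_with_timeout() is a hypothetical helper and the timeout value is whatever the caller chooses. With libpq, the equivalent would presumably be built on its asynchronous calls (PQsendQuery(), PQsocket(), PQconsumeInput(), PQisBusy()) rather than the blocking PQexec(); that part is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for fd to become readable,
 * then read from it.
 *
 * Returns the number of bytes read (0 on end of file), or -1 on error.
 * On timeout it returns -1 with errno set to ETIMEDOUT.
 * A negative timeout would make poll() wait indefinitely, which is the
 * behaviour this sketch is meant to avoid, so it is rejected.
 */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(timeout_ms >= 0);

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Simplification: an interrupted poll() restarts with the full timeout. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0) {
        return -1;              /* poll() failed, errno already set */
    }
    if (rc == 0) {
        errno = ETIMEDOUT;      /* nothing arrived within the time limit */
        return -1;
    }
    return read(fd, buf, len);  /* data, end of file, or a socket error */
}
```

A caller that gets ETIMEDOUT can release its locks and report a failure instead of holding them indefinitely.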
My current workaround is to not use realtime :(
Steve
=== Thread ID: 0xb2ec3b70 (handle_tcptls_connection started at [ 747] tcptls.c ast_tcptls_server_root())
=== ---> Lock #0 (chan_sip.c): MUTEX 29129 handle_request_do &netlock 0xb6557a40 (1)
main/logger.c:1701 ast_bt_get_addresses() (0x81506d2+19)
main/lock.c:258 __ast_pthread_mutex_lock() (0x8148a96+94)
chan_sip.so <unknown>()
chan_sip.so <unknown>()
res_http_websocket.so <unknown>()
main/http.c:754 handle_uri()
main/http.c:991 httpd_helper_thread()
main/tcptls.c:696 handle_tcptls_connection()
main/utils.c:1223 dummy_start()
:0 start_thread()
libc.so.6 clone() (0xb7701010+5E)
=== ---> Lock #1 (chan_sip.c): MUTEX 9083 sip_pvt_lock_full pvt 0xb2d60f98 (1)
main/logger.c:1701 ast_bt_get_addresses() (0x81506d2+19)
main/lock.c:258 __ast_pthread_mutex_lock() (0x8148a96+94)
main/astobj2.c:198 __ao2_lock() (0x8094df4+7C)
chan_sip.so <unknown>()
chan_sip.so <unknown>()
chan_sip.so <unknown>()
res_http_websocket.so <unknown>()
main/http.c:754 handle_uri()
main/http.c:991 httpd_helper_thread()
main/tcptls.c:696 handle_tcptls_connection()
main/utils.c:1223 dummy_start()
:0 start_thread()
libc.so.6 clone() (0xb7701010+5E)
=== ---> Lock #2 (chan_sip.c): MUTEX 17308 register_verify peer 0xad1af50 (1)
main/logger.c:1701 ast_bt_get_addresses() (0x81506d2+19)
main/lock.c:258 __ast_pthread_mutex_lock() (0x8148a96+94)
main/astobj2.c:198 __ao2_lock() (0x8094df4+7C)
chan_sip.so <unknown>()
chan_sip.so <unknown>()
chan_sip.so <unknown>()
chan_sip.so <unknown>()
chan_sip.so <unknown>()
res_http_websocket.so <unknown>()
main/http.c:754 handle_uri()
main/http.c:991 httpd_helper_thread()
main/tcptls.c:696 handle_tcptls_connection()
main/utils.c:1223 dummy_start()
:0 start_thread()
libc.so.6 clone() (0xb7701010+5E)
=== ---> Lock #3 (res_config_pgsql.c): MUTEX 1126 config_pgsql &pgsql_lock 0xb6e57160 (1)
main/logger.c:1701 ast_bt_get_addresses() (0x81506d2+19)
main/lock.c:258 __ast_pthread_mutex_lock() (0x8148a96+94)
res_config_pgsql.so <unknown>()
main/config.c:2693 ast_config_internal_load() (0x80ed39c+1FF)
main/config.c:2714 ast_config_load2() (0x80ed62c+43)
chan_sip.so <unknown>()
chan_sip.so <unknown>()
chan_sip.so <unknown>()
chan_sip.so <unknown>()
chan_sip.so <unknown>()
chan_sip.so <unknown>()
res_http_websocket.so <unknown>()
main/http.c:754 handle_uri()
main/http.c:991 httpd_helper_thread()
main/tcptls.c:696 handle_tcptls_connection()
main/utils.c:1223 dummy_start()
:0 start_thread()
libc.so.6 clone() (0xb7701010+5E)
=== -------------------------------------------------------------------
Thread back-trace (truncated)
#0 0xb76f380c in poll () from /lib/i386-linux-gnu/libc.so.6
#1 0xb6e307d6 in ?? () from /usr/lib/libpq.so.5
#2 0xb6e308cb in ?? () from /usr/lib/libpq.so.5
#3 0xb6e30953 in ?? () from /usr/lib/libpq.so.5
#4 0xb6e2e7a2 in PQgetResult () from /usr/lib/libpq.so.5
#5 0xb6e2ea48 in ?? () from /usr/lib/libpq.so.5
#6 0xb6e4adc9 in _pgsql_exec (database=0xb6e574a0 "asterisk", tablename=0xb2ebffbc "ast_config",
etc...
Process kernel stack for thread that is stuck:
[<c109777b>] __generic_file_aio_write+0x25e/0x282
[<c1043a90>] __dequeue_signal+0xf/0xce
[<c10447cd>] ptrace_stop+0x10c/0x1a0
[<c1045440>] get_signal_to_deliver+0x1e9/0x44d
[<c100b36b>] do_signal+0x2f/0x4c2
[<c10cccd5>] do_sync_write+0x0/0xdc
[<c10ccd7d>] do_sync_write+0xa8/0xdc
[<c10f30fb>] fsnotify+0x1d1/0x1e8
[<c121428b>] sys_send+0x19/0x1d
[<c1214a23>] sys_socketcall+0xf2/0x1cd
[<c100b988>] do_notify_resume+0x1e/0x65
[<c12c4850>] work_notifysig+0x13/0x1b
[<ffffffff>] 0xffffffff
> Race conditions and other problems in res_config_pgsql
> ------------------------------------------------------
>
> Key: ASTERISK-24287
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-24287
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: Resources/res_config_pgsql
> Affects Versions: 11.12.0
> Reporter: Etienne Lessard
>
> While looking to fix a deadlock in res_config_pgsql, I found that there were quite a few problems with the module. Here's what I found:
> In the find_table function, if database is NULL and the table is not found in the cache, then the lock on the psql_tables list is not released. An easy way to produce such a deadlock is to call the command "realtime show pgsql cache foobar" from the CLI; if your database doesn't have a table named "foobar", then the next thread that tries to acquire the psql_tables lock will deadlock.
> The find_table function sometimes calls the pgsql_exec function without locking the pgsql_lock mutex first, and find_table itself is not always called with the pgsql_lock mutex held. This can cause a range of undefined behaviour if other threads are executing in res_config_pgsql, ranging from crash to deadlock. I've personally observed both.
> The command "realtime show pgsql status" uses the pgsql connection without obtaining the lock first.
> The ESCAPE_STRING macro references the pgsql connection, but once again, the lock is not systematically acquired before the macro is used, so this can lead to undefined behaviour.
> I also found other, less critical problems.
> I know the res_config_pgsql module is an "extended support" module. I would have liked to provide a patch to fix the numerous problems, but didn't have the time, and I instead made the switch to res_config_odbc. That said, I've still decided to open this bug, to let other people know that there are quite a few issues with the module.
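
The two locking problems described in the quoted report (a lock left held on one return path, and a shared connection read without its lock) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: the struct layouts, lookup_table(), connection_is_up(), cache_lock and conn_lock are invented for illustration and are not taken from res_config_pgsql, which uses Asterisk's own list and locking macros.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: none of these names come from res_config_pgsql. */
struct table_entry {
    const char *name;
    struct table_entry *next;
};

struct connection {
    int connected;              /* placeholder for real connection state */
};

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t conn_lock = PTHREAD_MUTEX_INITIALIZER;
static struct table_entry *cache_head;
static struct connection shared_conn;

/* Read shared connection state only while holding conn_lock. */
int connection_is_up(void)
{
    int up;

    pthread_mutex_lock(&conn_lock);
    up = shared_conn.connected;
    pthread_mutex_unlock(&conn_lock);
    return up;
}

/*
 * Look up a table in the cache. Every exit path goes through the single
 * unlock at "done", so a miss with database == NULL cannot leave
 * cache_lock held.
 */
struct table_entry *lookup_table(const char *database, const char *name)
{
    struct table_entry *found = NULL;
    struct table_entry *cur;

    assert(name != NULL);

    pthread_mutex_lock(&cache_lock);
    for (cur = cache_head; cur != NULL; cur = cur->next) {
        if (strcmp(cur->name, name) == 0) {
            found = cur;
            goto done;
        }
    }
    if (database == NULL) {
        goto done;              /* cache miss and no database to query */
    }
    /* A real module would query the database here, under conn_lock. */
done:
    pthread_mutex_unlock(&cache_lock);
    return found;
}
```

Funnelling every return through one cleanup label is a common C idiom for this kind of bug; it makes a forgotten unlock on a rarely taken path much harder to introduce.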
--
This message was sent by Atlassian JIRA
(v6.2#6252)