[asterisk-bugs] [JIRA] (ASTERISK-27615) Dialplan deadlock when connection to external SQL server is lost
Richard Mudgett (JIRA)
noreply at issues.asterisk.org
Wed Feb 7 16:49:13 CST 2018
[ https://issues.asterisk.org/jira/browse/ASTERISK-27615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=242070#comment-242070 ]
Richard Mudgett commented on ASTERISK-27615:
--------------------------------------------
I see why calls to the CDR function can block. The CDR code has a single thread that processes CDR events from the system, and that same thread handles CDR function requests to read from or write to CDRs. Once the first call completes, the CDR processing thread tries to write the completed CDRs to the back ends and blocks trying to access the database. Subsequent CDR function requests then block as well, because the only thread that can service them is stuck.
You can avoid the CDR processing thread blocking by enabling batch mode. In batch mode, when CDRs are finalized they get queued to another thread to write them to the CDR back ends. There is a comment in cdr.c that not using batch mode is legacy behavior. See the CDR options documentation \[1] about configuring batch mode.
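To make the difference concrete, here is a rough Python model of the two modes. It is only a sketch of the threading pattern described above, not code from cdr.c: the names {{run}}, {{slow_backend_write}}, and the event kinds are made up for illustration, and the stalled database is simulated with a sleep.

```python
import queue
import threading
import time


def slow_backend_write(record, delay):
    """Stand-in for a CDR back end whose database connection has stalled."""
    time.sleep(delay)


def run(batch_mode, backend_delay=0.5):
    """Return how long a CDR() function request waits behind a finalized CDR."""
    events = queue.Queue()  # work for the single CDR processing thread
    batch = queue.Queue()   # finalized CDRs handed to the writer (batch mode only)

    def writer():
        # Separate thread that owns the slow back-end writes in batch mode.
        while True:
            record = batch.get()
            if record is None:
                break
            slow_backend_write(record, backend_delay)

    def processor():
        # The one thread that handles both CDR events and function requests.
        while True:
            kind, payload = events.get()
            if kind == "stop":
                break
            if kind == "finalize":
                if batch_mode:
                    batch.put(payload)  # hand off and return immediately
                else:
                    slow_backend_write(payload, backend_delay)  # blocks here
            elif kind == "function":
                payload.set()  # answer the dialplan's CDR() request

    threads = [threading.Thread(target=processor)]
    if batch_mode:
        threads.append(threading.Thread(target=writer))
    for t in threads:
        t.start()

    answered = threading.Event()
    start = time.monotonic()
    events.put(("finalize", "cdr-for-call-1"))  # first call ends
    events.put(("function", answered))          # second call reads CDR(...)
    answered.wait()
    waited = time.monotonic() - start

    # Shut both threads down cleanly.
    events.put(("stop", None))
    batch.put(None)
    for t in threads:
        t.join()
    return waited
```

Calling {{run(False)}} should show the function request waiting for roughly the whole back-end delay, while {{run(True)}} should return almost immediately because the slow write happens on the other thread. In the model, as in the description above, batch mode only moves the blocking off the processing thread; the writer still stalls on the dead connection.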
Now for the thread blocking in the database connector library. This looks more like a bug in the library. The {{connection_dead()}} routines in func_odbc.c and res_odbc.c check {{SQL_ATTR_CONNECTION_DEAD}} before trying a {{SELECT 1}}. To test whether your {{SELECT 1}} idea would even help, you can remove the {{SQL_ATTR_CONNECTION_DEAD}} check so that the routines fall through to the {{SELECT 1}} check instead.
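The ordering of the two checks can be modeled in a few lines of Python. This is a hedged illustration, not the C code from res_odbc.c: {{FakeConnection}}, {{get_dead_attr}}, and the {{trust_attribute}} flag are hypothetical names, and the sketch assumes the probe query fails promptly on a dead link, which may not hold if the driver itself hangs.

```python
class FakeConnection:
    """Hypothetical stand-in for an ODBC handle; not a real driver API."""

    def __init__(self, attr_supported, attr_reports_dead, probe_succeeds):
        self.attr_supported = attr_supported
        self.attr_reports_dead = attr_reports_dead
        self.probe_succeeds = probe_succeeds

    def get_dead_attr(self):
        """Mimic reading SQL_ATTR_CONNECTION_DEAD; None means unsupported."""
        return self.attr_reports_dead if self.attr_supported else None

    def execute(self, sql):
        """Mimic running a probe statement against the server."""
        if not self.probe_succeeds:
            raise ConnectionError("probe failed")
        return 1


def connection_dead(conn, trust_attribute=True):
    """Return True if the connection should be treated as dead."""
    if trust_attribute:
        dead = conn.get_dead_attr()
        if dead is not None:
            # The attribute answered, so the probe query is never attempted.
            return dead
    # Attribute check removed or unsupported: fall back to the probe query.
    try:
        conn.execute("SELECT 1")
    except ConnectionError:
        return True
    return False


# A driver that still reports the link as alive after the network is gone.
stale = FakeConnection(attr_supported=True, attr_reports_dead=False,
                       probe_succeeds=False)
print(connection_dead(stale))                         # attribute is trusted
print(connection_dead(stale, trust_attribute=False))  # probe query decides
```

With the attribute trusted, the stale connection is reported as usable; with the attribute check skipped, the failed probe marks it dead. Whether the real driver behaves like the stale case here is exactly what the suggested test would show.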
\[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+15+Configuration_cdr
> Dialplan deadlock when connection to external SQL server is lost
> ----------------------------------------------------------------
>
> Key: ASTERISK-27615
> URL: https://issues.asterisk.org/jira/browse/ASTERISK-27615
> Project: Asterisk
> Issue Type: Bug
> Security Level: None
> Components: CDR/cdr_adaptive_odbc, Functions/func_cdr
> Affects Versions: 14.6.0
> Reporter: Jared Hull
> Assignee: Unassigned
> Severity: Critical
> Attachments: core-asterisk-running-2018-01-26T17-11-24+0000-brief.txt, core-asterisk-running-2018-01-26T17-11-24+0000-full.txt, core-asterisk-running-2018-01-26T17-11-24+0000-locks.txt, core-asterisk-running-2018-01-26T17-11-24+0000-thread1.txt, core-asterisk-running-2018-01-26T20-21-53+0000-brief.txt, core-asterisk-running-2018-01-26T20-21-53+0000-full.txt, core-asterisk-running-2018-01-26T20-21-53+0000-locks.txt, core-asterisk-running-2018-01-26T20-21-53+0000-thread1.txt
>
>
> We have a cluster of SQL servers for CDR and realtime states that are used in dialplan. Recently we had a single SQL server lose network connectivity, and all Asterisk instances which used this server as their primary started to hang in dialplan.
> If I stop the SQL service while leaving the server pingable, Asterisk will continue to work and simply return a few errors when the CDR is committed to the database.
> {code}
> res_odbc.c:962 odbc_obj_connect: res_odbc: Error SQLConnect=-1 errno=2003 [unixODBC][MySQL][ODBC 5.2(w) Driver]Can't connect to MySQL server on 'dev-dallas-sql1
> cdr_adaptive_odbc.c:436 odbc_log: cdr_adaptive_odbc: Unable to retrieve database handle for 'dev-dallas-sql1:cdr_event_log'. CDR failed: INSERT INTO cdr_event_log (
> {code}
> If I run 'service network stop' on the SQL server to simulate a network failure, Asterisk stops executing dialplan related to func_odbc and cdr_adaptive_odbc. It is as if it still thinks the SQL connection is there, and it refuses to fail over to another DSN in the case of func_odbc. cdr_adaptive_odbc doesn't even have failover connections (this would be a very useful feature), so I don't know what can be done about that, other than to skip the CDR and throw an error.
> Dialplan to reproduce:
> {code}
> exten => 101,1,noop(${CDR(anything)})
> exten => 102,1,noop(${ODBC_blacklist_global(42)})
> {code}
> Example of cdr_adaptive_odbc.conf entry:
> {code}
> [default]
> connection=dev-dallas-sql1
> table=cdr_event_log
> {code}
> Example of func_odbc.conf entry:
> {code}
> [blacklist_global]
> dsn=dev-dallas-sql1,dev-dallas-sql2,dev-dallas-sql3
> readsql=SELECT COUNT(*) FROM blacklist_global WHERE cid_number='${SQL_ESC(${ARG1})}'
> {code}