[asterisk-bugs] [JIRA] (ASTERISK-27615) Dialplan deadlock when connection to external SQL server is lost

Jared Hull (JIRA) noreply at issues.asterisk.org
Wed Feb 7 18:42:13 CST 2018


    [ https://issues.asterisk.org/jira/browse/ASTERISK-27615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=242072#comment-242072 ] 

Jared Hull commented on ASTERISK-27615:
---------------------------------------

Batch mode does seem to help avoid the CDR problem, but parts of my dialplan still hang when they call ODBC_ functions. Those calls must also be handled by a single thread, and that thread is being blocked by the dead-connection check.

I've patched out the {{SQL_ATTR_CONNECTION_DEAD}} checks in favour of {{SELECT 1}}, but that still blocks the thread.
It hangs in the same place, inside {{connection_dead}}:
{noformat}#0  0x00007fe65ac8e70d in read () from /lib64/libpthread.so.0
#1  0x00007fe5ebb4daf0 in vio_read () from /usr/lib64/mysql/libmysqlclient.so.18
#2  0x00007fe5ebb4db71 in vio_read_buff () from /usr/lib64/mysql/libmysqlclient.so.18
#3  0x00007fe5ebb31d0a in my_real_read(st_net*, unsigned long*) () from /usr/lib64/mysql/libmysqlclient.so.18
#4  0x00007fe5ebb32b7c in my_net_read () from /usr/lib64/mysql/libmysqlclient.so.18
#5  0x00007fe5ebb257ac in cli_safe_read () from /usr/lib64/mysql/libmysqlclient.so.18
#6  0x00007fe5ebb26ca6 in cli_read_query_result () from /usr/lib64/mysql/libmysqlclient.so.18
#7  0x00007fe5ebb27e56 in mysql_real_query () from /usr/lib64/mysql/libmysqlclient.so.18
#8  0x00007fe5f0434244 in do_query () from /usr/lib64/libmyodbc5.so
#9  0x00007fe5f043626a in my_SQLExecute () from /usr/lib64/libmyodbc5.so
#10 0x00007fe5f3ba6439 in SQLExecute () from /lib64/libodbc.so.2
#11 0x00007fe5f3dfb6d1 in connection_dead (connection=0x1e37c88, class=0x1e33218) at res_odbc.c:797
{noformat}
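
The backtrace ends in a plain read() on the socket. That fits the difference described in the original report: with the SQL service stopped but the host still up, the client gets an immediate connection error, whereas with the network gone nothing comes back at all and an unbounded read just waits. As an illustration only, here is a minimal Python sketch of that behaviour and of bounding the wait with a socket timeout. It is not Asterisk, unixODBC, or libmysqlclient code; the names silent_server and query_with_timeout are made up for this example, and the "server" is a local socket that accepts a connection and never replies.

```python
import socket
import threading


def silent_server(listener, stop):
    # Accept one connection and never answer, imitating a peer that has
    # dropped off the network without closing the connection.
    conn, _ = listener.accept()
    stop.wait()
    conn.close()


def query_with_timeout(address, payload, timeout_s):
    """Send payload and wait for a reply, giving up after timeout_s seconds."""
    with socket.create_connection(address, timeout=timeout_s) as sock:
        sock.sendall(payload)
        try:
            return sock.recv(4096)
        except socket.timeout:
            # Without a timeout, recv() would block here indefinitely.
            return None


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)

stop = threading.Event()
thread = threading.Thread(target=silent_server, args=(listener, stop), daemon=True)
thread.start()

reply = query_with_timeout(listener.getsockname(), b"SELECT 1", 0.5)

stop.set()
thread.join()
listener.close()
print("reply:", reply)
```

With the timeout in place the caller gets control back and can treat the connection as dead. Whether an equivalent read timeout can be applied in this stack depends on the ODBC driver and its DSN options, which this sketch does not cover.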

There must be something very wrong with my environment; I guess Asterisk is doing all it can.

> Dialplan deadlock when connection to external SQL server is lost
> ----------------------------------------------------------------
>
>                 Key: ASTERISK-27615
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-27615
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: CDR/cdr_adaptive_odbc, Functions/func_cdr
>    Affects Versions: 14.6.0
>            Reporter: Jared Hull
>            Assignee: Unassigned
>            Severity: Critical
>         Attachments: core-asterisk-running-2018-01-26T17-11-24+0000-brief.txt, core-asterisk-running-2018-01-26T17-11-24+0000-full.txt, core-asterisk-running-2018-01-26T17-11-24+0000-locks.txt, core-asterisk-running-2018-01-26T17-11-24+0000-thread1.txt, core-asterisk-running-2018-01-26T20-21-53+0000-brief.txt, core-asterisk-running-2018-01-26T20-21-53+0000-full.txt, core-asterisk-running-2018-01-26T20-21-53+0000-locks.txt, core-asterisk-running-2018-01-26T20-21-53+0000-thread1.txt
>
>
> We have a cluster of SQL servers for CDR and realtime states that are used in dialplan. Recently we had a single SQL server lose network connectivity, and all Asterisk instances which used this server as their primary started to hang in dialplan.
> If I stop the SQL service while leaving the server pingable, Asterisk will continue to work and simply return a few errors when the CDR is committed to the database.
> {code}
> res_odbc.c:962 odbc_obj_connect: res_odbc: Error SQLConnect=-1 errno=2003 [unixODBC][MySQL][ODBC 5.2(w) Driver]Can't connect to MySQL server on 'dev-dallas-sql1
> cdr_adaptive_odbc.c:436 odbc_log: cdr_adaptive_odbc: Unable to retrieve database handle for 'dev-dallas-sql1:cdr_event_log'.  CDR failed: INSERT INTO cdr_event_log (
> {code}
> If I run 'service network stop' on the SQL server to simulate a network failure, Asterisk stops executing dialplan related to func_odbc and cdr_adaptive_odbc. It is as if it still thinks the SQL connection is there, and in the case of func_odbc it refuses to fail over to another DSN. cdr_adaptive_odbc doesn't even have failover connections (this would be a very useful feature), so I don't know what can be done about that, other than to skip the CDR and throw an error.
> Dialplan to reproduce:
> {code}
> exten => 101,1,noop(${CDR(anything)})
> exten => 102,1,noop(${ODBC_blacklist_global(42)})
> {code}
> Example of cdr_adaptive_odbc.conf entry:
> {code}
> [default]
> connection=dev-dallas-sql1
> table=cdr_event_log
> {code}
> Example of func_odbc.conf entry:
> {code}
> [blacklist_global]
> dsn=dev-dallas-sql1,dev-dallas-sql2,dev-dallas-sql3
> readsql=SELECT COUNT(*) FROM blacklist_global WHERE cid_number='${SQL_ESC(${ARG1})}'
> {code}
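
The func_odbc.conf entry above lists three DSNs, and the behaviour expected in the report is that a dead first DSN is skipped in favour of the next one. A rough Python sketch of that expectation follows, assuming each connection attempt is bounded by a timeout. The functions first_reachable and fake_connect are hypothetical and exist only for this example; fake_connect stands in for a real ODBC connect and simply pretends dev-dallas-sql1 is unreachable. This is not how func_odbc is implemented.

```python
def first_reachable(dsns, connect, timeout_s):
    """Try each DSN in order; return (dsn, handle, errors) for the first success.

    errors maps every DSN that failed to the exception it raised. If no DSN
    connects, dsn and handle are both None.
    """
    errors = {}
    for dsn in dsns:
        try:
            return dsn, connect(dsn, timeout_s), errors
        except (TimeoutError, ConnectionError) as exc:
            # A bounded failure lets the loop move on to the next DSN
            # instead of blocking on the dead one.
            errors[dsn] = exc
    return None, None, errors


def fake_connect(dsn, timeout_s):
    # Hypothetical stand-in for a real connect call: the primary is treated
    # as unreachable, every other DSN "connects" immediately.
    if dsn == "dev-dallas-sql1":
        raise TimeoutError(f"{dsn}: no reply within {timeout_s}s")
    return f"handle:{dsn}"


dsn, handle, errors = first_reachable(
    ["dev-dallas-sql1", "dev-dallas-sql2", "dev-dallas-sql3"],
    fake_connect,
    2.0,
)
print(dsn, handle, sorted(errors))
```

The point of the sketch is that failover only helps if the attempt against the dead server returns at all, which is exactly what does not happen in the backtrace.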



--
This message was sent by Atlassian JIRA
(v6.2#6252)


