[asterisk-bugs] [JIRA] (ASTERISK-26054) Asterisk crashes (core dump)

Wed May 25 12:34:56 CDT 2016

    [ https://issues.asterisk.org/jira/browse/ASTERISK-26054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=230767#comment-230767 ] 

Etienne Lessard commented on ASTERISK-26054:
--------------------------------------------

Hello,

We started having a similar issue (random termination of the asterisk process, caused by an ABRT signal raised by libc after it detects memory corruption) starting with Asterisk 13.8.0; it was working fine with Asterisk 13.7.2. I'm currently trying to isolate the problem, but I haven't been able to pinpoint it because I'm having trouble reproducing it reliably.

That said, the problem seems to come from one of the "odbc components", i.e. most likely res_odbc, res_config_odbc, cel_odbc, unixodbc or the ODBC driver (I'm using psqlodbc). I say that because:

* on our "load test" system, we are using res_odbc to store both CEL (via cel_odbc) and the queue_log (via res_config_odbc). This is a 32-bit Debian 8 system.
* after upgrading to asterisk 13.8, the asterisk process started crashing once or twice a day; the same happens with asterisk 13.9
* after disabling all the odbc related stuff in asterisk, it stopped crashing
* I've been able to make asterisk crash with a simple module that calls "ast_store_realtime" repeatedly from multiple threads (it took 1,200,000 tries before it crashed with a memory corruption error the first time)
* I've tried to reproduce it on another system, but I haven't been able to yet
* I'm currently trying to run it under valgrind, but I've not seen anything interesting yet
* if you look at B. Davis's backtrace, you'll see that he's also using odbc in asterisk:

{code}
Thread 303 (Thread 0x7fc95643e700 (LWP 6759)):
#0  0x00007fc95869857d in write () from /lib64/libc.so.6
#1  0x00007fc95862ead3 in _IO_new_file_write () from /lib64/libc.so.6
#2  0x00007fc958630085 in _IO_new_do_write () from /lib64/libc.so.6
#3  0x00007fc958630df3 in _IO_flush_all_lockp () from /lib64/libc.so.6
#4  0x00007fc9585f0eb9 in abort () from /lib64/libc.so.6
#5  0x00007fc95862d537 in __libc_message () from /lib64/libc.so.6
#6  0x00007fc958632f4e in malloc_printerr () from /lib64/libc.so.6
#7  0x00007fc958635cf0 in _int_free () from /lib64/libc.so.6
#8  0x00007fc8e58c41fe in my_SQLFreeEnv () from /usr/lib64/libmyodbc5.so
#9  0x00007fc954c3fc38 in ?? () from /usr/lib64/libodbc.so.2
#10 0x00007fc954c40829 in ?? () from /usr/lib64/libodbc.so.2
#11 0x00007fc954c4527a in SQLDisconnect () from /usr/lib64/libodbc.so.2
#12 0x00007fc954e9e64c in ?? () from /usr/lib64/asterisk/modules/res_odbc.so
#13 0x00007fc954e9c5ce in ?? () from /usr/lib64/asterisk/modules/res_odbc.so
#14 0x000000000045cc3a in ?? ()
#15 0x000000000045cf1d in __ao2_ref ()
#16 0x00007fc954e9e40a in ast_odbc_release_obj () from /usr/lib64/asterisk/modules/res_odbc.so
#17 0x00007fc8c0770b03 in ?? () from /usr/lib64/asterisk/modules/cel_odbc.so
#18 0x00000000004a82c4 in ?? ()
#19 0x000000000045de00 in ?? ()
#20 0x000000000045e133 in __ao2_callback ()
#21 0x00000000004a844c in ?? ()
#22 0x00000000004a99fa in ?? ()
#23 0x00000000004a9c8f in ?? ()
#24 0x00000000005dae29 in ?? ()
#25 0x00000000005c97f1 in ?? ()
#26 0x00000000005ca38d in ?? ()
#27 0x00000000005e72f6 in ast_taskprocessor_execute ()
#28 0x00000000005e58ab in ?? ()
#29 0x00000000005fb85d in ?? ()
#30 0x00007fc95931daa1 in start_thread () from /lib64/libpthread.so.0
#31 0x00007fc9586a593d in clone () from /lib64/libc.so.6
{code}
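
For anyone who wants to try a similar stress test, here is a minimal sketch of the "call it repeatedly from multiple threads" idea mentioned above. It is not the module I used: store_row_stub() is a hypothetical stand-in for the real ast_store_realtime() call, the "queue_log" family name is only illustrative, and the thread count is an assumption. Inside a real Asterisk module the stub body would be replaced by the actual realtime call.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct worker_args {
    long calls;            /* iterations each thread performs */
    atomic_long *counter;  /* shared count of successful calls */
};

/* Hypothetical stand-in for the realtime store call (assumption: in a
 * real module this is where ast_store_realtime() would be invoked). */
static int store_row_stub(const char *family, long seq)
{
    (void)family;
    (void)seq;
    return 0; /* 0 means success in this sketch */
}

/* Each worker hammers the store call and counts the successes. */
static void *worker(void *p)
{
    struct worker_args *args = p;

    for (long i = 0; i < args->calls; i++) {
        if (store_row_stub("queue_log", i) == 0) {
            atomic_fetch_add(args->counter, 1);
        }
    }
    return NULL;
}

/* Runs nthreads workers concurrently; returns the total number of
 * successful calls, or -1 on bad arguments or thread creation failure. */
long run_stress(int nthreads, long calls_per_thread)
{
    if (nthreads <= 0 || calls_per_thread < 0) {
        return -1;
    }

    pthread_t *threads = malloc(sizeof(*threads) * (size_t)nthreads);
    if (threads == NULL) {
        return -1;
    }

    atomic_long counter;
    atomic_init(&counter, 0);
    struct worker_args args = { calls_per_thread, &counter };

    int started = 0;
    for (; started < nthreads; started++) {
        if (pthread_create(&threads[started], NULL, worker, &args) != 0) {
            break;
        }
    }
    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    free(threads);

    return started == nthreads ? atomic_load(&counter) : -1;
}

#ifdef STRESS_STANDALONE
int main(void)
{
    /* 8 threads x 150000 calls = 1,200,000 calls in total. */
    printf("completed calls: %ld\n", run_stress(8, 150000));
    return 0;
}
#endif
```

To build it standalone, compile with -pthread and -DSTRESS_STANDALONE. With the stub in place this only exercises the threading scaffold; it will not reproduce the crash until the stub calls into the real ODBC-backed realtime path.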

Note that I'm not running the latest version of unixodbc (using 2.3.1, latest is 2.3.4), nor the latest version of psqlodbc (using 09.03.0300, latest is 09.05.0210). I do plan on trying these out. I haven't enabled connection pooling in unixodbc yet either.

> Asterisk crashes (core dump)
> ----------------------------
>
>                 Key: ASTERISK-26054
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-26054
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: CDR/cdr_custom
>    Affects Versions: 13.9.1
>         Environment: cat /etc/*release*
> SHMZ release 6.6 (Final)
> SHMZ release 6.6 (Final)
> SHMZ release 6.6 (Final)
> SHMZ release 6.6 (Final)
> cpe:/o:schmooze:linux:6:GA
> lscpu
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                24
> On-line CPU(s) list:   0-23
> Thread(s) per core:    2
> Core(s) per socket:    6
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 63
> Stepping:              2
> CPU MHz:               2399.943
> BogoMIPS:              4799.33
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              15360K
> NUMA node0 CPU(s):     0-5,12-17
> NUMA node1 CPU(s):     6-11,18-23
>  vmstat
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  1  0      0 224716 241680 30706788    0    0     0     3    1    1  0  0 100  0  0
>            Reporter: B. Davis
>            Assignee: B. Davis
>            Severity: Critical
>         Attachments: backtrace.txt
>
>
> New installation, CDR records stored on external database over a dedicated network interface, system appears to run fine and then randomly once every day or two has a core dump. 
> System operates with about 100-180 active calls w/ about 250-300 channels open.
> System has 32GB RAM and RAM shows as mostly cached.
> Attached is the backtrace.
