[asterisk-users] How can I check backtrace files ?
rmudgett at digium.com
Wed Dec 6 13:18:40 CST 2017
On Wed, Dec 6, 2017 at 12:13 PM, Olivier <oza.4h07 at gmail.com> wrote:
> 2017-12-06 15:52 GMT+01:00 George Joseph <gjoseph at digium.com>:
>> On Tue, Dec 5, 2017 at 9:20 AM, Olivier <oza.4h07 at gmail.com> wrote:
>>> I carefully read the documentation which details how backtrace files can be produced.
>>> Maybe this seems natural to some, but how can I go one step further, and
>>> check that the produced XXX-thread1.txt, XXX-brief.txt, ... files are OK?
>>> In other words, where can I find an example on how to use one of those
>>> files and check by myself, that if a system ever fails, I won't have to
>>> wait for another failure to provide required data to support teams ?
>> It's a great question but I could spend a week answering it and not
>> scratch the surface. :)
> Thanks very much for trying, anyway ;-)
>> It's not a straightforward thing unless you know the code in question.
>> The most common is a segmentation fault (segfault or SEGV).
> True! I experienced segfaults lately and I could not configure the
> platform I used then (Debian Jessie) to produce core files in a directory
> Asterisk can write into.
> Now, with Debian Stretch, I can produce a core file at will (with a kill -s
> SIGSEGV <processid>).
> I checked ast_coredumper worked OK as it produced thread.txt files and so on.
> Ideally, I would like to go one step further: check now that a future .txt
> file would be "workable" (and not "you should have compiled with option XXX
> or configured with option YYY").
>> In that case, the thread1.txt file is the place to start. Since most
>> of the objects passed around are really pointers to objects, the most
>> obvious cause would be a 0x0 for a value. So for instance "chan=0x0".
>> That would be a pointer to a channel object that was not set when it
>> probably should have been. Unfortunately, it's not only 0x0 that could
>> cause a segv. Anytime a program tries to access memory it doesn't own,
>> that signal is raised. So let's say there's a 256-byte buffer which the
>> process owns. If there's a bug somewhere that causes the program to try
>> and access bytes beyond the end of the buffer, you MAY get a segv if the
>> process doesn't also own that memory. In this case, the backtrace won't
>> show anything obvious because the pointers all look valid. There probably
>> would be an index variable (i or ix, etc.) that may be set to 257, but you'd
>> have to know that the buffer was only 256 bytes to realize that that was
>> the issue.
> So, with an artificial kill -s SIGSEGV <processid>, does the below
> output prove I have workable .txt files (having .txt files that let
> people find the root cause of the issue is another story, as we probably can
> only hope for the best here)?
> # head core-brief.txt
> !@!@!@! brief.txt !@!@!@!
> Thread 38 (Thread 0x7f2aa5dd0700 (LWP 992)):
> #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/
> #1 0x000055cdcb69ae84 in __ast_cond_timedwait (filename=0x55cdcb7d4910
> "threadpool.c", lineno=1131, func=0x55cdcb7d4ea8 <__PRETTY_FUNCTION__.8978>
> "worker_idle", cond_name=0x55cdcb7d4b7f "&worker->cond",
> mutex_name=0x55cdcb7d4b71 "&worker->lock", cond=0x7f2abc000978,
> t=0x7f2abc0009a8, abstime=0x7f2aa5dcfc30) at lock.c:668
> #2 0x000055cdcb75d153 in worker_idle (worker=0x7f2abc000970) at
> #3 0x000055cdcb75ce61 in worker_start (arg=0x7f2abc000970) at
> #4 0x000055cdcb769a8c in dummy_start (data=0x7f2abc000a80) at utils.c:1238
> #5 0x00007f2aeddad494 in start_thread (arg=0x7f2aa5dd0700) at
The number one question when you supply a backtrace: does it have symbols?
So yes, the sample above is at least workable. It has symbols, as it shows
the function name, source file name, and line number in the backtrace.
Without symbols there is little point in looking at the backtrace to see
what is going on: it is just a bunch of numbers and question marks (??),
with maybe a public function name.
The second question: is the backtrace from an unoptimized build?
Optimized builds provide some performance improvement for normal
operation, but what the compiler does to the code can be difficult to
figure out in a backtrace. The compiler can optimize out variables that
would make understanding what is going on easier. So whether an
optimized backtrace can help find the root cause or not depends upon
what happened. It is up to you whether you want to run in production
with an optimized build or not.
I also recommend always compiling with BETTER_BACKTRACES enabled in
menuselect. With that enabled, any backtraces put into log files by
FRACKs and the output from the CLI command "core show locks" are
understandable when symbols are available. You get backtraces similar
to the backtrace sample above.