[asterisk-users] How can I check backtrace files ?

George Joseph gjoseph at digium.com
Thu Dec 7 08:50:04 CST 2017


On Wed, Dec 6, 2017 at 11:13 AM, Olivier <oza.4h07 at gmail.com> wrote:

>
>
> 2017-12-06 15:52 GMT+01:00 George Joseph <gjoseph at digium.com>:
>
>>
>>
>> On Tue, Dec 5, 2017 at 9:20 AM, Olivier <oza.4h07 at gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I carefully read [1] which details how backtrace files can be produced.
>>>
>>> Maybe this seems natural to some, but how can I go one step further, and
>>> check that the produced XXX-thread1.txt, XXX-brief.txt, ... files are OK?
>>>
>>> In other words, where can I find an example on how to use one of those
>>> files and check by myself, that if a system ever fails, I won't have to
>>> wait for another failure to provide required data to support teams ?
>>>
>>
>> It's a great question but I could spend a week answering it and not
>> scratch the surface. :)
>>
>
> Thanks very much for trying, anyway ;-)
>
>
>>  It's not a straightforward thing unless you know the code in question.
>> The most common failure is a segmentation fault (segfault or SEGV).
>>
>
> True! I experienced segfaults lately and I could not configure the
> platform I used then (Debian Jessie) to produce core files in a directory
> Asterisk can write into.
> Now, with Debian Stretch, I can produce a core file at will (with a kill -s
> SIGSEGV <processid>).
> I checked that ast_coredumper worked OK, as it produced thread.txt files
> and so on.
>
> Ideally, I would like to go one step further: check now that a future .txt
> file would be "workable" (and not "you should have compiled with option XXX
> or configured with option YYY").
>
>
>
>>   In that case, the thread1.txt file is the place to start.  Since most
>> of the objects passed around are really pointers to objects, the most
>> obvious cause would be a 0x0 for a value.  So for instance "chan=0x0".
>> That would be a pointer to a channel object that was not set when it
>> probably should have been.  Unfortunately, it's not only 0x0 that could
>> cause a segv.  Any time a program tries to access memory it doesn't own,
>> that signal is raised.  So let's say there's a 256-byte buffer which the
>> process owns.  If there's a bug somewhere that causes the program to try
>> to access bytes beyond the end of the buffer, you MAY get a segv if the
>> process doesn't also own that memory.  In this case, the backtrace won't
>> show anything obvious because the pointers all look valid.  There probably
>> would be an index variable (i or ix, etc.) that may be set to 257, but
>> you'd have to know that the buffer was only 256 bytes to realize that
>> that was the issue.
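
For the 0x0 case described above, a first pass over thread1.txt can be automated. The following is only a minimal sketch in Python, not part of Asterisk or ast_coredumper: the helper name null_pointer_args and the sample frame (example_func, example.c:42) are hypothetical. It lists the arguments in one backtrace frame line whose value is exactly 0x0. A NULL argument is not always a bug, so treat a hit as a place to start reading the code, not as a diagnosis.

```python
import re

# Matches "name=0x0" (optionally with spaces around "="), but not longer
# addresses such as "name=0x0700" thanks to the negative lookahead.
NULL_ARG_RE = re.compile(r"(\w+)\s*=\s*0x0(?![0-9a-fA-F])")


def null_pointer_args(frame_line):
    """Return the names of arguments whose value is exactly 0x0."""
    return NULL_ARG_RE.findall(frame_line)


if __name__ == "__main__":
    # Hypothetical frame line, for illustration only.
    frame = "#3  0x000055cdcb000000 in example_func (chan=0x0, data=0x7f2abc000a80) at example.c:42"
    print(null_pointer_args(frame))
```

This will not catch the out-of-bounds case, since there every pointer looks valid.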
>>
>
> So, with an artificial kill -s SIGSEGV <processid>, does the output below
> prove I have workable .txt files (having .txt files that let people find
> the root cause of the issue is another story, as we probably can only hope
> for the best here)?
>
>
> # head core-brief.txt
> !@!@!@! brief.txt !@!@!@!
>
>
> Thread 38 (Thread 0x7f2aa5dd0700 (LWP 992)):
> #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
> #1  0x000055cdcb69ae84 in __ast_cond_timedwait (filename=0x55cdcb7d4910 "threadpool.c", lineno=1131, func=0x55cdcb7d4ea8 <__PRETTY_FUNCTION__.8978> "worker_idle", cond_name=0x55cdcb7d4b7f "&worker->cond", mutex_name=0x55cdcb7d4b71 "&worker->lock", cond=0x7f2abc000978, t=0x7f2abc0009a8, abstime=0x7f2aa5dcfc30) at lock.c:668
> #2  0x000055cdcb75d153 in worker_idle (worker=0x7f2abc000970) at threadpool.c:1131
> #3  0x000055cdcb75ce61 in worker_start (arg=0x7f2abc000970) at threadpool.c:1022
> #4  0x000055cdcb769a8c in dummy_start (data=0x7f2abc000a80) at utils.c:1238
> #5  0x00007f2aeddad494 in start_thread (arg=0x7f2aa5dd0700) at pthread_create.c:333
>


That's it!  The key pieces of information are the function names
(worker_idle, worker_start, etc.), the filenames (threadpool.c, etc.) and
the line numbers (1131, 1022, etc.).
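
If you want to check for those three pieces mechanically, something along these lines could work. This is a rough sketch in Python, not an official tool: the names parse_frames and symbol_coverage are made up for illustration, and the regular expression assumes the usual gdb layout with one frame per line, as in the unwrapped brief.txt file on disk (not as wrapped by a mail client). It extracts the function, file and line from each frame and reports what fraction of frames have all three.

```python
import re

# One gdb frame per line, for example:
#   #2  0x000055cdcb75d153 in worker_idle (worker=0x7f2abc000970) at threadpool.c:1131
FRAME_RE = re.compile(
    r"#(?P<num>\d+)\s+"
    r"(?:0x[0-9a-fA-F]+\s+in\s+)?"      # the address is absent on frame #0
    r"(?P<func>\S+)\s*"
    r"\((?P<args>.*)\)"
    r"(?:\s+at\s+(?P<file>\S+):(?P<line>\d+))?"
    r"(?:\s+from\s+(?P<lib>\S+))?\s*$"
)

SAMPLE = """\
Thread 38 (Thread 0x7f2aa5dd0700 (LWP 992)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#2  0x000055cdcb75d153 in worker_idle (worker=0x7f2abc000970) at threadpool.c:1131
#3  0x000055cdcb75ce61 in worker_start (arg=0x7f2abc000970) at threadpool.c:1022
"""


def parse_frames(text):
    """Return one dict per frame line; non-frame lines are skipped."""
    frames = []
    for raw in text.splitlines():
        match = FRAME_RE.match(raw.strip())
        if match:
            frames.append(match.groupdict())
    return frames


def symbol_coverage(frames):
    """Fraction of frames that carry a function name, a file and a line."""
    if not frames:
        return 0.0
    resolved = [
        f for f in frames
        if f["func"] != "??" and f["file"] and f["line"]
    ]
    return len(resolved) / len(frames)


if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    for f in frames:
        print(f["num"], f["func"], f["file"], f["line"])
    print("coverage:", symbol_coverage(frames))
```

Frames that show "??" as the function, or have no "at file:line" part, are the ones to worry about. What coverage counts as good enough is a judgment call, since frames inside system libraries often lack file and line information even on a well-configured box.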




>
>
>> Deadlocks are even harder to troubleshoot.  For that, you need to look at
>> full.txt to see where the threads are stuck and find the one thread that's
>> holding the lock that the others are stuck on.
>>
>> Sorry.  I wish I had a better answer because it'd help a lot if folks
>> could do more investigation themselves.
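
For the deadlock case, a script can at least narrow down which threads in full.txt look stuck on a lock. The sketch below is a heuristic in Python under stated assumptions: the function name threads_waiting_on_locks is hypothetical, and the substrings in LOCK_HINTS are my guess at what lock-wait frames usually contain on Linux/glibc, so adjust them to what your own backtraces show. It only finds the waiters. Finding the one thread that holds the lock still means reading full.txt by hand.

```python
import re

THREAD_RE = re.compile(r"^Thread (\d+) \(")
FRAME_FUNC_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")

# Assumption: substrings that typically appear in lock-wait function names.
LOCK_HINTS = ("lll_lock_wait", "mutex_lock")


def threads_waiting_on_locks(text, depth=3):
    """Map thread number -> first lock-like function among its top frames."""
    waiting = {}
    current = None   # thread number being read
    seen = 0         # frames seen so far for the current thread
    for raw in text.splitlines():
        line = raw.strip()
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            seen = 0
            continue
        frame = FRAME_FUNC_RE.match(line)
        if frame is None or current is None:
            continue
        seen += 1
        func = frame.group(1)
        if (seen <= depth and current not in waiting
                and any(hint in func for hint in LOCK_HINTS)):
            waiting[current] = func
    return waiting


if __name__ == "__main__":
    # Assumes full.txt is in the current directory.
    with open("full.txt") as handle:
        for number, func in sorted(threads_waiting_on_locks(handle.read()).items()):
            print("Thread", number, "is waiting in", func)
```

Idle worker threads sitting in pthread_cond_timedwait, like Thread 38 in the output above, are deliberately not reported, because waiting on a condition variable is normal for them.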