[asterisk-bugs] [JIRA] (ASTERISK-26310) Crash occurs every 24 - 48 hours with backtrace log showing fault related to pjsip hash

George Joseph (JIRA) noreply at issues.asterisk.org
Wed Sep 14 15:13:01 CDT 2016


    [ https://issues.asterisk.org/jira/browse/ASTERISK-26310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=232288#comment-232288 ] 

George Joseph commented on ASTERISK-26310:
------------------------------------------

You are entirely correct when you say "It almost seems like well you guys really don't know."  We've seen failures there before, but that code is very complex, as you've seen, and hard for us to debug.  Since we didn't write that part, we only see it when there's an issue.  That's why we asked the pjsip team for the ability to use an external resolver and used that new capability in Asterisk 14.  I hesitated to recommend Asterisk 14 because it's still a release candidate, but if it's working, great.  I'll leave this issue open for now and keep an eye on it.


> Crash occurs every 24 - 48 hours with backtrace log showing fault related to pjsip hash
> ---------------------------------------------------------------------------------------
>
>                 Key: ASTERISK-26310
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-26310
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: pjproject/pjsip
>    Affects Versions: 13.10.0, 13.11.0
>         Environment: Asterisk 13.10.0 running on fully updated CentOS 7 Linux, 64-bit. A second backtrace from a second server running Asterisk 13.11.0-rc1 shows the same ../src/pj/hash.c:181 in the (gdb) bt output, so we think we are crashing the same way across the 2 latest versions of Asterisk 13.
>            Reporter: Gaston Mendez
>            Assignee: Unassigned
>            Severity: Critical
>         Attachments: asterisk_full_08-29-2016-0924a.txt, asterisk_full_08302016_0620p.txt, asterisk_full_08302016_0624p.txt, backtrace_08302016_0620p.txt, backtrace_08302016_0624p.txt, backtrace13-10-0-on-08-29-2016-0930a.txt, backtrace13-10-0.txt, backtrace13-11-0.txt, full_log_13-10-0.txt, full_log_13-11-0.txt, modules.conf.txt, pjsip.conf.txt, rtp.conf.txt, udptl.conf.txt
>
>
> We are trying to put an Asterisk 13 server into production. This is also our first time using pjsip. Under a beta load of about 20 active calls we are experiencing crashes that are unpredictable, with no visible error and no obvious commonality between them. The crash is not load dependent: we have seen it at low points during the day with literally 1 - 2 active calls running. The only thing that's certain is that, under the steady everyday load of a 2 week beta, it crashes at least every 48 hours, and more like every 24 hours. It crashes with no visible error or complaint in the Asterisk messages or full logs, which are otherwise very clean and quiet. The coredump cites line 181 of ../src/pj/hash.c, and the only known commonality between crashes is that we have at least 2 backtraces from 2 different servers citing this same line of code in the backtrace (gdb bt), like this:
> {noformat}
> #0  find_entry (lower=0, entry_buf=0x0, hval=0x7f52cc5412cc, val=0x0, keylen=258, key=0x7f52cc541310, ht=<optimized out>, pool=0x0) at ../src/pj/hash.c:181
> {noformat}
> {noformat}
> 181		if (entry->hash==hash && entry->keylen==keylen &&
> {noformat}
> It seems we are triggering some instability in pjsip/Asterisk. We are not doing anything outside the norm of what we've done on old versions of Asterisk. Asterisk logs no errors at any time, and other than this once-a-day crash, Asterisk 13 is running very clean and performing well with no other complaint at all. We have reason to believe this is an Asterisk/pjsip bug we have triggered. There are no exact steps to trigger it. It seems it can happen as long as there is at least 1 active call, and it has happened about once every 24-48 hours over a span of 2 weeks. So the only way to 'reproduce' it is to wait 48 hours, as we have been doing. We have multiple backtraces and are attaching 2 that show the exact same source file and line number. As stated in the environment section, we are crashing across 2 servers; the second is an identical CentOS 7 64-bit Linux machine, fully yum updated, running Asterisk 13.11.0-rc1. We will attach everything we have from both servers, file it as a bug report, and hope we can stabilize the system asap.
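
For readers unfamiliar with the kind of code the backtrace points at, the following is a minimal sketch of a chained hash table lookup whose comparison has the same shape as the line 181 quoted in the report. It is not pjlib's source: the struct layout, the names table_put and table_get, the table size and the hash function are all assumptions made for illustration. It only shows why such a comparison can fault: every field it reads goes through the entry pointer, so a chain pointer that no longer refers to valid memory would crash exactly there. Whether that is what happened in this issue is not established by the report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical types for illustration only; not pjlib's definitions. */
struct entry {
    struct entry *next;   /* next entry in the same bucket chain */
    const void   *key;
    uint32_t      hash;   /* cached hash of the key */
    unsigned      keylen;
    void         *value;
};

#define TABLE_SIZE 8u     /* assumed size; must be a power of two */

struct table {
    struct entry *buckets[TABLE_SIZE];
};

/* Simple multiplicative byte hash, chosen only to keep the sketch short. */
uint32_t hash_bytes(const void *key, unsigned keylen)
{
    const unsigned char *p = key;
    uint32_t h = 0;
    for (unsigned i = 0; i < keylen; i++)
        h = h * 33u + p[i];
    return h;
}

/* Link a caller-owned entry at the head of its bucket chain. */
void table_put(struct table *t, struct entry *e,
               const void *key, unsigned keylen, void *value)
{
    assert(t != NULL && e != NULL && key != NULL);
    e->key = key;
    e->keylen = keylen;
    e->value = value;
    e->hash = hash_bytes(key, keylen);
    e->next = t->buckets[e->hash & (TABLE_SIZE - 1u)];
    t->buckets[e->hash & (TABLE_SIZE - 1u)] = e;
}

/* Walk the bucket chain and return the stored value, or NULL if absent. */
void *table_get(const struct table *t, const void *key, unsigned keylen)
{
    uint32_t hash = hash_bytes(key, keylen);
    const struct entry *entry = t->buckets[hash & (TABLE_SIZE - 1u)];

    for (; entry != NULL; entry = entry->next) {
        /* Each field read below dereferences `entry`. The NULL check in
         * the loop guards only against the end of the chain; it cannot
         * detect a pointer to memory that was freed or overwritten. */
        if (entry->hash == hash && entry->keylen == keylen &&
            memcmp(entry->key, key, keylen) == 0)
            return entry->value;
    }
    return NULL;
}
```

In this sketch the caller owns the entry storage, so calling table_get after an entry's memory has been released, without first unlinking it from its chain, is undefined behavior and would typically show up as a fault on the comparison line.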



--
This message was sent by Atlassian JIRA
(v6.2#6252)


