[asterisk-bugs] [JIRA] (ASTERISK-27170) segfault in pj_sockaddr_in_set_str_addr

Fri Aug 18 05:56:09 CDT 2017

    [ https://issues.asterisk.org/jira/browse/ASTERISK-27170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=238132#comment-238132 ] 

nappsoft commented on ASTERISK-27170:
-------------------------------------

FYI: 
The startup bug was caused by musl; the following change in musl git head fixed it: http://git.musl-libc.org/cgit/musl/commit/?id=27b3fd68f67b674440d21ea7ca5cf918d2e1559f
However, the following change in musl git head breaks startup completely unless some modules are blacklisted: http://git.musl-libc.org/cgit/musl/commit/?id=6476b8135760659b25c93ff9308425ca98a9e777

I'll now test with musl git head, with dynlink.c reverted to the last version before 6476b8135760659b25c93ff9308425ca98a9e777, to see whether the issues persist.

About bisecting Asterisk: I guess this won't help, as we have had this kind of crash for many versions (ever since switching to chan_pjsip), but usually only in special situations such as network issues, SIP switch issues or similar. That's why I never took a deeper look into it. However, one single (new) customer has frequent crashes. The phones and softphones in use are the same as at other sites, the configuration is almost identical (we have a database that "provisions" the config files on startup, and most of the logic is implemented in AGI scripts that act based on the database), and the software versions are the same (on boot, the virtual machines fetch a binary firmware blob over TFTP). That's why I focused on features this customer uses that others don't, and this seems to be pickupChan, which was also involved in the 6th observed crash, for which we have pcap files.

Then again, who knows: maybe some changes in musl git head will do their magic (especially some of the "fix uninitialized value on failure in XY" changes). I will let you know after further tests.

> segfault in pj_sockaddr_in_set_str_addr
> ---------------------------------------
>
>                 Key: ASTERISK-27170
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-27170
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: PBX/General
>    Affects Versions: 13.16.0
>         Environment: 64bit linux musl 1.1.15
>            Reporter: nappsoft
>            Assignee: Unassigned
>         Attachments: crashlog.txt, trace_cel_crash.txt, trace.txt, valgrind2.txt
>
>
> From time to time Asterisk crashes in pj_sockaddr_in_set_str_addr. The Asterisk version we use is 13.16.0 with some stability patches that went into 13.17.0 (we will update to 13.17.0 soon). But we already had the same crashes with unpatched 13.16.0 and with older versions as well.
> According to the SIP traces, the last thing that happened was a SIP transfer. The message flow was:
> REFER (Phone) -> 202 Accepted (PBX) -> NOTIFY Trying (PBX) -> NOTIFY OK (PBX) -> BYE (Phone) -> OK (PBX for the BYE message) -> OK (Phone for the NOTIFY Trying) -> OK (Phone for the NOTIFY OK)
> As these are embedded systems with limited resources, it's always difficult to produce crash dumps there or to run Asterisk under gdb... I'll try to get some complete backtraces in the future, but maybe somebody has an idea based on the described scenario. Maybe there is a race condition when the phone sends its OK responses for the NOTIFY messages after it has already sent a BYE for the same call?
