[asterisk-bugs] [JIRA] (ASTERISK-27170) segfault in pj_sockaddr_in_set_str_addr

Fri Aug 11 06:12:08 CDT 2017

    [ https://issues.asterisk.org/jira/browse/ASTERISK-27170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=238054#comment-238054 ] 

nappsoft commented on ASTERISK-27170:
-------------------------------------

I'm still not able to debug with BETTER_BACKTRACES and the like, as this is a production embedded system with a limited amount of memory. But I was able (after 4 hours of testing...) to reproduce the issue on a clone of the system (still with "normal" backtraces). Have a look at the following part of the CLI output, which appears immediately before the crash (does the dialog id have anything to do with the identical transaction id of the hangup that occurred 14 seconds before?):

[Aug 11 12:25:44] DEBUG[13474][C-000001bd]: res_rtp_asterisk.c:4560 ast_rtcp_interpret: Got RTCP report of 64 bytes
[Aug 11 12:25:44] DEBUG[13474][C-000001bd]: res_rtp_asterisk.c:4560 ast_rtcp_interpret: Got RTCP report of 64 bytes
[Aug 11 12:25:44] DEBUG[19605]: res_pjsip/pjsip_distributor.c:492 distributor: Searching for serializer associated with dialog dlg0x3045be8 for Request msg UPDATE/cseq=13844 (rdata0x3309ea8)
[Aug 11 12:25:44] DEBUG[19605]: res_pjsip/pjsip_distributor.c:500 distributor: Found serializer pjsip/distributor-000000f8 associated with dialog dlg0x3045be8

=> 0x3045be8 already appeared 14 seconds earlier, as the BYE transaction of a call that had been hung up:

[Aug 11 12:25:30] DEBUG[13407][C-000001b7]: channel.c:2681 ast_hangup: Channel 0x27e42d0 'PJSIP/5119-00000252' hanging up.  Refs: 2
[Aug 11 12:25:30] DEBUG[13407][C-000001b7]: chan_pjsip.c:1991 hangup_cause2sip: AST hangup cause 16 (no match found in PJSIP)
[LWP 13407 exited]
[Aug 11 12:25:30] DEBUG[6034]: res_pjsip_session.c:2517 handle_outgoing_request: Method is BYE
[Aug 11 12:25:30] DEBUG[6034]: res_pjsip_session.c:2422 __print_debug_details: Function session_inv_on_tsx_state_changed called on event TSX_STATE
[Aug 11 12:25:30] DEBUG[6034]: res_pjsip_session.c:2436 __print_debug_details: The state change pertains to the endpoint '5119(PJSIP/5119-00000252)'
[Aug 11 12:25:30] DEBUG[6034]: res_pjsip_session.c:2444 __print_debug_details: The inv session does NOT have an invite_tsx
[Aug 11 12:25:30] DEBUG[6034]: res_pjsip_session.c:2447 __print_debug_details: The UAC BYE transaction involved in this state change is 0x3045be8
[Aug 11 12:25:30] DEBUG[6034]: res_pjsip_session.c:2451 __print_debug_details: The current transaction state is Calling
[Aug 11 12:25:30] DEBUG[6034]: res_pjsip_session.c:2453 __print_debug_details: The transaction state change event is TX_MSG
[Aug 11 12:25:30] DEBUG[6034]: res_pjsip_session.c:2458 __print_debug_details: The current inv state is CONFIRMED
[Aug 11 12:25:30] DEBUG[6034]: channel.c:2233 ast_channel_destructor: Channel 0x27e42d0 'PJSIP/5119-00000252' destroying
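
One way to look for this kind of address reuse across the complete log is to group log lines by the hex addresses they mention. The following is only a minimal Python sketch, assuming the debug log format shown above; the name addresses_by_time and the two-line sample are illustrative and not part of Asterisk or PJSIP:

```python
import re
from collections import defaultdict

# Timestamp prefix such as "[Aug 11 12:25:44]" at the start of a log line.
TS_RE = re.compile(r"^\[(\w{3} +\d+ \d{2}:\d{2}:\d{2})\]")
# Any hex address, including ones glued to a prefix like "dlg0x..." or "rdata0x...".
ADDR_RE = re.compile(r"0x[0-9a-fA-F]+")


def addresses_by_time(lines):
    """Map each hex address to the (timestamp, line) pairs that mention it."""
    seen = defaultdict(list)
    for line in lines:
        ts = TS_RE.match(line)
        if not ts:
            continue  # skip lines without a timestamp, e.g. "[LWP 13407 exited]"
        for addr in sorted(set(ADDR_RE.findall(line))):
            seen[addr].append((ts.group(1), line.strip()))
    return seen


# Two lines copied from the output above (hypothetical stand-in for the full log).
sample = [
    "[Aug 11 12:25:30] DEBUG[6034]: res_pjsip_session.c:2447 __print_debug_details: "
    "The UAC BYE transaction involved in this state change is 0x3045be8",
    "[Aug 11 12:25:44] DEBUG[19605]: res_pjsip/pjsip_distributor.c:492 distributor: "
    "Searching for serializer associated with dialog dlg0x3045be8 for Request msg "
    "UPDATE/cseq=13844 (rdata0x3309ea8)",
]

seen = addresses_by_time(sample)
for addr, hits in seen.items():
    if len(hits) > 1:  # address mentioned on more than one line
        print(addr, [ts for ts, _ in hits])
```

An address that shows up in different roles at different times (here a transaction and later a dialog) may only mean the allocator reused the memory, so this narrows down where to look rather than proving a use-after-free.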

Have a look at the complete log (at least the last 14 seconds before the crash) and the backtrace.

> segfault in pj_sockaddr_in_set_str_addr
> ---------------------------------------
>
>                 Key: ASTERISK-27170
>                 URL: https://issues.asterisk.org/jira/browse/ASTERISK-27170
>             Project: Asterisk
>          Issue Type: Bug
>      Security Level: None
>          Components: PBX/General
>    Affects Versions: 13.16.0
>         Environment: 64bit linux musl 1.1.15
>            Reporter: nappsoft
>            Assignee: Unassigned
>         Attachments: trace.txt
>
>
> From time to time Asterisk crashes in pj_sockaddr_in_set_str_addr. The Asterisk version we use is 13.16.0 with some stability patches that went into 13.17.0 (we will update to 13.17.0 soon). But we already had the same crashes with unpatched 13.16.0 versions and with older versions as well.
> According to the SIP traces, the last thing that happened was a SIP transfer. The message flow was:
> REFER (Phone) -> 202 Accepted (PBX) -> NOTIFY Trying (PBX) -> NOTIFY OK (PBX) -> BYE (Phone) -> OK (PBX for the BYE message) -> OK (Phone for the NOTIFY Trying) -> OK (Phone for the NOTIFY OK)
> As these are embedded systems with limited resources, it's always difficult to make crash dumps there or to run Asterisk in gdb... I'll try to get some complete backtraces in the future, but maybe somebody has an idea based on the described scenario. => Maybe there is a race condition when the phone sends OK messages for the NOTIFY messages after it has already sent a BYE for the same call?

--
This message was sent by Atlassian JIRA
(v6.2#6252)